BNFC / bnfc

BNF Converter
http://bnfc.digitalgrammars.com/
582 stars 161 forks source link

C and CPP backends: use size_t for things such as sizes. #391

Closed enedil closed 2 years ago

enedil commented 2 years ago

Previously used int is not the return type of functions such as strlen, which generates compiler warnings.

enedil commented 2 years ago

I need to note that this probably didn't have any actual correctness impact, as it would require source files greater than 2**31 bytes.

andreasabel commented 2 years ago

Thanks for the PR, @enedil!

which generates compiler warnings.

What are these warnings and how do you trigger them?

I am asking because I do not see these warnings. E.g., C backend:

gcc -g --ansi -W -Wall -Wno-unused-parameter -Wno-unused-function -Wno-unneeded-internal-declaration  -c Absyn.c
gcc -g --ansi -W -Wall -Wno-unused-parameter -Wno-unused-function -Wno-unneeded-internal-declaration  -c Buffer.c
flex -Ptest_ -oLexer.c test.l
bison -t -ptest_ test.y -o Parser.c
test.y:9.1-12: warning: deprecated directive: ‘%pure_parser’, use ‘%define api.pure’ [-Wdeprecated]
    9 | %pure_parser
      | ^~~~~~~~~~~~
      | %define api.pure
test.y: warning: fix-its can be applied.  Rerun with option '--update'. [-Wother]
gcc -g --ansi -W -Wall -Wno-unused-parameter -Wno-unused-function -Wno-unneeded-internal-declaration  -c Lexer.c
gcc -g --ansi -W -Wall -Wno-unused-parameter -Wno-unused-function -Wno-unneeded-internal-declaration  -c Parser.c
gcc -g --ansi -W -Wall -Wno-unused-parameter -Wno-unused-function -Wno-unneeded-internal-declaration  -c Printer.c
gcc -g --ansi -W -Wall -Wno-unused-parameter -Wno-unused-function -Wno-unneeded-internal-declaration  -c Test.c
Linking Testtest...
gcc -g Absyn.o Buffer.o Lexer.o Parser.o Printer.o Test.o -o Testtest

C++ backend:

g++ -g --ansi -W -Wall -Wno-unused-parameter -Wno-unused-function -Wno-unneeded-internal-declaration -c Absyn.C
g++ -g --ansi -W -Wall -Wno-unused-parameter -Wno-unused-function -Wno-unneeded-internal-declaration -c Buffer.C
flex -Ptest_ -oLexer.C test.l
bison -t -ptest_ test.y -o Parser.C
test.y:9.1-12: warning: deprecated directive: ‘%pure_parser’, use ‘%define api.pure’ [-Wdeprecated]
    9 | %pure_parser
      | ^~~~~~~~~~~~
      | %define api.pure
test.y: warning: fix-its can be applied.  Rerun with option '--update'. [-Wother]
g++ -g --ansi -W -Wall -Wno-unused-parameter -Wno-unused-function -Wno-unneeded-internal-declaration -c Lexer.C
bison -t -ptest_ test.y -o Parser.C
test.y:9.1-12: warning: deprecated directive: ‘%pure_parser’, use ‘%define api.pure’ [-Wdeprecated]
    9 | %pure_parser
      | ^~~~~~~~~~~~
      | %define api.pure
test.y: warning: fix-its can be applied.  Rerun with option '--update'. [-Wother]
g++ -g --ansi -W -Wall -Wno-unused-parameter -Wno-unused-function -Wno-unneeded-internal-declaration -c Parser.C
g++ -g --ansi -W -Wall -Wno-unused-parameter -Wno-unused-function -Wno-unneeded-internal-declaration -c Printer.C
g++ -g --ansi -W -Wall -Wno-unused-parameter -Wno-unused-function -Wno-unneeded-internal-declaration -c Test.C
Linking Testtest...
g++ -g Absyn.o Buffer.o Lexer.o Parser.o Printer.o Test.o -o Testtest

This is with

$ cpp --version
Apple clang version 11.0.0 (clang-1100.0.33.17)

but I also do not see the warnings with the GNU compiler:

$ cpp-11 --version
cpp-11 (Homebrew GCC 11.2.0) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
enedil commented 2 years ago

gcc -Wall perhaps counter-intuitively doesn't enable all warnings. For instance, there is also -Wextra which adds a bunch of additional warnings. I didn't test it on gcc however. On clang (which you seem to be really using, as on macOS gcc is usually an alias to clang), there is a flag -Weverything, and in particular, -Wshorten-64-to-32. This came up when I was compiling my project with generated headers.

andreasabel commented 2 years ago

Thanks. I tried -Weverything, and that brings tons of warnings, e.g.:

./Printer.H:15:9: warning: macro name is a reserved identifier [-Wreserved-id-macro]
#define _L_PAREN '('
        ^
./Printer.H:16:9: warning: macro name is a reserved identifier [-Wreserved-id-macro]
#define _R_PAREN ')'
        ^
./Printer.H:68:15: warning: implicit conversion changes signedness: 'int' to 'unsigned long'
      [-Wsign-conversion]
    int end = cur_ + strlen(s);
              ^~~~ ~
./Printer.H:68:20: warning: implicit conversion loses integer precision: 'unsigned long' to 'int'
      [-Wshorten-64-to-32]
    int end = cur_ + strlen(s);
        ~~~   ~~~~~^~~~~~~~~~~
./Printer.H:98:12: warning: use of old-style cast [-Wold-style-cast]
    buf_ = (char *) malloc(buf_size);
           ^        ~~~~~~~~~~~~~~~~
./Printer.H:98:28: warning: implicit conversion changes signedness: 'int' to 'size_t'
      (aka 'unsigned long') [-Wsign-conversion]
    buf_ = (char *) malloc(buf_size);
                    ~~~~~~ ^~~~~~~~
./Printer.H:103:21: warning: implicit conversion changes signedness: 'int' to 'size_t'
      (aka 'unsigned long') [-Wsign-conversion]
    memset(buf_, 0, buf_size);
    ~~~~~~          ^~~~~~~~
./Printer.H:109:18: warning: use of old-style cast [-Wold-style-cast]
    char *temp = (char *) malloc(buf_size);
                 ^        ~~~~~~~~~~~~~~~~
./Printer.H:109:34: warning: implicit conversion changes signedness: 'int' to 'size_t'
      (aka 'unsigned long') [-Wsign-conversion]
    char *temp = (char *) malloc(buf_size);
                          ~~~~~~ ^~~~~~~~
./Printer.H:164:15: warning: implicit conversion changes signedness: 'int' to 'unsigned long'
      [-Wsign-conversion]
    int end = cur_ + strlen(s);
              ^~~~ ~
./Printer.H:164:20: warning: implicit conversion loses integer precision: 'unsigned long' to 'int'
      [-Wshorten-64-to-32]
    int end = cur_ + strlen(s);
        ~~~   ~~~~~^~~~~~~~~~~
./Printer.H:194:12: warning: use of old-style cast [-Wold-style-cast]
    buf_ = (char *) malloc(buf_size);
           ^        ~~~~~~~~~~~~~~~~
./Printer.H:194:28: warning: implicit conversion changes signedness: 'int' to 'size_t'
      (aka 'unsigned long') [-Wsign-conversion]
    buf_ = (char *) malloc(buf_size);
                    ~~~~~~ ^~~~~~~~
./Printer.H:199:21: warning: implicit conversion changes signedness: 'int' to 'size_t'
      (aka 'unsigned long') [-Wsign-conversion]
    memset(buf_, 0, buf_size);
    ~~~~~~          ^~~~~~~~
./Printer.H:205:18: warning: use of old-style cast [-Wold-style-cast]
    char *temp = (char *) malloc(buf_size);
                 ^        ~~~~~~~~~~~~~~~~
./Printer.H:205:34: warning: implicit conversion changes signedness: 'int' to 'size_t'
      (aka 'unsigned long') [-Wsign-conversion]
    char *temp = (char *) malloc(buf_size);
                          ~~~~~~ ^~~~~~~~

There is frequent sign-conversion, and also some shorten-64-to-32.

If these are the warnings you care about, could you add them in the Makefile generation, so that the fruits of your PR are checked and preserved?

enedil commented 2 years ago

I could add a bunch of those, but to be fair, the generated C++ code needs solid rework (as signaled by #293). Parts of the code already assume that std::string is available. Why PrintAbsyn does memory allocation by hand to maintain something which is essentially a string?

I can add two flags and fix related warnings (in some quick way) though.

What is the stance on slight change of interface? For instance, the PrintAbsyn returns a char* while it could be returning an std::string. In that case, it wouldn't be necessary to allocate and free by hand, thus removing quite a few code, which includes such horrendous interface as:

buf_size = something;
resizeBuffer();

instead of

resizeBuffer(something);
andreasabel commented 2 years ago

I tested the PR. The flex-generated Lexer.C does a lot of implicit sign-conversions, it seems:

Lexer.C:858:30: warning: implicit conversion changes signedness: 'const flex_int16_t' (aka 'const short')
      to 'unsigned int' [-Wsign-conversion]
                        yy_current_state = yy_nxt[yy_base[yy_current_state] + (unsigned int) yy_c];
                                                  ^~~~~~~~~~~~~~~~~~~~~~~~~ ~
Lexer.C:1288:44: warning: implicit conversion changes signedness: 'int' to 'unsigned long'
      [-Wsign-conversion]
                        YY_CURRENT_BUFFER_LVALUE->yy_buf_size - number_to_move - 1;
                                                              ~ ^~~~~~~~~~~~~~
...

So, the extra flag should not be added in the make-rule Lexer.o : Lexer.C .... Maybe a way to achieve this is to have another variable CC_EXTRA_WARN that is passed to the compiler in all other rules. (Dunno what is the most backwards-compatible way to do this.)

enedil commented 2 years ago

Ugh. On my grammar it didn't generate code with such warnings.

Perhaps https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Pragmas.html ?

We could push pragma to ignore these kind of errors when including files from flex?

andreasabel commented 2 years ago

I could add a bunch of those, but to be fair, the generated C++ code needs solid rework (as signaled by #293).

Reworking the C++ backend is most welcome. Atm, it has no active developer. (I am just fixing bugs.)

Parts of the code already assume that std::string is available. Why PrintAbsyn does memory allocation by hand to maintain something which is essentially a string?

Not sure. Maybe because the code for the printer generator is shared between the C++(STL) and C++/NoSTL backends.

Maybe we could retire the NoSTL backend. Or keep it while separating it from the STL backend, which then could freely use the STL.
Since I am not a C++ programmer, I have no feeling for what is standard in the community. E.g. is there an interest to use C++ without the STL? (Cf. https://softwareengineering.stackexchange.com/questions/161059/is-it-practical-to-abandon-stl-in-c-development)

What is the stance on slight change of interface? For instance, the PrintAbsyn returns a char* while it could be returning an std::string.

I suppose std::string would be in STL only?

I don't mind a modernization of the C++ backends (or just the STL one). Would you be interested in one and willing to work on it on a larger scale?

andreasabel commented 2 years ago

Ugh. On my grammar it didn't generate code with such warnings.

There is grammars under examples and testing/regression to test.

There is also the testsuite under testing (run make there), but it is not up on CI yet, unfortunately. (Caveat: the whole testsuite takes me 1hr to run.)

Perhaps gcc.gnu.org/onlinedocs/gcc/Diagnostic-Pragmas.html ?

We could push pragma to ignore these kind of errors when including files from flex?

This is indeed better than a second variable in the Makefile!

andreasabel commented 2 years ago

Disabling warnings as follows,

#pragma GCC diagnostic ignored "-Wsign-conversion"

works for gcc and clang but not for all C++ compilers (e.g. not Visual Studio). How to implement this in a portable way is described in https://www.fluentcpp.com/2019/08/30/how-to-disable-a-warning-in-cpp/, but this looks complicated.

Thus, I tend to disable warnings in the Makefile as described in https://nelkinda.com/blog/suppress-warnings-in-gcc-and-clang/#w1223aab3b1c99. This seems to work:

Lexer.o : CCFLAGS+=-Wno-sign-conversion
Lexer.o : Lexer.C Bison.H
    ${CC} ${CCFLAGS} -c Lexer.C
enedil commented 2 years ago

I'm sorry I didn't have time to answer. Yes, you're correct that this kind of additional flags makes more sense. Does -Wno-sign-conversion work with MSVC though?

andreasabel commented 2 years ago

Does -Wno-sign-conversion work with MSVC though?

Good question. I don't know and cannot test this. In fact, I don't know if any of the -W switches works for MSVC...

enedil commented 2 years ago

According to https://godbolt.org/z/M1MTaqheE, -Wall works, but -Wno-sign-conversion doesn't.

andreasabel commented 2 years ago

Looking at https://www.fluentcpp.com/2019/08/30/how-to-disable-a-warning-in-cpp/ again, we are already in a non-portable state with the Makefile, because MSVC does not recognize -Wno-unused-parameter and the like.
So maybe it is fine to continue to produce a Makefile that does not work for MSVC, until someone complains and suggests fixes. It is relatively easy to ignore the Makefile or patch it; it is maybe less easy to get rid of pragmas put into the .l file. Thus, I think it is probably better to solve this issue in the Makefile.

enedil commented 2 years ago

I agree

andreasabel commented 2 years ago

I am doing the fixup and merge this now.

Next time, please wider testing!

andreasabel commented 2 years ago

Merged as: