BNFC / bnfc

BNF Converter
http://bnfc.digitalgrammars.com/
586 stars 165 forks source link

C: _POSIX_C_SOURCE needed for ANSI (C89) compliance (use of strdup in lexer) #314

Closed varqox closed 3 years ago

varqox commented 3 years ago

Encountered on version v2.8.4 (did not check others).

How to reproduce

Create file Instant.cf:

Prog. Program ::= [Stmt] ;
SAss. Stmt ::= Ident "=" Exp;
SExp. Stmt ::= Exp ;
separator Stmt ";" ;

ExpAdd.            Exp1   ::= Exp2 "+"  Exp1 ;
ExpSub.            Exp2   ::= Exp2 "-"  Exp3 ;
ExpMul.            Exp3   ::= Exp3 "*"  Exp4 ;
ExpDiv.            Exp3   ::= Exp3 "/"  Exp4 ;
ExpLit.            Exp4   ::= Integer ;
ExpVar.            Exp4   ::= Ident ;
coercions Exp 4;

Compile:

bnfc --c --makefile Instant.cf && make

Run:

./TestInstant <<< "a=0"

Output:


Parse Successful!

[Abstract Syntax]
[1]    57036 segmentation fault  ./TestInstant <<< "a=0"

Expected output:


Parse Successful!

[Abstract Syntax]
(Prog [(SAss "a" (ExpLit 0))])

[Linearized Tree]
a = 0 

What is going on?

I did some investigation and I think I know why this happens. First of all, it is printing that fails, because it tries to print Ident "a", but its value is invalid (for me it is 0x30 under GDB). This value comes from strdup(). Now, during make the compiler gave us some warnings, which look like this:

Instant.l: In function ‘Instantlex’:
Instant.l:61:18: warning: implicit declaration of function ‘strdup’; did you mean ‘strcmp’? [-Wimplicit-function-declaration]
   61 | <YYINITIAL>{LETTER}{IDENT}*        yylval._string = strdup(yytext); return _IDENT_;
      |                  ^~~~~~
      |                  strcmp
Instant.l:61:16: warning: assignment to ‘char *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
   61 | <YYINITIAL>{LETTER}{IDENT}*        yylval._string = strdup(yytext); return _IDENT_;
      |                ^
Lexer.c: In function ‘Instant_init_buffer’:
Lexer.c:1656:48: warning: implicit declaration of function ‘fileno’ [-Wimplicit-function-declaration]
 1656 | 
      |                                                ^     

From there we see that GCC cannot see strdup() declaration and makes an implicit one i.e. "int strdup();" instead of "char strdup();" (function arguments do not matter here, what is important is the return type). What happens during execution is that: the proper strdup() is called -- the one from libc. But C converts its return value to the one from declaration i.e. int, before using it. So instead of 64bit pointer we have now a 32bit integer made from that pointer and then we convert it back to the char pointer, but the upper 32 bits will remain zeroed, hence the invalid pointer from strdup().

So why the compiler does not see the proper declaration of strdup()? If we take a look at man 3 strdup, we can see that it is conditionally enabled:

       strdup():
           _XOPEN_SOURCE >= 500
               || /* Since glibc 2.12: */ _POSIX_C_SOURCE >= 200809L
               || /* Glibc versions <= 2.19: */ _BSD_SOURCE || _SVID_SOURCE

and the C flags from the make file do not make it enabled.

How to fix it?

So the proper solution would be to make strdup() visible. How can it be done? There are several ways:

The last one is seems most viable IMO. That is because it is not as broad as _GNU_SOURCE, also fixes the last warning (from the ones above):

Lexer.c: In function ‘Instant_init_buffer’:
Lexer.c:1656:48: warning: implicit declaration of function ‘fileno’ [-Wimplicit-function-declaration]
 1656 | 
      |                                                ^     

and POSIX is a widely approved standard.

andreasabel commented 3 years ago

Thanks for the careful analysis! I did not notice this problem as it does not show up on macOS.

I am not sure about the implications of -D_POSIX_C_SOURCE=200809L. (Does it fix the problem on all platforms? Could it pose problems on some platforms?)

A simple fix could be to use a hand-rolled implementation of strdup, in the line of https://stackoverflow.com/questions/252782/strdup-what-does-it-do-in-c .

andreasabel commented 3 years ago

I changed the fix to define _POSIX_C_SOURCE in the generated .l file in a @top{...} block rather than passing it to the C compiler in the Makefile.