goblint / cil

C Intermediate Language
https://goblint.github.io/cil/
Other
143 stars 20 forks source link

Fail to build on Ubuntu 18.04 64bits #174

Open mingodad opened 1 month ago

mingodad commented 1 month ago

While trying to build I'm getting this error:

dune build
Error: goblintCil__Parser_c-yacc corresponds to an invalid module name
-> required by _build/default/src/goblintCil__.ml-gen
-> required by alias src/all
-> required by alias default in dune:1
1 shift/reduce conflict, 1 reduce/reduce conflict. 
Generating machine dependency information for CIL  
dune --version
3.16.0
opam --version
2.1.0~beta4
ocaml --version
The OCaml toplevel, version 4.11.2
mingodad commented 1 month ago

I can build this fork https://github.com/prosyslab/cil without problems.

sim642 commented 3 weeks ago

Maybe dune build --verbose reveals something?

There's nothing named Parser_c in this repository. Do you happen to have something like that uncommitted in some subdirectory while building this fork?

mingodad commented 3 weeks ago

@sim642 thank you for reply ! You are right I've created another temporary parser file and that was the problem.

I also added the grammar to https://mingodad.github.io/parsertl-playground/playground/ an Yacc/Lex compatible online editor/interpreter (select Cil C parser from Examples then click Parse to see a parse tree for the content in Input source) and in doing so wile replacing right recursion by left recursion I could get the parser to work a lot faster that the original but while trying to change the original parser several tests are failing right now.

Maybe you can spot the problem (I'm not familiar with ocaml) to see if it really improve the parser performance:

---------------------------- src/frontc/cparser.mly ----------------------------
index 0391628..d0be863 100644
@@ -406,8 +406,8 @@ file: globals               {$1}
 ;
 globals:
   /* empty */                           { [] }
-| global globals                        { $1 :: $2 }
-| SEMICOLON globals                     { $2 }
+| globals global                         { $2 :: $1 }
+| globals SEMICOLON                      { $1 }
 ;

 location:
@@ -822,10 +822,11 @@ init_expression:

 initializer_list:    /* ISO 6.7.8. Allow a trailing COMMA */
     initializer                             { [$1] }
-|   initializer COMMA initializer_list_opt  { $1 :: $3 }
+|   initializer_list COMMA initializer      { $3 :: $1 }
 ;
 initializer_list_opt:
     /* empty */                             { [] }
+|   initializer_list COMMA                  { $1 }
 |   initializer_list                        { $1 }
 ;
 initializer:
@@ -868,8 +869,8 @@ opt_expression:

 comma_expression:
            expression                        {[fst $1], snd $1}
-|               expression COMMA comma_expression { fst $1 :: fst $3, snd $1 }
-|               error COMMA comma_expression      { $3 }
+|               comma_expression COMMA expression   { fst $3 :: fst $1, snd $3 }
+|               comma_expression COMMA error        { $1 }
 ;

 comma_expression_opt:
@@ -916,21 +917,27 @@ block_attrs:
 /* statements and declarations in a block, in any order (for C99 support) */
 block_element_list:
     /* empty */                          { [] }
-|   declaration block_element_list       { DEFINITION($1) :: $2 }
-|   statement block_element_list         { $1 :: $2 }
 /*(* GCC accepts a label at the end of a block *)*/
 |   IDENT COLON                             { [ LABEL (fst $1, NOP (snd $1),
                                                     snd $1)] }
-|   pragma block_element_list            { $2 }
+|   block_element_list_oom               { $1 }
+;
+block_element_list_oom:
+    declaration                          { [DEFINITION($1)] }
+|   statement                            { [$1] }
+|   pragma                               { [] }
+|   block_element_list_oom declaration   { DEFINITION($2) :: $1 }
+|   block_element_list_oom statement     { $2 :: $1 }
+|   block_element_list_oom pragma        { $1 }
 ;

 local_labels:
    /* empty */                                       { [] }
-|  LABEL__ local_label_names SEMICOLON local_labels  { $2 @ $4 }
+|  local_labels LABEL__ local_label_names SEMICOLON  { $3 @ $1 }
 ;
 local_label_names:
    IDENT                                 { [ fst $1 ] }
-|  IDENT COMMA local_label_names         { fst $1 :: $3 }
+|  local_label_names COMMA IDENT         { fst $3 :: $1 }
 ;

 statement:
@@ -1022,7 +1029,7 @@ init_declarator_list:                       /* ISO 6.7 */

 init_declarator_attr_list:
   init_declarator_attr { [ $1 ] }
-| init_declarator_attr COMMA init_declarator_attr_list { $1 :: $3 }
+| init_declarator_attr_list COMMA init_declarator_attr  { $3 :: $1 }
 ;

 init_declarator_attr:
@@ -1145,33 +1152,43 @@ struct_decl_list: /* (* ISO 6.7.2. Except that we allow empty structs. We
                         also allow missing field names. *)
                    */
    /* empty */                           { [] }
-|  decl_spec_list                 SEMICOLON struct_decl_list
-                                         { (fst $1,
-                                            [(missingFieldDecl, None)]) :: $3 }
+|  struct_decl_list_oom                 { $1 }
+;
+struct_decl_list_oom: /* (* ISO 6.7.2. *)
+                   */
+   SEMICOLON                           { [] }
 /*(* GCC allows extra semicolons *)*/
-|                                 SEMICOLON struct_decl_list
-                                         { $2 }
-|  decl_spec_list field_decl_list SEMICOLON struct_decl_list
-                                          { (fst $1, $2)
-                                            :: $4 }
+|  struct_decl_list_oom                                SEMICOLON 
+                                         { $1 }
+|  decl_spec_list SEMICOLON 
+                                         { [(fst $1,
+                                            [(missingFieldDecl, None)])] }
+|  struct_decl_list_oom decl_spec_list SEMICOLON 
+                                         { (fst $2,
+                                            [(missingFieldDecl, None)]) :: $1 }
+|                       decl_spec_list field_decl_list SEMICOLON
+                                          { [(fst $1, $2)] }
+|  struct_decl_list_oom decl_spec_list field_decl_list SEMICOLON
+                                          { (fst $2, $3)
+                                            :: $1 }
 /*(* MSVC allows pragmas in strange places *)*/
-|  pragma struct_decl_list                { $2 }
+|                       pragma            { [] }
+|  struct_decl_list_oom pragma            { $1 }

-|  error                          SEMICOLON struct_decl_list
-                                          { $3 }
+|  struct_decl_list error                              SEMICOLON 
+                                          { $1 }
 /*(* C11 allows static_assert-declaration *)*/
 |  static_assert_declaration             {
        []
    }

-|  static_assert_declaration      SEMICOLON struct_decl_list  {
-       $3
+|  struct_decl_list_oom static_assert_declaration      SEMICOLON  {
+       $1
    }
-
 ;
 field_decl_list: /* (* ISO 6.7.2 *) */
     field_decl                           { [$1] }
-|   field_decl COMMA field_decl_list     { $1 :: $3 }
+|   field_decl_list COMMA field_decl     { $3 :: $1 }
 ;
 field_decl: /* (* ISO 6.7.2. Except that we allow unnamed fields. *) */
 |   declarator                      { ($1, None) }
@@ -1234,8 +1251,14 @@ rest_par_list:
 rest_par_list1:
     /* empty */                         { ([], false) }
 |   COMMA ELLIPSIS                      { ([], true) }
-|   COMMA parameter_decl rest_par_list1 { let (params, isva) = $3 in
-                                          ($2 :: params, isva)
+|   rest_par_list_oom                   { let (params, isva) = $1 in
+                                          (params, isva)
+                                        }
+;
+rest_par_list_oom:
+    COMMA parameter_decl                   { ([$2], true) }
+|   rest_par_list_oom COMMA parameter_decl { let (params, isva) = $1 in
+                                          ($3 :: params, isva)
                                         }
 ;

@@ -1280,23 +1303,32 @@ direct_old_proto_decl:

 old_parameter_list_ne:
 |  IDENT                                       { [fst $1] }
-|  IDENT COMMA old_parameter_list_ne           { let rest = $3 in
-                                                 (fst $1 :: rest) }
+|  old_parameter_list_ne COMMA IDENT           { let rest = $1 in
+                                                 (fst $3 :: rest) }
 ;

 old_pardef_list:
    /* empty */                            { ([], false) }
 |  decl_spec_list old_pardef SEMICOLON ELLIPSIS
                                           { ([(fst $1, $2)], true) }
-|  decl_spec_list old_pardef SEMICOLON old_pardef_list
-                                          { let rest, isva = $4 in
-                                            ((fst $1, $2) :: rest, isva)
+|  old_pardef_list_oom
+                                          { let rest, isva = $1 in
+                                            (rest, isva)
+                                          }
+;
+
+old_pardef_list_oom:
+   decl_spec_list old_pardef SEMICOLON
+                                          { ([(fst $1, $2)], true) }
+|  old_pardef_list_oom decl_spec_list old_pardef SEMICOLON
+                                          { let rest, isva = $1 in
+                                            ((fst $2, $3) :: rest, isva)
                                           }
 ;

 old_pardef:
    declarator                             { [$1] }
-|  declarator COMMA old_pardef            { $1 :: $3 }
+|  old_pardef COMMA declarator            { $3 :: $1 }
 |  error                                  { [] }
 ;

@@ -1419,7 +1451,7 @@ cvspec:
 /*** GCC attributes ***/
 attributes:
     /* empty */                { []}
-|   attribute attributes           { fst $1 :: $2 }
+|   attributes attribute           { fst $2 :: $1 }
 ;

 /* (* In some contexts we can have an inline assembly to specify the name to
@@ -1472,7 +1504,7 @@ just_attribute:
    productions to avoid some S/R conflicts */
 just_attributes:
     just_attribute                      { [$1] }
-|   just_attribute just_attributes      { $1 :: $2 }
+|   just_attributes just_attribute      { $2 :: $1 }
 ;

 /** (* PRAGMAS and ATTRIBUTES *) ***/
@@ -1631,8 +1663,8 @@ attr: assignment_attr                    { $1 }

 attr_list_ne:
 |  attr                                  { [$1] }
-|  attr COMMA attr_list_ne               { $1 :: $3 }
-|  error COMMA attr_list_ne              { $3 }
+|  attr_list_ne COMMA attr               { $3 :: $1 }
+|  attr_list_ne COMMA error              { $1 }
 ;
 attr_list:
   /* empty */                            { [] }
@@ -1649,13 +1681,13 @@ paren_attr_list:
 /*** GCC ASM instructions ***/
 asmattr:
      /* empty */                        { [] }
-|    VOLATILE  asmattr                  { ("volatile", []) :: $2 }
-|    CONST asmattr                      { ("const", []) :: $2 }
-|    INLINE asmattr                     { ("inline", []) :: $2 }
+|    asmattr VOLATILE                   { ("volatile", []) :: $1 }
+|    asmattr CONST                      { ("const", []) :: $1 }
+|    asmattr INLINE                     { ("inline", []) :: $1 }
 ;
 asmtemplate:
     one_string_constant                          { [$1] }
-|   one_string_constant asmtemplate              { $1 :: $2 }
+|   asmtemplate one_string_constant              { $2 :: $1 }
 ;
 asmoutputs:
   /* empty */           { None }
@@ -1697,7 +1729,7 @@ asmclobberlst:
 ;
 asmclobberlst_ne:
    one_string_constant                           { [$1] }
-|  one_string_constant COMMA asmclobberlst_ne    { $1 :: $3 }
+|  asmclobberlst_ne COMMA one_string_constant     { $3 :: $1 }
 ;

 %%
sim642 commented 1 week ago

Well, by flipping the order of recursion in all the list-like rules and also flipping the list cons (::) arguments, I think all the lists will be in the reverse order. E.g. all statements in a block are in reverse order, all globals (functions/variables) in a file are in reverse order, etc. This is clearly wrong, so it doesn't matter how fast the parser itself is if it leads to a wrong AST.

mingodad commented 6 days ago

Thank you for reply ! Now I understand why most of ocaml menhir/ocamlyacc grammars do it: It's a limitation of how easy/hard it handles lists (high cost to append).

sim642 commented 6 days ago

Technically it would be possible to reverse the list at the right point once everything to be appended has been prepended, but these list reversals might cancel out the parsing speedup. I haven't tried though to know for sure.