goblint / cil

C Intermediate Language
https://goblint.github.io/cil/
Other
136 stars 20 forks source link

Better parser errors #46

Open vogler opened 3 years ago

vogler commented 3 years ago

In #41 a missing typedef for a type __uint128_t results in the following exception:

$ ./regtest.sh 00 07
./goblint --enable dbg.debug --enable dbg.showtemps --enable dbg.regression --sets warnstyle "legacy" --html   tests/regression/00-sanity/07-term2.c
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/mach/arm/_structs.h[443:0-0] : syntax error
Parsing errorFatal error: exception Frontc.ParseError("Parse error")
Raised at Frontc.parse_to_cabs_inner in file "src/frontc/frontc.ml", line 205, characters 6-39
Called from Frontc.parse_to_cabs in file "src/frontc/frontc.ml", line 124, characters 13-38
Called from Frontc.parse_helper in file "src/frontc/frontc.ml", line 256, characters 13-32
Called from Frontc.parse in file "src/frontc/frontc.ml" (inlined), line 264, characters 32-55
Called from Cilfacade.parse in file "src/util/cilfacade.ml", line 24, characters 2-26
Called from Stdlib__list.rev_map.rmap_f in file "list.ml", line 103, characters 22-25
Called from Maingoblint.merge_preprocessed in file "src/maingoblint.ml", line 256, characters 18-62
Called from Maingoblint.main in file "src/maingoblint.ml", line 460, characters 15-56
Called from Stdlib.at_exit.new_exit in file "stdlib.ml", line 554, characters 59-63
Called from Stdlib.do_at_exit in file "stdlib.ml" (inlined), line 560, characters 20-61
Called from Std_exit in file "std_exit.ml", line 18, characters 8-20

There are two problems:

  1. Little information about the input
  2. No information about where this happened in CIL

For 2.:

If we change it to

(* raise (ParseError("Parse error")) *)
let backtrace = Printexc.get_raw_backtrace () in
Printexc.raise_with_backtrace (ParseError("Parse error")) backtrace (* re-raise with captured inner backtrace *)

we get

$ dune exec src/main.exe /Users/voglerr/goblint/analyzer/goblint_temp_e54e31/07-term2.c
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/mach/arm/_structs.h[443:0-0] : syntax error
Parsing errorFatal error: exception Frontc.ParseError("Parse error")
Raised at Errormsg.parse_error in file "src/ocamlutil/errormsg.ml", line 328, characters 2-27
Called from Stdlib__parsing.yyparse.loop in file "parsing.ml", line 151, characters 8-44
Called from Stdlib__parsing.yyparse in file "parsing.ml", line 164, characters 4-28
Re-raised at Stdlib__parsing.yyparse in file "parsing.ml", line 183, characters 8-17
Called from Stats.repeattime.repeatf in file "src/ocamlutil/stats.ml", line 106, characters 10-15
Re-raised at Stats.repeattime.repeatf in file "src/ocamlutil/stats.ml", line 110, characters 1-8
Called from Frontc.parse_to_cabs_inner in file "src/frontc/frontc.ml", line 191, characters 15-90
Re-raised at Frontc.parse_to_cabs_inner in file "src/frontc/frontc.ml", line 207, characters 6-73
Called from Frontc.parse_to_cabs in file "src/frontc/frontc.ml", line 124, characters 13-38
Called from Frontc.parse_helper in file "src/frontc/frontc.ml", line 258, characters 13-32
Called from Frontc.parse in file "src/frontc/frontc.ml", line 266, characters 32-55
Called from Dune__exe__Main.parseOneFile in file "src/main.ml", line 60, characters 12-28
Called from Util.count_map in file "src/ocamlutil/util.ml", line 19, characters 12-15
Called from Dune__exe__Main.theMain in file "src/main.ml", line 191, characters 16-64
Called from Dune__exe__Main in file "src/main.ml", line 255, characters 4-14

which is also not helpful since it has no reference to any of CIL's parsing files src/frontc/{clexer.mll, cparser.mly} (or generated files from it).

This only concerns the stack-trace. Ideally the parser would emit a helpful error - in this case that the type was not defined.

As @michael-schwarz mentioned in the original comment below, we could


Original context:

Is it possible to let the lexer complain about what it has a problem with? 'syntax error' is not that helpful.

I don't think the lexer can complain here. How is it supposed to know that __uint128_t is supposed to be a type as opposed to a variable name?

https://github.com/goblint/cil/blob/b2c66d3bdf09e2614d3b936b75eda5c5f1ce3587/src/frontc/clexer.mll#L261-L271

https://github.com/goblint/cil/blob/b2c66d3bdf09e2614d3b936b75eda5c5f1ce3587/src/frontc/clexer.mll#L583

So, I think the Lexer will emit an IDENT token here. It is only later during parsing that the error is recognized. So what one would need are parser error messages that are easy for the user to understand which is in general hard for LR(1) parsers. I don't know how to do better than report the location of the error (plus maybe the part of the input that has not been lexed?)

One could look at Menhir which is a drop-in replacement of ocamlyacc, maybe this produces more helpful error messages in case of syntax errors?

Originally posted by @michael-schwarz in https://github.com/goblint/cil/issues/41#issuecomment-889646449