antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License

Antlr 4.10 preview: errors in antlr/antlr3 and v/ #2481

Open kaby76 opened 2 years ago

kaby76 commented 2 years ago

I've been updating the Antlr4 tool for the Go target and decided to test the new tool on grammars-v4/, as this is a more extensive test than the unit tests for the Antlr4 tool.

KvanTTT commented 2 years ago

Thanks for testing 4.9.4 (actually 4.10) on the entire grammars repository. I just wanted to start a discussion about that. Also, the caseInsensitive option should be tested (I've already started on it).

As far as I can see, the errors are completely valid. But for V.g4 there should probably be several "contains a closure with at least one alternative that can match EOF" errors (see the rules with eos under a closure).

kaby76 commented 2 years ago

@KvanTTT I added a check to trgen to fail on caseInsensitive = true but I just forgot what you were working on. Sorry. I'll fix the check so that it doesn't crash on the value. I'll fix antlr/antlr3 and v soon when I can get past https://github.com/antlr/antlr4/pull/3486

KvanTTT commented 2 years ago

> I added a check to trgen to fail on caseInsensitive = true but I just forgot what you were working on. Sorry. I'll fix the check so that it doesn't crash on the value.

Yes, I've also transformed some grammars that spell tokens out with fragments (TOKEN: T O K E N; -> TOKEN: 'TOKEN';), but I have not finished yet.
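To illustrate the fragment-to-literal rewrite mentioned above, here is a minimal Python sketch. It is not the actual tooling used for the repository. The helper name `collapse_fragment_rule` and the regular expression are hypothetical, and it assumes that every single-letter uppercase name in a rule body is a per-letter case-insensitive fragment (for example `fragment T: [tT];`) and that the grammar also sets `options { caseInsensitive = true; }` so the resulting literal still matches any casing.

```python
import re

# Hypothetical pattern: a lexer rule whose body is two or more
# single-letter fragment references, e.g. "TOKEN: T O K E N;".
FRAGMENT_RULE = re.compile(
    r"^(?P<name>[A-Z_][A-Z0-9_]*)\s*:\s*"
    r"(?P<body>[A-Z](?:\s+[A-Z])+)\s*;\s*$"
)


def collapse_fragment_rule(line: str) -> str:
    """Rewrite 'TOKEN: T O K E N;' as "TOKEN: 'TOKEN';".

    Lines that do not match the pattern are returned unchanged.
    Assumption: single-letter names are case-insensitive letter fragments.
    """
    match = FRAGMENT_RULE.match(line)
    if match is None:
        return line
    # Join the fragment letters into one literal string.
    literal = "".join(match.group("body").split())
    return f"{match.group('name')}: '{literal}';"


def collapse_grammar(text: str) -> str:
    """Apply the rewrite line by line to a whole grammar file's text."""
    return "\n".join(collapse_fragment_rule(line) for line in text.splitlines())
```

The sketch only handles rules that fit on one line, and it leaves the now-unused `fragment` definitions in place; removing those would be a separate pass.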

kaby76 commented 2 years ago

The v/ grammar here is not very good. It does not parse most files in the "vlib" runtime library. So, we really should consider replacing it. (I have a fix for the antlr/antlr3 grammar.)

We could hope that github/vlang/ adds an EBNF grammar and use that as a basis here. There are various requests for an EBNF for V, but it will likely never be done (1).

Of course, there is a parser for V. It is hand-written code (why would it be any different for V than for every other language in our miserable profession) (2). It's noted that "[u]nlike many other languages, V is not going to be always changing" (3), but there have been 13 changes in parser.v over the last month alone (4). Scraping that would be difficult.

The VSCode extension for V (5) is a TextMate implementation, so at best it gives only the lexical structure, in the great EBNF syntax JSON at that.

I recommend redoing the grammar by scraping it from the Tree Sitter grammar for V (6 or 7) within the LSP server code. It is being maintained. Attachment: v-raw-scrapped.txt. However, I don't know if one would end up right back here: a grammar that isn't very good.

kaby76 commented 2 years ago

I've reworked the converter from tree-sitter to pseudo-Antlr4 grammars. As it turns out, one can't really use grammar.js as input for the conversion: tree-sitter performs a partial evaluation of the .js code, so the table used in the V grammar for expressions must be expanded by tree-sitter itself. The converter must therefore use tree-sitter's output, the grammar.json file. The raw grammar for V is here: grammar.txt
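For readers unfamiliar with grammar.json, here is a rough Python sketch of what such a conversion could look like. It is not the converter described above; the function names `to_antlr`, `convert`, and `convert_file` are hypothetical. It relies only on the node types tree-sitter writes to grammar.json (SYMBOL, STRING, PATTERN, BLANK, SEQ, CHOICE, REPEAT, REPEAT1, plus wrappers such as PREC_LEFT or FIELD that carry a `content` node), and it simply drops precedence and field information, which a real conversion would have to deal with.

```python
import json


def to_antlr(node: dict) -> str:
    """Render one grammar.json node as a pseudo-Antlr4 expression."""
    kind = node["type"]
    if kind == "SYMBOL":
        return node["name"]
    if kind == "STRING":
        escaped = node["value"].replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    if kind == "PATTERN":
        # Regexes are passed through untranslated; this is pseudo-Antlr4.
        return f"/{node['value']}/"
    if kind == "SEQ":
        return " ".join(to_antlr(m) for m in node["members"])
    if kind == "CHOICE":
        members = node["members"]
        alts = [m for m in members if m["type"] != "BLANK"]
        body = " | ".join(to_antlr(m) for m in alts)
        # A BLANK alternative means the whole choice is optional.
        suffix = "?" if len(alts) < len(members) else ""
        return f"({body}){suffix}"
    if kind == "REPEAT":
        return f"({to_antlr(node['content'])})*"
    if kind == "REPEAT1":
        return f"({to_antlr(node['content'])})+"
    if "content" in node:
        # PREC*, FIELD, TOKEN, IMMEDIATE_TOKEN, ALIAS: keep only the content.
        return to_antlr(node["content"])
    raise ValueError(f"unhandled node type: {kind}")


def convert(grammar: dict) -> str:
    """Emit one pseudo-Antlr4 rule per entry in the 'rules' object."""
    return "\n".join(
        f"{name} : {to_antlr(rule)} ;" for name, rule in grammar["rules"].items()
    )


def convert_file(path: str) -> str:
    """Load a grammar.json file generated by tree-sitter and convert it."""
    with open(path, encoding="utf-8") as handle:
        return convert(json.load(handle))
```

Working from grammar.json rather than grammar.js is what makes this a plain tree walk: any JavaScript helpers or tables in the source grammar have already been evaluated into these node objects.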