Closed macintoshpie closed 2 years ago
I ran through your example and added some debug statements to figure out whats going on.
In the nodes where start
/stop
is missing, the parse tree shows that these nodes do not contain any children, or contain s node with no children.
I'm marking this as a bug, since I definitely don't carry over the start/stop attributes properly if a grammar rule context matched on nothing.
That said, I'd recommend trying to clean up the grammar you're using from grammars-v4. Looking at that grammar definition, one thing that stood out is that it contains several rules that will actually match on no tokens (namely the comment rules). This isn't really good practice since it will generate a bunch of meaningless empty parse tree nodes (and hurt performance on the way). I suspect at some point, a different author tacked on these comment nodes, and made kind of a mess of it.
For example, this sequence of rules:
import_clause
: 'import' (IDENT '=' name | name '.*' | name '.{' import_list '}' | name ) comment
;
comment
: string_comment (annotation)?
;
string_comment
: (STRING ('+' STRING)*)?
;
Defines a grammar where the comment
and string_comment
rules will match regardless if they consume tokens or not.
If the intent was to make comments optional, this would have been more appropriate:
import_clause
: 'import' (IDENT '=' name | name '.*' | name '.{' import_list '}' | name ) comment? annotation?
;
comment
: STRING ('+' STRING)*
;
I totally might have missed something since I'm not familiar with the modelica language, but I recommend you take a closer look at the grammar. You might find that you can make some significant improvements if you fix that one up, or even write your own based on the original language spec https://www.modelica.org/documents/ModelicaSpec34.pdf (see page 265)
Great, thank you for the feedback and suggestions! I'll look into tweaking the grammar
Expected behavior
Every rule context has a
.start
and.stop
attribute which contains a tokenActual behavior
Some rules are missing one or both of these tokens
Steps to reproduce
python -m pip install .
python example.py
This will print out the start and stop token at every rule context using both parser implementations. Notice that for the python implementation there are not tokens that areNone
Thanks for creating this package! Hopefully I'm not doing something obviously wrong, I'm a total antlr and C++ beginner.