antlr / antlr4-lab

A client/server for trying out and learning about ANTLR
MIT License

Sample Kotlin Grammar does not load properly and is unusable. #92

Open Incoherent-Code opened 1 month ago

Incoherent-Code commented 1 month ago

How to reproduce:

  1. Go to the ANTLR Lab site (lab.antlr.org).
  2. Click the sample dropdown and select either Kotlin entry. (I'm assuming one entry is for kotlin-formal, but neither works.)
  3. Click on the tab labeled Parser.

You'll see that the lexer grammar has been loaded into the Parser tab instead of the Kotlin parser grammar. In this state, the sample does not work.

Even with the Kotlin parser in the correct tab, there needs to be a way to import UnicodeClasses.g4, which the Kotlin lexer relies on. Otherwise, the sample throws many implicit token errors. To use the Kotlin grammar with ANTLR Lab, I usually have to copy the contents of UnicodeClasses.g4 to the end of the lexer by hand.
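The copy/paste workaround can be scripted. Below is a minimal sketch, not part of ANTLR Lab or grammars-v4: the helper name `inline_import` is hypothetical. It removes one `import Name;` line from the lexer and appends the imported grammar's rules, minus the `lexer grammar ...;` header, to the end of the file. It assumes a single simple import statement, no rule-name collisions, and an imported file with no `options`, `tokens`, or `channels` blocks that would also need merging. It works on text with regular expressions and does not parse ANTLR.

```python
import re

# A simple ANTLR import line such as "import UnicodeClasses;".
IMPORT_RE = re.compile(r"^\s*import\s+(\w+)\s*;\s*$", re.MULTILINE)
# The imported file's header, e.g. "lexer grammar UnicodeClasses;".
HEADER_RE = re.compile(r"^\s*(?:lexer\s+)?grammar\s+\w+\s*;\s*$", re.MULTILINE)


def inline_import(lexer_text, imported_text):
    """Hypothetical helper mirroring the manual workaround.

    Drops the first import line from the lexer, strips the imported grammar's
    header, and appends the imported rules after the lexer's own rules.
    Duplicate rule names are NOT resolved; ANTLR itself would let the main
    grammar's rules override imported ones.
    """
    without_import = IMPORT_RE.sub("", lexer_text, count=1)
    rules = HEADER_RE.sub("", imported_text, count=1)
    return without_import.rstrip() + "\n\n" + rules.strip() + "\n"
```

Reading KotlinLexer.g4 and UnicodeClasses.g4 into strings and passing them to `inline_import` gives a single lexer grammar that can be pasted into the Lexer tab. Because the rules land at the end of the file, they would fall under the lexer's last `mode` section if it has one, so that is worth checking before pasting.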

kaby76 commented 1 month ago

Someone manually edited the grammars.json file: https://github.com/antlr/grammars-v4/blob/1e08bcbcc56b8ff2cfad7508815544e141d188e9/grammars.json#L2070. That entry is wrong; the file should be generated by script (https://github.com/antlr/grammars-v4/blob/master/_scripts/mkindex.py), not edited by hand.

Incoherent-Code commented 1 month ago

Upon further inspection, this is actually a bug with mkindex.py itself. I tried running mkindex.py again and got this output, which is still wrong: grammar.json

Incoherent-Code commented 1 month ago

The problem is in lines 113 and 114 of mkindex.py:

lexer = grammars[0] if 'Lexer' in grammars[0] else grammars[1]
parser = grammars[0] if 'Parser' in grammars[0] else grammars[1]

The Kotlin pom file lists UnicodeClasses.g4 before the lexer and parser. Since grammars[0] (UnicodeClasses.g4) contains neither 'Lexer' nor 'Parser', this edge case sets both lexer and parser to grammars[1], which is KotlinLexer.g4.

<includes>
   <include>UnicodeClasses.g4</include>
   <include>KotlinLexer.g4</include>
   <include>KotlinParser.g4</include>
</includes>
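The edge case can be reproduced in isolation. This sketch assumes, as described above, that mkindex.py sees the grammars in the same order as the `<include>` entries. `pick_by_position` copies the logic of lines 113 and 114; `pick_by_suffix` is a hypothetical alternative that matches on the file-name ending rather than on position. It is only an illustration, not a patch that exists in grammars-v4.

```python
# <include> order from the Kotlin pom.xml quoted above.
KOTLIN_INCLUDES = ["UnicodeClasses.g4", "KotlinLexer.g4", "KotlinParser.g4"]


def pick_by_position(grammars):
    """Logic of mkindex.py lines 113-114: only grammars[0] and grammars[1] are examined."""
    lexer = grammars[0] if 'Lexer' in grammars[0] else grammars[1]
    parser = grammars[0] if 'Parser' in grammars[0] else grammars[1]
    return lexer, parser


def pick_by_suffix(grammars):
    """Hypothetical alternative: choose by file-name suffix wherever the file appears.

    Returns None for a role that no file fills (for example, a combined grammar).
    """
    lexer = next((g for g in grammars if g.endswith("Lexer.g4")), None)
    parser = next((g for g in grammars if g.endswith("Parser.g4")), None)
    return lexer, parser


print(pick_by_position(KOTLIN_INCLUDES))  # ('KotlinLexer.g4', 'KotlinLexer.g4')
print(pick_by_suffix(KOTLIN_INCLUDES))    # ('KotlinLexer.g4', 'KotlinParser.g4')
```

Matching on the suffix still depends on the split-grammar naming convention, so it does not replace listing the right files in the first place.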

I also noticed that kotlin-formal/pom.xml doesn't include UnicodeClasses.g4 at all, even though KotlinLexer.g4 still imports from it.

kaby76 commented 1 month ago


Thanks for checking this. The error is in the pom.xml itself: it lists UnicodeClasses.g4 as a "top-level g4". It is a lexer grammar, but it is not a top-level g4. A "top-level g4" is a g4 that we run the tool on. UnicodeClasses.g4 is only imported by other grammars, so the tool should not be run on it directly.
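That definition suggests deriving top-level grammars from the .g4 files themselves instead of from the pom.xml: a grammar is top-level unless another grammar in the same directory imports it. The sketch below is a hypothetical helper and not how trgen, trparse, or mkindex.py actually work. It relies on ANTLR requiring a grammar's name to match its file name, recognizes only the plain `import A, B;` form, and uses a regular expression instead of a real parser.

```python
import os
import re

# Plain "import A;" or "import A, B;" at the start of a line. Dotted names such as
# "import java.util.List;" inside action blocks do not match.
IMPORT_RE = re.compile(r"^\s*import\s+(\w+(?:\s*,\s*\w+)*)\s*;", re.MULTILINE)


def imported_names(g4_text):
    """Return the grammar names listed in the file's import statements."""
    names = set()
    for match in IMPORT_RE.finditer(g4_text):
        names.update(name.strip() for name in match.group(1).split(","))
    return names


def top_level_grammars(files):
    """files maps a .g4 file name to its text; return the files that nothing imports."""
    imported = set()
    for text in files.values():
        imported |= imported_names(text)
    # The grammar name equals the file name without ".g4", so compare the stems.
    return sorted(f for f in files if os.path.splitext(f)[0] not in imported)
```

Given the three Kotlin files, this keeps KotlinLexer.g4 and KotlinParser.g4 and drops UnicodeClasses.g4, because KotlinLexer.g4 imports it.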

(Honestly, I don't understand why we use the pom.xml for this information when it can all be derived with trparse/trquery or read from the desc.xml. The Maven tester has been replaced by trgen because trgen works out top-level grammars, start rules, and so on by itself. When it can't, it uses the desc.xml.)

I'll need to fix the pom.xml and reindex.

kaby76 commented 1 month ago

Although the parser and lexer grammar tabs fill with the correct .g4 files, lab.antlr.org does not work with either Kotlin grammar. It can't, because the .g4 files contain import statements and lab.antlr.org has no UI for imported grammars. The mkindex.py script does not weed out these grammars, but it should. https://github.com/antlr/grammars-v4/issues/4201
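The proposed weed-out step for the index script could look roughly like this. The input shape (a dict from grammar directory to a dict of .g4 file name and text) and the function names are hypothetical, since mkindex.py's real data structures are not shown in this thread. Like any regex check, it can miss unusual import syntax, but it ignores dotted Java-style imports that may appear inside action blocks.

```python
import re

# Plain ANTLR import: "import A;" or "import A, B;" at the start of a line.
IMPORT_RE = re.compile(r"^\s*import\s+\w+(?:\s*,\s*\w+)*\s*;", re.MULTILINE)


def uses_import(g4_text):
    """True if the grammar text contains an ANTLR import statement."""
    return IMPORT_RE.search(g4_text) is not None


def lab_compatible(grammars):
    """Keep only grammar directories whose .g4 files have no import statements.

    grammars maps a directory name (e.g. "kotlin-formal") to {file name: text}.
    """
    return sorted(
        name
        for name, files in grammars.items()
        if not any(uses_import(text) for text in files.values())
    )
```

An index built from `lab_compatible(...)` would leave both Kotlin grammars out of the lab's sample dropdown until imported grammars are supported.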