antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org

Lexer and Parser in different maven modules #1779

Open rslemos opened 7 years ago

rslemos commented 7 years ago

For a sufficiently complex lexer and/or parser, with lots of supporting code or automated test cases, it would be reasonable to keep them in separate Maven modules and have parser-module compile-depend on lexer-module.

ANTLR4 (and its Maven plugin) mostly supports this, with the exception of the tokenVocab file: it is written to lexer-module/target/generated-sources/antlr4, but when the parser is generated it is searched for in parser-module/target/generated-sources/antlr4.

Complicating matters a bit, a Maven build can be invoked in two ways. Maven handles both nicely, adding either ~/.m2/repository/..../lexer-module.jar or ../lexer-module/target/classes to the parser-module compilation classpath.

I wonder if it would be reasonable to also look for .tokens files on the classpath. Conceptually, this would mean that a lexer package could provide the full information needed both at runtime (generated and compiled class files) and at compile time (the tokens file, to be consumed by antlr4 when generating a parser).
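To make the idea concrete, here is a minimal sketch of such a lookup. It is not the code from the proposed patch and not ANTLR's actual implementation: the class name `TokenVocabLocator`, the method `open`, and the "directory first, classpath second" order are all assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

// Hypothetical helper, for illustration only (not part of ANTLR).
public class TokenVocabLocator {

    /**
     * Opens "<vocabName>.tokens", looking first in the given directory
     * and then on the classpath of the given class loader.
     * Returns null when the file is found in neither place.
     */
    public static InputStream open(String vocabName, File libDirectory, ClassLoader loader)
            throws IOException {
        String fileName = vocabName + ".tokens";

        // 1. Stand-in for the directory-based lookup described above.
        File candidate = new File(libDirectory, fileName);
        if (candidate.isFile()) {
            return Files.newInputStream(candidate.toPath());
        }

        // 2. Assumed fallback: a classpath resource, which covers both
        //    lexer-module.jar and ../lexer-module/target/classes.
        return loader.getResourceAsStream(fileName);
    }
}
```

With a class loader built from the parser-module compilation classpath, the same call would work whether Maven put the lexer jar or the lexer's target/classes directory on that classpath.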

The only change needed in antlr4 itself would be:

I have made those changes in https://github.com/rslemos/antlr4/commit/b6980190e22362e0d40b2aee43f29bb159329806.

I would like to hear from you both about the concept and the proposed patch.

rslemos commented 7 years ago

Apart from incorporating the changes I've proposed, users should add the following to their lexer-module's pom.xml:


        <build>
                <resources>
                        <resource>
                                <directory>target/generated-sources/antlr4</directory>
                                <includes>
                                        <include>*.tokens</include>
                                </includes>
                        </resource>
                </resources>
        </build>

This copies the generated .tokens file over to target/classes.

Everything else (packaging, classpath handling and so on) would be handled by maven itself.

rslemos commented 7 years ago

Personally, I think a more reasonable place to store the .tokens file would be inside META-INF/antlr4.

Although changing the antlr4 code to look in that folder would be easy, actually moving the file there is more work in Maven (it is not really hard, but it takes a lot more lines than a single, simple <resource> element).
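As a rough sketch of what the antlr4 side could look like, the lookup would only need one extra candidate name. Everything here is hypothetical: the class `TokensResourceNames`, its methods, and the choice to try META-INF/antlr4 before the classpath root are assumptions, not existing ANTLR behavior.

```java
import java.net.URL;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only (not part of ANTLR).
public class TokensResourceNames {

    // Assumed convention taken from the comment above; not an existing ANTLR setting.
    static final String META_INF_PREFIX = "META-INF/antlr4/";

    /** Candidate classpath resource names for a vocabulary, most specific first. */
    public static List<String> candidates(String vocabName) {
        String fileName = vocabName + ".tokens";
        return Arrays.asList(META_INF_PREFIX + fileName, fileName);
    }

    /** Returns the first candidate found through the given class loader, or null. */
    public static URL find(String vocabName, ClassLoader loader) {
        for (String name : candidates(vocabName)) {
            URL url = loader.getResource(name);
            if (url != null) {
                return url;
            }
        }
        return null;
    }
}
```

On the Maven side, the <resource> element also accepts a <targetPath> child, which may be enough to place the file under META-INF/antlr4; I have not verified that against this setup.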

rslemos commented 7 years ago

If you want to try it on a pet project, https://github.com/rslemos/pet-grammars/commit/640118eb095fa0a54d82ab719c7c3a8522649250 contains the modules l (for lexer) and g (for grammars). The build should fail without my proposed patch. (Please only build that pet project and don't run its tests: some tests fail on purpose and may confuse the reader.)

rslemos commented 7 years ago

Perhaps linked to #638.