antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Antlr4 jar tool generates files to an inconsistent directory #3138

Open kaby76 opened 3 years ago

kaby76 commented 3 years ago

I am trying to build a consistent, platform-independent and target-independent experience for testing a grammar. I just ran across an inconsistency that boggles my mind. To reproduce what I am doing, first clone and cd to the mysql code in grammars-v4 (on Windows, use MSYS2; on Ubuntu, use Bash): git clone https://github.com/antlr/grammars-v4.git; cd grammars-v4/sql/mysql. Also, download the antlr-*-jar for both Windows and Ubuntu. Place it wherever you like, and substitute the paths in the commands below.

Then:

A consistent presentation is first and foremost what a tool should provide, but this is not done for the Antlr tool. The Antlr tool seems to ignore the -o option sometimes on Ubuntu.

--Ken

rachidlamouri commented 2 years ago

Did you ever figure out why -o doesn't work as expected on Linux? I'm getting different outputs on Windows and Ubuntu for the same command. https://xkcd.com/979/

kaby76 commented 2 years ago

@rachidlamouri No, I hadn't looked at this further. I changed many of the grammars in grammars-v4 to follow a standardized format, which my tool and the Antlr Maven Plugin can both handle, avoiding the issue altogether. But, it seems it's still a problem: the tool should work consistently between platforms.

rachidlamouri commented 2 years ago

@kaby76 I found this line that is causing the different behavior on Windows. On Windows, it looks for the \ file separator instead of /. When it can't find the file separator, it defaults to . as the grammar's directory, which gets combined with the output directory, so the files go directly into the output directory.

# Windows: Places generated files directly in output/
# Ubuntu: Places generated files in output/src/
java -jar antlr-4.9.2-complete.jar src/MyLexer.g4 -o output

# Windows: Places generated files in output/src/
# Ubuntu: Source path is invalid
java -jar antlr-4.9.2-complete.jar src\\MyLexer.g4 -o output

For clarity: I would expect to be able to write one command that has the same behavior on both systems.
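To illustrate the behavior described above, here is a minimal sketch in Java. It is not the actual ANTLR source: the class and method names are hypothetical, and the separator is passed in as a parameter (an assumption made so the sketch behaves the same on any machine) instead of being read from the platform. It only models the "separator not found, fall back to ." step.

```java
// Hypothetical sketch of the lookup described above; not the ANTLR implementation.
final class OutputDirSketch {

    // Returns the directory part of grammarPath, splitting only on `separator`.
    // Falls back to "." when that separator does not occur in the path.
    static String parentDir(String grammarPath, char separator) {
        int idx = grammarPath.lastIndexOf(separator);
        return idx < 0 ? "." : grammarPath.substring(0, idx);
    }

    // Combines the -o directory with the grammar's parent directory.
    // '/' is used when joining purely for display in this sketch.
    static String resolveOutputDir(String outputDir, String grammarPath, char separator) {
        String parent = parentDir(grammarPath, separator);
        return parent.equals(".") ? outputDir : outputDir + "/" + parent;
    }

    public static void main(String[] args) {
        // Windows-like separator: '/' is not recognized, so files land in output
        System.out.println(resolveOutputDir("output", "src/MyLexer.g4", '\\'));
        // Unix-like separator: the src directory is preserved under output
        System.out.println(resolveOutputDir("output", "src/MyLexer.g4", '/'));
    }
}
```

With a backslash separator the first call yields output, and with a forward slash the second yields output/src, which matches the difference seen between the two systems for the first command.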

I haven't used Java much, let alone worked with platform-dependent paths in Java, so it would take me a while to make a fix. Changing the behavior could also be a breaking change, so it might make sense to add another arg for specifying the file separator. That being said, I found the -Xexact-output-dir arg, which solves my use cases 🤷🏼‍♂️

kaby76 commented 2 years ago

I'm not sure what the best fix would be, as the "-o" and the "-lib" options seem to do related things: -o = "specify output directory where all output is generated"; -lib = "specify location of grammars, tokens files". So, applying the descriptions logically to the generated/read .tokens files, what is the tool supposed to do for "a4 -o foo -lib bar *.g4"? Make copies in both directories? The options are unclear. This is why my driver generator stuffs everything into a flat directory and, in the case of C# (Antlr4BuildTasks), uses the -o option to place the output in bin/Debug/. The options seem to have existed for quite some time (since the initial version of the Antlr4 tool).

mcoblenz commented 1 year ago

Just spent an hour debugging why Antlr was putting the generated files in the wrong directory on a Windows machine. It appears to be impossible to specify the correct output directory in a cross-platform way. My build config is specified in a build.json file (for yarn), so I can't programmatically choose the right path separator. So, the path uses "/", which Antlr doesn't recognize as a path separator even though it is considered a valid path separator on Windows too. Of course, if I switch to "\", then it'll break non-Windows builds.
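One way a lookup could tolerate "/" on every platform is sketched below in Java. This is an assumption about a possible approach, not a change that exists in ANTLR: the class and method names are hypothetical, and the platform separator is passed in as a parameter so the sketch can be tried anywhere. It treats "/" as a separator in addition to the platform's own one, and leaves "\" alone on non-Windows systems, where it is a legal file name character.

```java
// Hypothetical sketch of a separator-tolerant lookup; not part of ANTLR.
final class SeparatorTolerantSketch {

    // Returns the directory part of grammarPath, accepting both '/' and the
    // platform separator. Falls back to "." when neither occurs in the path.
    static String parentDir(String grammarPath, char platformSeparator) {
        int idx = Math.max(grammarPath.lastIndexOf('/'),
                           grammarPath.lastIndexOf(platformSeparator));
        return idx < 0 ? "." : grammarPath.substring(0, idx);
    }

    public static void main(String[] args) {
        // Windows-like platform separator, path written with '/'
        System.out.println(parentDir("src/MyLexer.g4", '\\'));
        // Windows-like platform separator, path written with '\'
        System.out.println(parentDir("src\\MyLexer.g4", '\\'));
        // Unix-like platform separator, path written with '/'
        System.out.println(parentDir("src/MyLexer.g4", '/'));
    }
}
```

All three calls resolve to src, so a single path written with "/" in a shared build config would give the same directory on both systems under this approach.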