antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License
10.19k stars 3.71k forks source link

[rexx] Generated parser isn't being tested in CI builds #2821

Closed RossPatterson closed 2 years ago

RossPatterson commented 2 years ago

In ANTLR discussion 3321, @kaby76 wrote:

I also renamed the temp-examples in the pom (which tells the tools how to run tests on the grammar) to "examples/" so there is some testing. Currently, the generated parser for the grammar is not tested in the CI build!

I made explicit changes to make those tests function correctly in the ANTLR build system in commits 88bbc3af298b4cc5abd588fc30412c12e98ebd70, and b9b9e0c4023f9f1fff66a47804b53caed7bec2e9, and they appeared to work. Obviously there's more work to do, so I need to do it.

RossPatterson commented 2 years ago

I see what's going on. The CI builds (build-pwsh) use trgen to build and test the grammars, not maven. That means that the changes I made to rexx/pom.xml don't do anything in the CI builds., because I added a plug-in that trgen doesn't simulate (maven-antrun-plugin, to create lineend-normalized copies of the files in rexx/examples regardless of the OS). Luckily, the original, unnormalized files are in rexx/examples, and trgen processes them even though rexx/pom.xml says to test the normalized rexx/temp-examples directory.

So the test cases are being run. Here is an example, from the build-pwsh (windows-latest, Java) checks for PR #2815:

Test
Run if ("pull_request" -eq "pull_request") { ... }
Grammars to be tested with Java target:
... .\rexx ...
---------- Testing grammar .\rexx ----------
Building
trgen --antlr-tool-path D:\a\grammars-v4\grammars-v4\antlr4-4.11.1-complete.jar -t Java --template-sources-directory D:\a\grammars-v4\grammars-v4\_scripts\templates\
D:/a/grammars-v4/grammars-v4/rexx/
Examples directory doesn't exist temp-examples/
Prefix to remove D:/a/grammars-v4/grammars-v4/_scripts/templates/
Rendering template file from D:/a/grammars-v4/grammars-v4/_scripts/templates/Java/tester.psm1 to Generated/tester.psm1
Rendering template file from D:/a/grammars-v4/grammars-v4/_scripts/templates/Java/test.sh to Generated/test.sh
Rendering template file from D:/a/grammars-v4/grammars-v4/_scripts/templates/Java/Program.java to Generated/Program.java
Rendering template file from D:/a/grammars-v4/grammars-v4/_scripts/templates/Java/makefile to Generated/makefile
Rendering template file from D:/a/grammars-v4/grammars-v4/_scripts/templates/Java/ErrorListener.java to Generated/ErrorListener.java
Rendering template file from D:/a/grammars-v4/grammars-v4/_scripts/templates/Java/CaseChangingCharStream.java to Generated/CaseChangingCharStream.java
Copying source file from D:/a/grammars-v4/grammars-v4/rexx/RexxParser.g4 to Generated/RexxParser.g4
Copying source file from D:/a/grammars-v4/grammars-v4/rexx/RexxLexer.g4 to Generated/RexxLexer.g4
Copying source file from D:/a/grammars-v4/grammars-v4/rexx/README.md to Generated/README.md
Copying source file from D:/a/grammars-v4/grammars-v4/rexx/pom.xml to Generated/pom.xml
Copying source file from D:/a/grammars-v4/grammars-v4/rexx/LICENSE to Generated/LICENSE
Copying source file from D:/a/grammars-v4/grammars-v4/rexx/Java/RexxParserBase.java to Generated/RexxParserBase.java
javac -cp D:\a\grammars-v4\grammars-v4\antlr4-4.11.1-complete.jar\;. RexxLexer.java RexxParser.java  Program.java ErrorListener.java
Build completed, time: 00:00:06.5113483
Testing
Test cases here: D:\a\grammars-v4\grammars-v4\rexx\examples
Test case: D:\a\grammars-v4\grammars-v4\rexx\examples\example1.txt
--- Testing file D:\a\grammars-v4\grammars-v4\rexx\examples\example1.txt ---
Time: 0.14
Parse succeeded.
Test case: D:\a\grammars-v4\grammars-v4\rexx\examples\test_bitstring.rex
--- Testing file D:\a\grammars-v4\grammars-v4\rexx\examples\test_bitstring.rex ---
Time: 0.281
Parse succeeded.
Test case: D:\a\grammars-v4\grammars-v4\rexx\examples\test_bitstring.rex.tree
Test case: D:\a\grammars-v4\grammars-v4\rexx\examples\test_concatenation.rex
--- Testing file D:\a\grammars-v4\grammars-v4\rexx\examples\test_concatenation.rex ---
Time: 0.14
Parse succeeded.
Test case: D:\a\grammars-v4\grammars-v4\rexx\examples\test_concatenation.rex.tree
Test completed, time: 00:00:02.2504520

The reason I created a Maven step to normalize the linends in the test files is that some test cases have accompanying .tree files, which are references to compare the parse tree against. The antlr4test-maven-plugin will do that automatically, based on the filenames, but it turns out that trgen does not. So the problem I needed to solve - the plugin generating platform-native linends and comparing to whatever git checkout produces - doesn't exist for trgen.

kaby76 commented 2 years ago

If the .tree files contain EOL tokens because the rexx grammar places them on the default channel, why don't we just set git attributes for these files on check out? I.e.,

diff --git a/.gitattributes b/.gitattributes
index 14858d11..f0bd59f6 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -91,6 +91,9 @@ access_log text
 unicode/graphemes/examples/*.txt eol=lf
 # T-SQL tests with trees can fail with multiline strings
 sql/tsql/examples/*.sql eol=lf
+# Rexx places EOL on default channel, so the .tree's contain EOL.
+# Make sure EOL's are consistent regardless of OS.
+rexx/examples/*.rex

 # Declare files that will always have CRLF line endings on checkout.
 # Currently the /vb6/**/*.cls and *.frm file need this, otherwise the grammar fails.

The line for calling the parser doesn't do anything with doing tree diffs. It should be fixed.

I always thought that there should be a "per-grammar" trgen template to allow one to do grammar-specific, target-specific steps, like preprocessing, for testing. Ironically, trgen itself duplicates functionality of Maven, whereas it was invented so information in the pom.xml wouldn't have to duplicated, e.g., start symbol, grammar name, test files, top-level .g4's, case folding, etc. At some point, I was planning to extend trgen so that it extracts all this information from the grammars and target source code. It does a bit of that already, but still let's the pom's have "last word".

RossPatterson commented 2 years ago

"If the .tree files contain EOL tokens because the rexx grammar places them on the default channel, why don't we just set git attributes for these files on check out?"

I wish I'd thought of that. Honestly, git is a maze of twisty little passages, and I get lost in it, a lot. I've only been using it for maybe 15 years :-)

I'll work up a PR for this.

RossPatterson commented 2 years ago

"Ironically, trgen itself duplicates functionality of Maven, whereas it was invented so information in the pom.xml wouldn't have to duplicated ..."

I have a draft of a change to trgen that looks for "unexpected" Maven plugins, and refuses to process those grammars, on the grounds that you don't know what actions such a plugin might be doing. There are more of those than I expected to find:

Would you be interested in a PR?

teverett commented 2 years ago

@RossPatterson @kaby76 is the target state here that we no longer need the maven tests or the maven test plugin?

kaby76 commented 2 years ago

@teverett We're definitely still using the Maven Test Plugin that you're maintaining! It tests stuff that trgen generated tests don't do. The problem is that trgen doesn't do everything that Maven does. And I don't think we want to do that. Trgen should just be a thin layer generated that uses the native build for the target.

teverett commented 2 years ago

@kaby76 oh ok. If the end game is retirement of the maven test plugin, that would be fine. it's had a good run :). trgen is adding huge value by performing testing that the maven plugin doesn't. Is there a potential for trgen to replace the maven plugin in the future?

kaby76 commented 2 years ago

Trgen could, but it needs a rewrite, and placed into an antlr repo. We'll work on this later maybe talk about that on Sep 25.

RossPatterson commented 2 years ago

I don't have a target state for the testing aspects of the project. My goal is to get the Rexx parser to accurately represent both the ANSI standard and IBM TSO/E dialects of the Rexx language (at least) in Java and Python (at least), with tests covering any new work on it, that are runnable on Windows and Linux, in CI builds and on desktops. I don't have an opinion on whether antlr4test-maven-plugin or trgen are the better approach, I just want grammars-v4/rexx to follow whatever approaches are the project's preferences, and to do it well.

I've inadvertantly learned quite a bit about how testing works in the project, and as a retired professional programmer of 42 years, I've got some experience I can offer to incrementally improve the already-nice situation. I'm experimenting now with making trgen verify both the .errors and .tree reference files, as antlr4test-maven-plugin does in AssertErrorsErrorListener and ScenarioExecutor. I want the .tree capability because Rexx has a couple of tests that really need it, and I'm interested in the .errors capability because it seems like it should be fully supported.

kaby76 commented 2 years ago

@RossPatterson Sounds good. We can definitely use the help.