Closed kaby76 closed 3 days ago
The python/python3 grammar has a JavaScript port, but it is extremely slow in comparison with the CSharp and Java ports.
06/17-06:42:17 ~/issues/g4-3539/python/python3/Generated-CSharp
$ bash run.sh ../examples/base_events.py
CSharp 0 ../examples/base_events.py success 1.0922569
Total Time: 1.1610637
06/17-06:44:41 ~/issues/g4-3539/python/python3/Generated-CSharp
$ pushd
~/issues/g4-3539/python/python3/Generated-JavaScript ~/issues/g4-3539/python/python3/Generated-CSharp
06/17-06:44:44 ~/issues/g4-3539/python/python3/Generated-JavaScript
$ bash run.sh ../examples/base_events.py
JavaScript 0 ../examples/base_events.py success 101.734
Total Time: 101.746
06/17-06:46:30 ~/issues/g4-3539/python/python3/Generated-JavaScript
The JavaScript port is typically slower than CSharp by a factor of 5, not by a factor of 100.
For example, with the abb grammar:
$ bash run.sh ../examples/robdata.sys
CSharp 0 ../examples/robdata.sys success 0.0331986
Total Time: 0.0832029
06/17-07:06:31 ~/issues/g4-3539/abb/Generated-CSharp
$ pushd
~/issues/g4-3539/abb/Generated-JavaScript ~/issues/g4-3539/abb/Generated-CSharp ~/issues/g4-3539/python/python3/Generated-CSharp
06/17-07:06:34 ~/issues/g4-3539/abb/Generated-JavaScript
$ bash run.sh ../examples/robdata.sys
JavaScript 0 ../examples/robdata.sys success 0.098
Total Time: 0.105
06/17-07:06:36 ~/issues/g4-3539/abb/Generated-JavaScript
$
The JavaScript port should not be this slow in comparison to the CSharp port. The parser traces of the closure computations are identical. I do notice several issues:
1) This grammar does not use the optimized Antlr4 syntax for expressions, which can cause the parse trees to contain long single-child chains. 2) The max-k for atom_expr is huge, and there is a large error count associated with the "trailers*" reference in the rule. I wonder whether a semantic predicate is missing.
Decision  Rule       Invocations  Time      Total-k  Max-k  Fallback  Ambiguities  Errors
...
155       atom_expr  2796         2.312377  4922     28     6         0            324
...
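Points 1 and 2 can be illustrated with a schematic sketch. These rules are invented for illustration, not taken from the actual python/python3 grammar:

```antlr
grammar ExprSketch; // hypothetical, for illustration only

// The layered style builds one nested subtree per precedence level,
// so even a bare atom drags a single-child chain through every layer:
//   expr   : term (('+'|'-') term)* ;
//   term   : factor (('*'|'/') factor)* ;
//   factor : atom trailer* ;

// Antlr4's optimized left-recursive style collapses those chains;
// precedence is encoded by alternative order, not by rule layering:
expr
    : expr ('*'|'/') expr
    | expr ('+'|'-') expr
    | atom trailer*
    ;
```

With the layered style, a file full of simple atoms pays the chain cost on every expression, which compounds in a slow runtime.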
3) I wonder whether the JavaScript runtime handles "errors" efficiently.
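One way to get a feel for the cost of throw-based error handling in Node is a micro-benchmark like the following. This is a hypothetical sketch, not code from the Antlr JavaScript runtime; it only contrasts exceptions with plain return codes as error signals:

```javascript
// Hypothetical micro-benchmark: exception-based vs return-code-based
// error signalling on an error-heavy input.

function parseWithThrow(tok) {
  if (tok < 0) throw new Error("no viable alternative"); // error path throws
  return tok;
}

function parseWithCode(tok) {
  return tok < 0 ? null : tok; // error signalled by return value
}

function bench(fn) {
  const start = process.hrtime.bigint();
  let errors = 0;
  for (let i = 0; i < 100000; i++) {
    try {
      if (fn(-1) === null) errors++; // every call takes the error path
    } catch (e) {
      errors++;
    }
  }
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { errors, ms };
}

const slow = bench(parseWithThrow);
const fast = bench(parseWithCode);
console.log(`throw: ${slow.ms.toFixed(1)} ms, return code: ${fast.ms.toFixed(1)} ms`);
```

Constructing an Error captures a stack trace on most engines, so the thrown variant is usually much slower per error; whether that matters here depends on how many real errors the parse takes (the corrected table below suggests the 324 were transitions, not errors).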
The perf data was mislabeled. Here's what it should be. (trperf needs to be fixed.)
Decision  Rule       Invocations  Time      Total-k  Max-k  Fallback  Ambiguities  Errors  Transitions
...
155       atom_expr  2796         6.195964  4922     28     6         0            0       324
What I'm thinking is that the parser reads too far ahead per trailer in the atom_expr rule. It also requires full context in the parse 6 times, but I don't think that is the problem--most are SLL transitions, not LL. It feels like there is an anti-pattern here.
I have two suggestions:
I'm going to close the issue I opened because the python grammars are actually in an "okay state", and they are getting better with @RobEin on top of it.
Create the final output in a single grammar file rather than two files.
This cannot work given the constraints.
The grammar has to be split because the lexer has modes: Antlr does not accept a combined grammar with lexer modes. Further, the lexer needs to be in "target agnostic" format, because otherwise the .g4s would be forked for every target type. That is exactly how we got into a bad state to start with: people would modify one target and not maintain the others. With a target-agnostic lexer, there is one grammar for all targets.
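As a minimal sketch of the constraint (grammar and rule names invented here, not the repo's actual PythonLexer/PythonParser): `mode` sections are only legal in a standalone lexer grammar, which forces the two-file split, with the parser grammar referencing the lexer's tokens via `tokenVocab`:

```antlr
// Hypothetical example -- not the repo's actual Python grammars.
// Putting a `mode` section in a combined grammar is a compile error,
// so the lexer must live in its own file.
lexer grammar MyLexer;

DQUOTE : '"' -> pushMode(IN_STRING) ;  // enter string mode
ID     : [a-zA-Z_]+ ;
WS     : [ \t\r\n]+ -> skip ;

mode IN_STRING;
STR_END  : '"' -> popMode ;            // back to the default mode
STR_TEXT : ~["]+ ;

// The companion parser grammar then goes in a second file:
//   parser grammar MyParser;
//   options { tokenVocab = MyLexer; }
//   unit : (ID | DQUOTE STR_TEXT* STR_END)* EOF ;
```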
Fetch from the actual Python repo to import the gram files and convert them to an antlr4 .g4 grammar file.
That is the plan.
There are 10 python grammars in this repo. I don't even know where to begin on which to pick and maintain, because they are all terrible.
The plan to reorganize