gchallen opened 1 year ago
I have a few questions, and some observations.
I removed testing of the grammar in the desc.xml for kotlin/kotlin, so officially it is unsupported. But I just tested it, and it does work, just very slowly. For kotlin/kotlin-formal, while the desc.xml says it works for most targets, the performance is even worse than kotlin/kotlin.
07/03-06:00:50 ~/issues/g4-3539/kotlin/kotlin/Generated-CSharp
$ ./bin/Debug/net7.0/Test.exe examples/Test.kt
CSharp 0 examples/Test.kt success 1.9808688
Total Time: 2.0724118
07/03-06:00:58 ~/issues/g4-3539/kotlin/kotlin/Generated-CSharp
$ pushd
~/issues/g4-3539/kotlin/kotlin-formal/Generated-CSharp ~/issues/g4-3539/kotlin/kotlin/Generated-CSharp
07/03-06:01:01 ~/issues/g4-3539/kotlin/kotlin-formal/Generated-CSharp
$ ./bin/Debug/net7.0/Test.exe examples/Test.kt
CSharp 0 examples/Test.kt success 5.6004534
Total Time: 5.6918796
07/03-06:01:09 ~/issues/g4-3539/kotlin/kotlin-formal/Generated-CSharp
$
The kotlin/kotlin grammar has a lot of ambiguities. This is where one should first focus in order to speed up the parser.
Decision Rule Invocations Time Total-k Max-k Fallback Ambiguities Errors Transitions
0 kotlinFile 4 0.22318 23 5 3 3 0 13
1 kotlinFile 0 0 0 0 0 0 0 0
2 kotlinFile 336 0.699777 523 1 187 187 0 200
3 kotlinFile 0 0 0 0 0 0 0 0
4 kotlinFile 0 0 0 0 0 0 0 0
5 kotlinFile 0 0 0 0 0 0 0 0
6 script 0 0 0 0 0 0 0 0
7 script 0 0 0 0 0 0 0 0
8 script 0 0 0 0 0 0 0 0
...
According to the generated .dot file, there's a loop containing anysemi for one transition and epsilon for all others for decision 3 of the kotlinFile production (314 -> 315 -> 317 -> 320 -> 318 -> 316 -> 309 -> 308 -> 310 -> 311 -> 312 -> 314). Additionally, there's a loop involving decision 2 with anysemi (311 -> 309 -> 308 -> 310 -> 311). This is terrible because there is a combinatorial explosion of choices between decisions 2 and 3! (NB to myself: the Trash toolkit should try to find SCCs that involve the same symbol and epsilon transitions. When two cycles share a common path through the graph, it should be flagged as disastrous.) The rule is written poorly.
The rule is:
kotlinFile
: NL* preamble anysemi* (topLevelObject (anysemi+ topLevelObject?)*)? EOF
;
Here we see that we are taking the closure of a closure: (anysemi+ topLevelObject?)* => (anysemi+)*.
Changing the rule to
kotlinFile
: NL* preamble anysemi* (topLevelObject (anysemi+ topLevelObject)*)? EOF
;
makes the parser only very modestly faster.
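The closure of a closure is not just untidy; it is exponentially ambiguous. As a standalone illustration (not code from the issue): when topLevelObject? matches nothing, a run of k anysemi tokens can be carved into non-empty anysemi+ groups in 2^(k-1) different ways, one per composition of k, and adaptive prediction has to distinguish among all of them. Counting the groupings directly:

```java
// Counts how many ways k consecutive anysemi tokens can be split into
// non-empty anysemi+ groups under a nested closure like
// (anysemi+ topLevelObject?)* with topLevelObject absent.
// Each grouping is a composition of k; there are 2^(k-1) of them.
public class ClosureAmbiguity {
    public static long compositions(int k) {
        if (k == 0) return 1;                   // nothing left: one (empty) way
        long total = 0;
        for (int first = 1; first <= k; first++) {
            // 'first' tokens form the current anysemi+ group; recurse on the rest
            total += compositions(k - first);
        }
        return total;
    }

    public static void main(String[] args) {
        // The count doubles with every extra anysemi token.
        for (int k = 1; k <= 10; k++) {
            System.out.println(k + " tokens -> " + compositions(k) + " groupings");
        }
    }
}
```

With ten stray semicolons/newlines between top-level objects, that is already 512 viable derivations for the predictor to consider.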
The real problem involves type:
07/03-09:12:35 ~/issues/g4-3539/kotlin/kotlin/Generated-CSharp
$ trperf examples/Test.kt | tail -n +2 | sort -k4 -n -r | column -t
Time to parse: 00:00:02.0384441
193 type 370 4.643815 3696 2434 8 0 0 1163
52 delegationSpecifier 30 1.569024 721 488 21 8 0 639
268 postfixUnaryExpression 525 1.226679 1347 79 107 22 0 500
211 simpleUserType 390 0.759468 635 25 12 0 0 118
192 type 370 0.728245 507 32 0 0 0 122
233 statement 220 0.69018 546 17 37 37 0 260
...
Here, the parser cannot decide which of the 4 alternatives to choose at decision 193. And here we see why this is taking so long: the max-k is 2434. In other words, the parser needs to read 2434 tokens of lookahead in order to make the decision. This is terrible.
Looking over the rules for functionType, parenthesizedType, nullableType, and typeReference, we see why. There are two conflicting derivations for type: type => nullableType => typeReference and type => typeReference. This comes directly from the official grammar for Kotlin.
It has always been my assertion that the reason people claim hand-written parsers are faster than generated parsers is that grammars are scraped from parser code without the critical semantic predicates, or that genuine errors creep in while reading the grammar back out of the parser. I cannot say which is the case here.
There is an official grammar in the kotlin-spec repo: https://github.com/Kotlin/kotlin-spec/tree/release/grammar/src/main/antlr. Unfortunately, it doesn't even parse Test.kt. But it, too, contains the type production problem.
It looks like the grammar doesn't even correspond to the parser code. The comments there describe a type rule completely different from both the official grammar and the parser code. You just cannot trust people to faithfully reverse engineer a parser back into a grammar.
Thanks for the reply.
FWIW, I decided to do per-thread caching of the ANTLR shared caches. That's working nicely and balancing both the speedup obtained by warming up the parser with removing the sharing bottlenecks that we were experiencing previously.
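In case it's useful to others, the per-thread arrangement can be sketched with just the standard library. DfaCache and ContextCache below are placeholders standing in for ANTLR's DFA[] decisionToDFA array and PredictionContextCache; they are not ANTLR types. Each worker thread lazily builds its own cache pair on first use and reuses it across every parse that thread performs, so warm-up is retained with no cross-thread locking:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of per-thread caching: one cache pair per thread, created lazily.
public class PerThreadCaches {
    static class DfaCache {}        // placeholder for DFA[] decisionToDFA
    static class ContextCache {}    // placeholder for PredictionContextCache

    record CachePair(DfaCache dfa, ContextCache ctx) {}

    // Each thread gets its own pair the first time it calls CACHES.get().
    static final ThreadLocal<CachePair> CACHES = ThreadLocal.withInitial(
            () -> new CachePair(new DfaCache(), new ContextCache()));

    // Runs nThreads workers, each "parsing" three times; returns how many
    // distinct cache pairs were observed (one per thread, reused per parse).
    public static int distinctPairsAcross(int nThreads) throws InterruptedException {
        Set<CachePair> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(() -> {
                for (int parse = 0; parse < 3; parse++) {
                    seen.add(CACHES.get());  // same pair every time on this thread
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return seen.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("distinct cache pairs for 4 threads: "
                + distinctPairsAcross(4));  // prints 4
    }
}
```

In the real setup, the per-thread pair would be handed to each new parser's ATN simulator instead of the shared singletons.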
Responding to your questions.
Overall I agree that the Kotlin grammar performance is both poor and a contributing cause of the scaling problems we're experiencing. I'd be happy to work on improvements, but in the short term I'm happy to just get our tool that uses the slow but correct-ish parser to run a bit faster.
I'm running into a similar slowdown to the one reported a few years ago here: https://github.com/antlr/antlr4/issues/2454.
Specifically, I'm using ANTLR4 for a variety of syntax analysis tasks, in a batch-processing tool that is trying to leverage a high core-count machine to improve performance. Unfortunately, when I increase the number of cores, I see that both the shared DFA cache and the prediction context cache can wind up as bottlenecks, and frequently idle N - 1 available cores, which a stack dump shows are all stuck waiting for access to shared maps.
addDFAState is a particularly problematic spot. So at present I'm essentially passing individual copies of each of these caches to each newly-created parser, as so:
This is particularly problematic when using grammars that have poor performance, such as the official Kotlin G4 grammar, which can initially take multiple seconds to process even small inputs. Normally, utilizing the caches mentioned above results in an order-of-magnitude speedup once the grammar gets warm. However, keeping all cores active requires disabling those caches, which currently results in a roughly 3x slowdown, at least on some inputs, according to my testing. This is on a data processing pipeline that runs for days, so that 3x does end up hurting a bit.
So I'm in a pickle. If I were optimizing primarily for single-core performance, I'd use the caches and live with the (infrequent) bottlenecks. But for a concurrent workload I'm stuck with either 1 fast(er) parser or N slow(er) parsers.
I took a look at the optimized ANTLR4 fork, which apparently has lock-free maintenance of these shared structures. However, the official optimized fork is still on v4.9, the most recently-maintained fork of the fork is only up to v4.11, and both versions cause compatibility problems with other tools that are part of our library, specifically checkstyle, which expects a different ATN serialization version.
I've had a few other ideas about how to address this, but I'm not sure whether they'd work properly. One would be to maintain N copies of the caches and let each thread check out one that's not in use. This would cause them to warm up more slowly, but that might be an OK tradeoff, since at least for the Kotlin grammar the performance improves quite quickly. Unfortunately this is slightly complicated by the fact that we use multiple different parsers in the project, and I assume that the decisionToDFA array is parser-specific. It might work, but I'd be happier if someone has tried something like this and can report that it can in fact be done.
Another option would be to allow each parser to grab its own copy of the caches on startup and then update them atomically once it's done. This would also cause the caches to warm more slowly, but at least each thread would get the benefits of some of the prior parsing passes. The problem here is that I'm unsure how to copy either shared structure, and they don't seem to be set up to do this even in memory. (I'm not worried about saving state across restarts.)
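For what it's worth, the check-out idea (N warm cache sets, each used by at most one thread at a time) can be sketched with a blocking queue. CacheSet here is a placeholder for a (decisionToDFA, prediction context cache) pair, not an ANTLR type:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Pool of N pre-built cache sets; threads check one out per parse, so no
// two threads ever mutate the same set concurrently, and each set stays
// warm across the parses that happen to use it.
public class CachePool {
    static class CacheSet {}  // placeholder, not an ANTLR type

    private final BlockingQueue<CacheSet> pool;

    CachePool(int n) {
        pool = new ArrayBlockingQueue<>(n);
        for (int i = 0; i < n; i++) pool.add(new CacheSet());
    }

    // Blocks until a set is free.
    CacheSet checkOut() throws InterruptedException { return pool.take(); }

    void checkIn(CacheSet set) { pool.add(set); }

    int available() { return pool.size(); }
}
```

A parse would checkOut() a set, wire it into the parser's ATN simulator, and checkIn() it in a finally block; the parser-specific decisionToDFA concern would mean one pool per grammar.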
You could also run a few test inputs during warmup to populate the prediction caches and then at some point just start using copies and discarding future updates. But this would also require being able to copy these structures.
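In outline, that warm-then-freeze idea might look like the following; the map stands in for whatever the real cache contents are (the keys and values here are invented placeholders), since the actual structures would need to support copying:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of "warm up, then freeze": run representative inputs against one
// mutable cache, snapshot it into an immutable map, then let every worker
// read the snapshot lock-free, discarding any further updates.
public class FrozenCache {
    public static Map<String, Integer> warmAndFreeze() {
        Map<String, Integer> warm = new ConcurrentHashMap<>();
        // ...warm-up parses would populate this; simulated with dummy entries:
        warm.put("decision:0", 13);
        warm.put("decision:2", 200);
        return Map.copyOf(warm);  // immutable snapshot, safe to share freely
    }
}
```

Readers of the frozen snapshot never contend, at the cost of forgoing whatever the workload would have learned after the warm-up window.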
Any help or guidance would be appreciated! Thanks in advance. ANTLR is a great project, and if anything these bottlenecks are a sign of how often we're using it throughout our codebase...