Terrible Golang performance

movelazar commented 2 years ago

Stackoverflow: https://stackoverflow.com/questions/72266899/golang-performance-issues

Google group: https://groups.google.com/g/antlr-discussion/c/OdhAIsy2GfI

Example code: https://github.com/movelazar/perf-repro

A simple rule such as:

1 EQ 2 OR
1 EQ 2 OR
1 EQ 2 OR
1 EQ 2 OR
1 EQ 2

takes exponentially longer to parse the more 1 EQ 2 OR clauses there are. This does not happen in python (by my testing) or CSharp, Dart, Java (by stackoverflow comment).

On my machine, # of lines vs parse time:

11: 0.5s
12: 1.2s
13: 3.2s
14: 8.1s
15: 21.9s
16: 57.5s

Given that Python doesn't face this issue I can't imagine I'm doing something terrible in my grammar.

Issue goes away if I put parens on things but that's not a real solution.

On 4.10.1, first noticed with 4.9.1.

Any help is greatly appreciated. Surprised I can't find others with this issue.

movelazar commented 2 years ago

Copying an insightful comment from Stackoverflow:

profiling your bench it appears that this package spends a great amount of time within some internals github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork:1091. Doing that, it allocates a lot, then it does it again. Somtimes it adds a node to its tree. You can easily observe that by yourself running the trace utility pkg.go.dev/runtime/trace

kaby76 commented 2 years ago

Copying an insightful comment from Stackoverflow:

profiling your bench it appears that this package spends a great amount of time within some internals github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork:1091. Doing that, it allocates a lot, then it does it again. Somtimes it adds a node to its tree. You can easily observe that by yourself running the trace utility pkg.go.dev/runtime/trace

Does anyone know how to use the ggprof utility? The instructions referenced are some of the worse I've ever seen in 40 years. It took me hours to read past the incomplete instructions, syntactically invalid code given, and improper use of standard software engineering terminology, to construct a working parser driver program that sets up a server and turns on "profiling". And even then, "top10" shows diddly squat:

(pprof) top10
Showing nodes accounting for 10ms, 100% of 10ms total
      flat  flat%   sum%        cum   cum%
      10ms   100%   100%       10ms   100%  runtime.stdcall6
         0     0%   100%       10ms   100%  runtime.findrunnable
         0     0%   100%       10ms   100%  runtime.mcall
         0     0%   100%       10ms   100%  runtime.netpoll
         0     0%   100%       10ms   100%  runtime.park_m
         0     0%   100%       10ms   100%  runtime.schedule
(pprof)

I still cannot determine if ggprof is a profiling tool or a tracing tool or both. I still cannot understand why the program actually needs to be modified at all in order to perform profiling aka "sampling" at fixed intervals and recording a stack trace.

kaby76 commented 2 years ago

I changed test to use a larger example boolean expression, and redid the run--which gives wildly different results every time--and only this one run outputted more useful results:

(pprof) top111
Showing nodes accounting for 9.50s, 87.16% of 10.90s total
Dropped 134 nodes (cum <= 0.05s)
Showing top 111 nodes out of 115
      flat  flat%   sum%        cum   cum%
     2.06s 18.90% 18.90%      2.06s 18.90%  runtime.stdcall2
     1.27s 11.65% 30.55%      6.20s 56.88%  github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork
     0.69s  6.33% 36.88%      0.69s  6.33%  runtime.stdcall1
     0.56s  5.14% 42.02%      2.03s 18.62%  runtime.mallocgc
     0.51s  4.68% 46.70%      0.71s  6.51%  runtime.heapBitsSetType
     0.36s  3.30% 50.00%      2.52s 23.12%  github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).getEpsilonTarget
     0.27s  2.48% 52.48%      0.27s  2.48%  runtime.nextFreeFast (inline)
     0.22s  2.02% 54.50%      0.27s  2.48%  runtime.mapaccess1_fast64
     0.20s  1.83% 56.33%      1.67s 15.32%  github.com/antlr/antlr4/runtime/Go/antlr.NewBaseATNConfig
     0.17s  1.56% 57.89%      6.20s 56.88%  github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState
     0.16s  1.47% 59.36%      0.57s  5.23%  runtime.scanobject
     0.15s  1.38% 60.73%      0.15s  1.38%  runtime.memclrNoHeapPointers
     0.15s  1.38% 62.11%      0.15s  1.38%  runtime.stdcall3
     0.14s  1.28% 63.39%      0.21s  1.93%  runtime.heapBitsForAddr (inline)
     0.14s  1.28% 64.68%      0.14s  1.28%  runtime.markBits.isMarked (inline)
     0.13s  1.19% 65.87%      0.13s  1.19%  runtime.(*itabTableType).find
     0.13s  1.19% 67.06%      0.13s  1.19%  runtime.stdcall6
     0.12s  1.10% 68.17%      1.74s 15.96%  github.com/antlr/antlr4/runtime/Go/antlr.(*BaseATNConfigSet).Add
     0.11s  1.01% 69.17%      0.13s  1.19%  github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).canDropLoopEntryEdgeInLeftRecursiveRule
     0.10s  0.92% 70.09%      0.12s  1.10%  runtime.findObject
     0.10s  0.92% 71.01%      0.26s  2.39%  runtime.greyobject
     0.09s  0.83% 71.83%      0.09s  0.83%  github.com/antlr/antlr4/runtime/Go/antlr.(*BaseATNState).GetTransitions
     0.08s  0.73% 72.57%      0.21s  1.93%  runtime.getitab
     0.07s  0.64% 73.21%      0.08s  0.73%  github.com/antlr/antlr4/runtime/Go/antlr.(*BaseATNConfig).GetState
     0.07s  0.64% 73.85%      0.07s  0.64%  github.com/antlr/antlr4/runtime/Go/antlr.(*BaseTransition).getSerializationType
     0.07s  0.64% 74.50%      0.52s  4.77%  github.com/antlr/antlr4/runtime/Go/antlr.(*array2DHashSet).innerAdd
     0.07s  0.64% 75.14%      0.20s  1.83%  github.com/antlr/antlr4/runtime/Go/antlr.equalATNConfigs
     0.07s  0.64% 75.78%      0.07s  0.64%  runtime._ExternalCode
     0.07s  0.64% 76.42%      2.56s 23.49%  runtime.gcDrain
     0.07s  0.64% 77.06%      0.08s  0.73%  runtime.ifaceeq
     0.07s  0.64% 77.71%      2.09s 19.17%  runtime.newobject
     0.06s  0.55% 78.26%      0.06s  0.55%  github.com/antlr/antlr4/runtime/Go/antlr.murmurFinish (inline)
     0.06s  0.55% 78.81%      0.06s  0.55%  runtime.(*lfstack).pop (inline)
     0.06s  0.55% 79.36%      0.06s  0.55%  runtime.(*sweepLocker).tryAcquire (inline)
     0.06s  0.55% 79.91%      0.21s  1.93%  runtime.assertE2I
     0.06s  0.55% 80.46%      0.96s  8.81%  runtime.findrunnable
     0.05s  0.46% 80.92%      1.62s 14.86%  github.com/antlr/antlr4/runtime/Go/antlr.NewBaseATNConfig4
     0.05s  0.46% 81.38%      0.22s  2.02%  github.com/antlr/antlr4/runtime/Go/antlr.hashATNConfig
     0.05s  0.46% 81.83%      0.08s  0.73%  runtime.gcWriteBarrier
     0.04s  0.37% 82.20%      1.80s 16.51%  runtime.lock2
     0.04s  0.37% 82.57%      0.48s  4.40%  runtime.stealWork
     0.04s  0.37% 82.94%      0.09s  0.83%  runtime.unlock2
     0.03s  0.28% 83.21%      0.25s  2.29%  github.com/antlr/antlr4/runtime/Go/antlr.(*array2DHashSet).getBuckets (inline)
     0.03s  0.28% 83.49%      0.25s  2.29%  github.com/antlr/antlr4/runtime/Go/antlr.NewArrayPredictionContext
     0.03s  0.28% 83.76%      0.91s  8.35%  github.com/antlr/antlr4/runtime/Go/antlr.merge
     0.03s  0.28% 84.04%      0.11s  1.01%  runtime.(*mheap).allocSpan
     0.03s  0.28% 84.31%      0.60s  5.50%  runtime.notewakeup
     0.03s  0.28% 84.59%      1.27s 11.65%  runtime.schedule
     0.02s  0.18% 84.77%      0.64s  5.87%  github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).precedenceTransition
     0.02s  0.18% 84.95%      0.27s  2.48%  github.com/antlr/antlr4/runtime/Go/antlr.SingletonBasePredictionContextCreate (inline)
     0.02s  0.18% 85.14%      0.50s  4.59%  runtime.checkTimers
     0.02s  0.18% 85.32%      0.15s  1.38%  runtime.netpoll
     0.01s 0.092% 85.41%      0.27s  2.48%  github.com/antlr/antlr4/runtime/Go/antlr.(*DoubleDict).Get (inline)
     0.01s 0.092% 85.50%      0.42s  3.85%  github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).ruleTransition
     0.01s 0.092% 85.60%      0.07s  0.64%  github.com/antlr/antlr4/runtime/Go/antlr.(*Predicate).hash
     0.01s 0.092% 85.69%      0.53s  4.86%  github.com/antlr/antlr4/runtime/Go/antlr.(*array2DHashSet).Add
     0.01s 0.092% 85.78%      0.12s  1.10%  github.com/antlr/antlr4/runtime/Go/antlr.NewBaseATNConfig5 (inline)
     0.01s 0.092% 85.87%      0.25s  2.29%  github.com/antlr/antlr4/runtime/Go/antlr.NewBaseSingletonPredictionContext
     0.01s 0.092% 85.96%      0.21s  1.93%  github.com/antlr/antlr4/runtime/Go/antlr.mergeArrays
     0.01s 0.092% 86.06%      1.91s 17.52%  runtime.(*gcWork).balance
     0.01s 0.092% 86.15%      0.26s  2.39%  runtime.(*mcache).nextFree
     0.01s 0.092% 86.24%      0.07s  0.64%  runtime.(*sweepLocked).sweep
     0.01s 0.092% 86.33%      0.07s  0.64%  runtime.gcWriteBarrierCX
     0.01s 0.092% 86.42%      0.37s  3.39%  runtime.injectglist
     0.01s 0.092% 86.51%      1.81s 16.61%  runtime.lock (inline)
     0.01s 0.092% 86.61%      0.17s  1.56%  runtime.newstack
     0.01s 0.092% 86.70%      1.93s 17.71%  runtime.preemptM
     0.01s 0.092% 86.79%      0.06s  0.55%  runtime.resettimer (inline)
     0.01s 0.092% 86.88%      0.44s  4.04%  runtime.runtimer
     0.01s 0.092% 86.97%      0.17s  1.56%  runtime.sweepone
     0.01s 0.092% 87.06%      0.16s  1.47%  runtime.sysUnused
     0.01s 0.092% 87.16%      3.12s 28.62%  runtime.systemstack
         0     0% 87.16%      0.13s  1.19%  github.com/antlr/antlr4/runtime/Go/antlr.NewBaseATNConfig1
         0     0% 87.16%      0.08s  0.73%  github.com/antlr/antlr4/runtime/Go/antlr.NewBasePredictionContext (inline)
         0     0% 87.16%      0.18s  1.65%  github.com/antlr/antlr4/runtime/Go/antlr.mergeSingletons
         0     0% 87.16%      1.85s 16.97%  runtime.(*gcControllerState).enlistWorker
         0     0% 87.16%      0.25s  2.29%  runtime.(*mcache).refill
         0     0% 87.16%      0.21s  1.93%  runtime.(*mcentral).cacheSpan
         0     0% 87.16%      0.15s  1.38%  runtime.(*mcentral).grow
         0     0% 87.16%      0.15s  1.38%  runtime.(*mheap).alloc
         0     0% 87.16%      0.11s  1.01%  runtime.(*mheap).alloc.func1
         0     0% 87.16%      0.18s  1.65%  runtime.(*pageAlloc).scavenge
         0     0% 87.16%      0.17s  1.56%  runtime.(*pageAlloc).scavengeOne
         0     0% 87.16%      0.16s  1.47%  runtime.(*pageAlloc).scavengeRangeLocked
         0     0% 87.16%      0.07s  0.64%  runtime._System
         0     0% 87.16%      0.06s  0.55%  runtime.assertE2I2
         0     0% 87.16%      0.15s  1.38%  runtime.bgscavenge
         0     0% 87.16%      0.39s  3.58%  runtime.bgscavenge.func1
         0     0% 87.16%      0.19s  1.74%  runtime.bgscavenge.func2
         0     0% 87.16%      0.17s  1.56%  runtime.bgsweep
         0     0% 87.16%      0.06s  0.55%  runtime.gcAssistAlloc.func1
         0     0% 87.16%      0.06s  0.55%  runtime.gcAssistAlloc1
         0     0% 87.16%      0.81s  7.43%  runtime.gcBgMarkWorker
         0     0% 87.16%      2.57s 23.58%  runtime.gcBgMarkWorker.func2
         0     0% 87.16%      0.06s  0.55%  runtime.gcDrainN
         0     0% 87.16%      0.16s  1.47%  runtime.gopreempt_m
         0     0% 87.16%      0.21s  1.93%  runtime.goschedImpl
         0     0% 87.16%      0.36s  3.30%  runtime.injectglist.func1 (inline)
         0     0% 87.16%      1.80s 16.51%  runtime.lockWithRank (inline)
         0     0% 87.16%      1.21s 11.10%  runtime.mcall
         0     0% 87.16%      0.15s  1.38%  runtime.morestack
         0     0% 87.16%      1.16s 10.64%  runtime.park_m
         0     0% 87.16%      0.07s  0.64%  runtime.preemptall
         0     0% 87.16%      1.92s 17.61%  runtime.preemptone
         0     0% 87.16%      0.24s  2.20%  runtime.resetspinning
         0     0% 87.16%      0.40s  3.67%  runtime.runOneTimer
         0     0% 87.16%      0.11s  1.01%  runtime.scavengeSleep
         0     0% 87.16%      1.74s 15.96%  runtime.semasleep
         0     0% 87.16%      0.62s  5.69%  runtime.semawakeup
         0     0% 87.16%      0.60s  5.50%  runtime.startm
         0     0% 87.16%      0.09s  0.83%  runtime.unlock (inline)
(pprof) Time: 15.072 s

It's not clear how one changes the sampling interval.

kaby76 commented 2 years ago

I think the problem is that the there is excessive pressure on the heap and stack. This is because the definitions for many of the Antlr data structures are actually incorrect. Most fields of structs are structs themselves, whereas they should be pointers. Go is very much like C++, and one must explicitly use pointers, e.g., this is a struct, not a pointer to a struct. Contrast that with the Cpp target, where it is explicitly a pointer. The erroneous confusion of struct vs pointer to struct manifests in another problem that I mentioned here. Yes, this is just conjecture on my part. I'll try at some point to modify the code to redo the data structures to see if that fixes the performance, but it's low priority for me.

parrt commented 2 years ago

I was just going to say that wildly different times could indicate an issue with garbage collection. I also noticed that the go target who is kind of walkie when I tried it once. @jcking is taking a look at the target at the moment I think.

kaby76 commented 2 years ago

I should note, for strings of (1 EQ 2 OR)* 1 EQ 2, the runtime climbs exponentially per each "1 EQ 2 OR" sub-string added, just as what @movelazar says. But, process memory is pretty much constant! And, it's only 10 or 20 MB! You can't get more than 15 pairs of "1 EQ 2" before it starts to hang. I attached a debugger to the process on Windows, and whenever I did a break, it would be in a lock routine in the OS, likely to keep memory allocation/GC re-entrant or to single thread the runtime.

parrt commented 2 years ago

zoiks!!! @jcking here's a good used case for testing and upgraded Go target.

kylege commented 2 years ago

Same issue here.

parrt commented 2 years ago

Yeah, somebody needs to take a serious look at rebuilding the Go target. I might take a walk at it but it's lower on my priority list. sorry!

jimidle commented 2 years ago

Did anyone look in to this more? If not, then I may do this myself. Go should be about the fastest target, assuming that it has implemented the various algorithms in the runtime as well as any other target.

kaby76 commented 2 years ago

It's likely caused by a bug in the Go Antlr runtime. Turning on debug for the runtime (and cleaning out the Go runtime where we get a core dump because atn_config printing is passed a nil pointer), we see that the edges in the dfas are not correct.

See https://github.com/kaby76/issue-3718 for a complete side-by-side exact run comparison, driver code that was generated by my trgen program--handles all targets except "Swift".

CSharp here:

SLL altSubSets=[{1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}], configs=[(10,1,[52 $]), (14,1,[52 $]), (16,1,[52 $]), (56,1,[24 52 $]), (22,1,[52 $]), (10,2,[52 $],up=8), (14,2,[52 $],up=8), (16,2,[52 $],up=8), (56,2,[24 52 $],up=8), (22,2,[52 $],up=8)],dipsIntoOuterContext, predict=0, allSubsetsConflict=True, conflictingAlts={1, 2}
EDGE 0:[(26,1,[$]), (29,1,[$]), (32,1,[$]), (35,1,[$]), (38,1,[$]), (45,1,[$]), (7,2,[$],up=1), (12,2,[$],up=1), (26,2,[$],up=8), (29,2,[$],up=8), (32,2,[$],up=8), (35,2,[$],up=8), (38,2,[$],up=8), (45,2,[$],up=8), (19,2,[$],up=2), (40,2,[$],up=3), (47,2,[$],up=7)],dipsIntoOuterContext -> -1:[(10,1,[52 $]), (14,1,[52 $]), (16,1,[52 $]), (56,1,[24 52 $]), (22,1,[52 $]), (10,2,[52 $],up=8), (14,2,[52 $],up=8), (16,2,[52 $],up=8), (56,2,[24 52 $],up=8), (22,2,[52 $],up=8)],conflictingAlts={1, 2},dipsIntoOuterContext=>1 upon OR<2>
adding new DFA state: 4:[(10,1,[52 $]), (14,1,[52 $]), (16,1,[52 $]), (56,1,[24 52 $]), (22,1,[52 $]), (10,2,[52 $],up=8), (14,2,[52 $],up=8), (16,2,[52 $],up=8), (56,2,[24 52 $],up=8), (22,2,[52 $],up=8)],conflictingAlts={1, 2},dipsIntoOuterContext=>1
DFA=
s0-OR->:s4^=>1
s0-EQ->:s1^=>1
s2-OR->:s3=>2

Go here:

SLL altSubSets=[{1, 2} {1, 2} {1, 2} {1, 2} {1, 2}], previous=[(26,1,[$]), (29,1,[$]), (32,1,[$]), (35,1,[$]), (38,1,[$]), (45,1,[$]), (7,2,[$],up=1), (12,2,[$],up=1), (26,2,[$],up=8), (29,2,[$],up=8), (32,2,[$],up=8), (35,2,[$],up=8), (38,2,[$],up=8), (45,2,[$],up=8), (19,2,[$],up=2), (40,2,[$],up=3), (47,2,[$],up=7)],dipsIntoOuterContext, configs=[(10,1,[52 $]), (14,1,[52 $]), (16,1,[52 $]), (56,1,[24 52 $]), (22,1,[52 $]), (10,2,[52 $],up=8), (14,2,[52 $],up=8), (16,2,[52 $],up=8), (56,2,[24 52 $],up=8), (22,2,[52 $],up=8)],dipsIntoOuterContext, predict=0, allSubsetsConflict=true, conflictingAlts={1, 2}
EDGE 0:[(26,1,[$]), (29,1,[$]), (32,1,[$]), (35,1,[$]), (38,1,[$]), (45,1,[$]), (7,2,[$],up=1), (12,2,[$],up=1), (26,2,[$],up=8), (29,2,[$],up=8), (32,2,[$],up=8), (35,2,[$],up=8), (38,2,[$],up=8), (45,2,[$],up=8), (19,2,[$],up=2), (40,2,[$],up=3), (47,2,[$],up=7)],dipsIntoOuterContext -> -1:[(10,1,[52 $]), (14,1,[52 $]), (16,1,[52 $]), (56,1,[24 52 $]), (22,1,[52 $]), (10,2,[52 $],up=8), (14,2,[52 $],up=8), (16,2,[52 $],up=8), (56,2,[24 52 $],up=8), (22,2,[52 $],up=8)],conflictingAlts={1, 2},dipsIntoOuterContext=>1 upon <2>
DFA=
s0-->:s1^=>1
s0-->:s1^=>1
s2-->:s3=>2

Most of the debug output has a 1 to 1 correspondence between the Go and CSharp runtimes (although, it would really really help to be consistent in debug output formatted strings to make life easier). The major diff occurs where a DFA is added in CSharp but not in Go. Note, the Go runtime does print out new DFA creation, but after a few, this is the first where it's not recorded. There are 11 DFA states created in the CSharp target, 9 DFA states created in the Go target.

(And, anticipating the argument "But Java code is the gold standard!", I also did it for that target. There are 11 new DFA states added, just as with CSharp, and the with this important add also in the Java debug output.)

Unless this bug is fixed, it's unlikely to know for sure whether this is the "performance issue". I have been programming for ~50 years. There is lttle sense in tracking down a "performance issue" when the basic behavior of the software is not working.

kaby76 commented 2 years ago

The problem is here and it goes to how DFAState objects are kept in a hash table.

Let's compare the CSharp and Go target code side by side:	CSharp	Go
AddDFAState(dfa, to)	p.addDFAState(dfa, to)
dfa.states.Get(D)	dfa.getState(hash)
dictionary.TryGetValue(key, out value)	no equality compare

Hashing is never perfect. In this case, there is a collision between two dfa states, one that exists in a table, and another which should be entered into the table. After computing the hash value of the DFAState to enter, you have to perform an equality check to verify that the state you want to enter is already in the table, i.e., has an identical hash but are equality different.

Note here in C#, "states" is a map from DFAState to DFAState. The C# runtime actually performs an equality comparison and notices the hash is identical with something that is already in the table, but the value of the new DFAState is equality compare different than the one that is already in the table. So, a new state is created.

But, in Go, it is a map from int to DFAState. Collisions in hash values are never checked because it's a mapping of hash value to DFAState not DFAState to DFAState. So, a new state is not created.

@parrt This is a serious problem.

parrt commented 2 years ago

Thanks, @kaby76 for tracking this down! yeah, I was under impression Go target was kinda screwed up.

Does anybody know Go well enough to fix this hash issue?

KvanTTT commented 2 years ago

@kaby76 nice investigation, thanks. I think we must include tests on hashing into runtime testsuite.

jcking commented 2 years ago

The Go runtime does hashing poorly and in the case where kaby76 points out it assumes hashes are unique, which is just wrong. I believe the Go runtime uses a 32-bit hash even on 64-bit platforms. The top 32 bits are simply ignored and chopped off. In general, most of the Go runtime needs to be re-written. I had partially done it, but I do not have the free time and will likely not have it for quite some time.

Additionally the hierarchy of the structs needs to have another look taken at it. There is lots of embedding by pointer. It may make more sense to dereference so that there is only a single allocation. However I am not familiar enough with Golang GC to know if that helps.

jimidle commented 2 years ago

Thanks, @kaby76 for tracking this down! yeah, I was under impression Go target was kinda screwed up.

Does anybody know Go well enough to fix this hash issue?

I am pretty sure I do :). Now, do I have the time? Hmmm - I did the C runtime for v3 because nobody else had the time... It sounds like the consensus is that the Go runtime may need either a rewrite or some serious work.

What would those with an interest in the Go runtime vote for? A new runtime, or a revamp.

I can look at the hashing anyway - though this looks more like an oversight in how go and hashing works (that's a guess - I have not looked at the problem that @kaby76 describes, but that description looks sound to me). I also agree with earlier comments that the structs and embedded structs may not be organized as well as they might - but that probably is better taken care of with a rewrite.

jimidle commented 2 years ago

The problem is here and it goes to how DFAState objects are kept in a hash table.

Let's compare the CSharp and Go target code side by side:

CSharp Go AddDFAState(dfa, to) p.addDFAState(dfa, to) dfa.states.Get(D) dfa.getState(hash) dictionary.TryGetValue(key, out value) no equality compare Hashing is never perfect. In this case, there is a collision between two dfa states, one that exists in a table, and another which should be entered into the table. After computing the hash value of the DFAState to enter, you have to perform an equality check to verify that the state you want to enter is already in the table, i.e., has an identical hash but are equality different.

Note here in C#, "states" is a map from DFAState to DFAState. The C# runtime actually performs an equality comparison and notices the hash is identical with something that is already in the table, but the value of the new DFAState is equality compare different than the one that is already in the table. So, a new state is created.

But, in Go, it is a map from int to DFAState. Collisions in hash values are never checked because it's a mapping of hash value to DFAState not DFAState to DFAState. So, a new state is not created.

@parrt This is a serious problem.

OK - so, reminding myself how Java and C# work since I switched to go, C# Dictionary and Java Map<> allow the key to be objects. Just looking at Java any object that supplies consistent/conformant hashCode() and equals() can be a key. As stated above, we can see that because both calls have to be true, there won't be a clash in the map if hashCode() is equal for two objects but equals() is not. So, the DFAState object itself can be used as the map key.

In the go runtime, the use of only the hashCode() equivalent is guaranteed to generate hash clashes (eventually). Hence it doesn't work. I think I can come up with a better way of modeling the dfa.states for go and will probably have to look at all other collections.

Does anyone object to using an external dependency to replace Go builtin maps with something more sane for this case? The dev branch seems to have already been converted to modules, so this isn't really a big deal in my book.

jimidle commented 2 years ago

One question here is whether the Go runtime is supposed to allow invocations from multiple go routines at the same time? In its current form, without too much inspection, it would seem that it isn't, even if it was supposed to be a design goal.

kaby76 commented 2 years ago

The Dictionary<> implementation for C# is here. I think it's actually being used more like a HashSet because the state number is not used in the hash computation, but that's how it's coded in Java.

I made a crude, freshman-oriented implementation of a hash table for *DFAState using hash buckets, replaced the definition for "states", and tried it out. It could all be wrong, but took me a couple of hours of fumbling around with Go--which I cannot stand because ':=' assignments don't tell me what the type of a variable is, and the forced "one true brace style" style. The hack fixes the problem in collisions, invalid configs/DFAStates--and the performance issue. The runtimes are on par with the other runtimes. The hack is here. P.S.: someone updated the serialization/deserialization of some tables, and didn't update the tests--"go test" does not work.

parrt commented 2 years ago

Hi @jimidle !! Yep, fork then cut new branch for a PR from dev branch (not master). Maybe we start by using @kaby76 awesome fix.

Yep, multiple threads should be able to use the DFAState stuff but likely it ain't right. Very tricky.

jimidle commented 2 years ago

Right Ter. I have a fix. However, it is literally designed to fix this before I look to a revamp. I’m not convinced it needs a complete restart yet. This doesn’t fix hashing or anything else. Just to prove that this is a fix. After that, I will start to analyze the code.

I would be surprised if GC was an issue these days. With a large grammar and a set of large programs, how many DFAstates are we talking around. 1000s? 100,000s, 1,000,000?

Jim

On Thu, Aug 11, 2022 at 09:13 Terence Parr @.***> wrote:

Hi @jimidle https://github.com/jimidle !! Yep, fork then cut new branch for a PR from dev branch (not master). Maybe we start by using @kaby76 https://github.com/kaby76 awesome fix.

Yep, multiple threads should be able to use the DFAState stuff but likely it ain't right. Very tricky.

— Reply to this email directly, view it on GitHub https://github.com/antlr/antlr4/issues/3718#issuecomment-1211456051, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ7TMDXHZWW35NQQCBAHOLVYRHSFANCNFSM5WGGAXKQ . You are receiving this because you were mentioned.Message ID: @.***>

jimidle commented 2 years ago

Hi @jimidle !! Yep, fork then cut new branch for a PR from dev branch (not master). Maybe we start by using @kaby76 awesome fix.

Yep, multiple threads should be able to use the DFAState stuff but likely it ain't right. Very tricky.

Up to you which fix to take mate. I will make the PR anyway so you guys can take a look - I don't mind which path is taken.

My fix is very simple and just uses a look aside slice in the hash bucket. It is a very safe change and does not involve new data structures., outside go builtins. Hash collisions are fairly rare - a tiny look aside table won't add much overhead - probably not measurable. Anyway, I will test against the original grammar mentioned above.

The fix is in the same place as @kaby76 but the changes are probably smaller as I did not write a replacement table system etc - internally, go maps are a HashTable anyway, but the current code tries to do the hashing itself. I just made a few tweak changes such that the actual values are compared on any hash collision and any search. The comparison mirrors the Java implementation.

My change is only for that specific map - the whole code set probably needs a full inspection - but at the moment I don't think it is so terrible that it needs a rewrite - a lot of it is sound.

I would like to volunteer to take on a bit of a revamp of the runtime, unless anyone already working on this runtime wants to take that on before me and has the time? I don't want to step on anyone's toes of course.

I was looking at the last time I was involved with ANTLR - yikes - how did that many years go by?

jimidle commented 2 years ago

here

@kaby76 - So, in your collection implementation, are you not using the Equals() in atn_config_set.go? The reason I ask is that, my hack just adds a bucket in to the states map, and then instead of only using the hash, it looks to see if they are equal by value. I discovered that the go implementation of Equals in atn_config_set.go is quite different from the java version in that it does not actually compare the configs - I don't see how that can work. In go, we just have:

    // configs is the added elements.
    configs []ATNConfig

And the Equals() does this:

func (b *BaseATNConfigSet) Equals(other interface{}) bool {
    if b == other {
        return true
    } else if _, ok := other.(*BaseATNConfigSet); !ok {
        return false
    }

    other2 := other.(*BaseATNConfigSet)

    return b.configs != nil &&
        // TODO: b.configs.equals(other2.configs) && // TODO: Is b necessary?
        ...

So the set of configs are not actually compared. Whereas in Java, there is a custom comparator class for the declaration:

/** Track the elements as they are added to the set; supports get(i) */
    public final ArrayList<ATNConfig> configs = new ArrayList<ATNConfig>(7);

...
        @Override
        public boolean equals(ATNConfig a, ATNConfig b) {
            if ( a==b ) return true;
            if ( a==null || b==null ) return false;
            return a.state.stateNumber==b.state.stateNumber
                && a.alt==b.alt
                && a.semanticContext.equals(b.semanticContext);
        }
...

And the equals() for ATNConfigSet.java is then:

...
        boolean same = configs!=null &&
            configs.equals(other.configs) &&  // includes stack context
            this.fullCtx == other.fullCtx &&
            this.uniqueAlt == other.uniqueAlt &&
            this.conflictingAlts == other.conflictingAlts &&
            this.hasSemanticContext == other.hasSemanticContext &&
            this.dipsIntoOuterContext == other.dipsIntoOuterContext;
...

So, my simple solution to equality, will not work unless I also fix the Equals() for BaseATNConfigSet.

So, if your hack fix gets around this in some way, then I am disinclined to pursue my hack fix further - if the go runtime is to mirror the way the Java runtime works, I think that there are quite a few of these TODOs that will need addressing - and it is going to take more than either of us hacking at this issue to make it work correctly. I will pursue this more though as it will be useful to understand what the current runtime is up to!

kaby76 commented 2 years ago

So, in your collection implementation, are you not using the Equals() in atn_config_set.go?

@jimidle I think I just call the DFAState equals comparison, which is defined in DFAState. That code calls the ATNConfigSet Equals() method, which is similar to that in the C# code. But, to check, a side-by-side debug session between Java and Go should be done (at the DFAState compare methods in both targets). The code "works", but you know how assumptions can go.

Note, I looked everywhere to see if map in golang could be defined as states map[*DFAState]*DFAState instead of states map[int]*DFAState. I think I tried, and almost got there, but I didn't know how to override the equals() method that is apparently called in the map code here, and didn't try the obvious definition func (p *DFAState) equals(other *DFAState) bool. It probably can, but I'm probably the worst "go to guy" for Go--and certainly for Antlr interals. I just cannot understand Go's syntax.

jimidle commented 2 years ago

here

@kaby76 - both my local hack and your hack seem to fix the collisions, but neither seem to fix the performance problems. The provided sample repo at the start of this issue takes 71.9 seconds to run with your hack and 69.4 seconds with mine. Basically the same.

What numbers did you get on the repo above with your hack? I definitely know that I am running your version - perhaps I am missing something here?

The performance problem on the face of it is because closureCheckingStopState and it's call tree create 285,782,265 objects. and closureWork creates 211,925,408 objects. NOw, I suggest that it should not be creating that many and something else is very wrong here.

But I just wanted to check to see whether your hack fixes the performance problems locally for you?

jimidle commented 2 years ago

So, in your collection implementation, are you not using the Equals() in atn_config_set.go?

@jimidle I think I just call the DFAState equals comparison, which is defined in DFAState. That code calls the ATNConfigSet Equals() method, which is similar to that in the C# code. But, to check, a side-by-side debug session between Java and Go should be done (at the DFAState compare methods in both targets). The code "works", but you know how assumptions can go.

Note, I looked everywhere to see if map in golang could be defined as states map[*DFAState]*DFAState instead of states map[int]*DFAState. I think I tried, and almost got there, but I didn't know how to override the equals() method that is apparently called in the map code here, and didn't try the obvious definition func (p *DFAState) equals(other *DFAState) bool. It probably can, but I'm probably the worst "go to guy" for Go--and certainly for Antlr internals. I just cannot understand Go's syntax.

You cannot override the equals method for map - the key must implement in the built in comparable interface - it is low level. You either do what you did, which is make a hashtable that works for this case, or change the storage. All I did was change the map to be [int][]*DFAState and then expand the slice upon a collision where value comparison via equals shows that it is hash collision. Similarly on search, just chase the slice - because there won't be millions of clashes.

I am a big fan of go - I can see why some people are not though ;)

If you can confirm that your hack runtime also does not fix the performance issue, then I will find out why those routines are allocating so many objects. The answer might mean just starting again - maybe reusing some of the logic that seems sound. I don't quite trust the comparators (equals() etc), but I have them working.

jimidle commented 2 years ago

I am going to build the Java version of everything and debug step through side by side - that should tell us where the go runtime is going wrong. There are more bugs here than just that map ignoring hash collisions. This will take some time.

kaby76 commented 2 years ago

@jimidle You're right! This didn't fix the bug. I am both surprised and not. Seriously this is an excellent problem at so many levels.

The grammar I tested was refactored ever so slightly.

Here is the original grammar. And, here is the refactored grammar. For the refactored grammar, labeling is all removed, and rules for comparator and binary are unfolded. Please check. They should recognize the same language.

I know that unfolding does help in performance, but I've never seen a case like this because most grammars in grammars-v4 do not abstract operators from expressions. That said, the original grammar doesn't have a performance issue for the C# or Java targets.

I wrote this refactored grammar after I first tested the original, then decide to work on the hack because I wanted "simplify" the problem. Whenever I see labelling, I say "yuck". The same thing with abstracting "operators" out of expressions.

Note, I almost never use labeling because most of the time I can test children of a node directly. Besides, I almost never use Antlr visitors and listeners any more because they are way, way too primitive. Instead I use XPath2 and an XSLT engine that I am developing. See this Markup UK paper of the kind of problems I am more interested in.

The hack does fix a bug in DFAState creation, so that is important.

But, there is more to this.

jimidle commented 2 years ago

@jimidle You're right! This didn't fix the bug. I am both surprised and not. Seriously this is an excellent problem at so many levels.

The grammar I tested was refactored ever so slightly.

Here is the original grammar. And, here is the refactored grammar. For the refactored grammar, labeling is all removed, and rules for comparator and binary are unfolded. Please check. They should recognize the same language.

I know that unfolding does help in performance, but I've never seen a case like this because most grammars in grammars-v4 do not abstract operators from expressions. That said, the original grammar doesn't have a performance issue for the C# or Java targets.

I wrote this refactored grammar after I first tested the original, then decide to work on the hack because I wanted "simplify" the problem. Whenever I see labelling, I say "yuck". The same thing with abstracting "operators" out of expressions.

Note, I almost never use labeling because most of the time I can test children of a node directly. Besides, I almost never use Antlr visitors and listeners any more because they are way, way too primitive. Instead I use XPath2 and an XSLT engine that I am developing. See this Markup UK paper of the kind of problems I am more interested in.

The hack does fix a bug in DFAState creation, so that is important.

But, there is more to this.

Thanks for the confirmation that I’m not going crazy ;). I’ll pursue this tomorrow with an intent to fix it in any way possible. I am beginning to think a rewrite using generics, idiomatic go, and some dedicated container objects is what needs to happen.

kaby76 commented 2 years ago

I redid the side-by-side comparison code here to include side-by-side comparisons for both the original grammar and the refactored grammar I wrote. For the original grammar, there is a huge divergence of the "LL altSubSets" set between Go and either C# or Java for input "p5.txt", starting about halfway into the runs. There aren't any significant diffs for the refactored grammar between C# and Go. There seems to be a serious problem still in the data structures.

kaby76 commented 2 years ago

For input p5.txt, and the original grammar here in Go, here in C#, here in Java, I traced it to the fourth call here in Go, C#, and Java.

Stepping simulaneously into the call across three debuggers, we see that the inputs are the same, but the results are not.

It is likely due to a similar hash set problem over here. The corresponding code in java is here, C# here. The Go code just does not look at all correct. And, clearly if the inputs are the same but results different for Go, then it's wrong.

At this point, I would suspect each and every single use of map[] in the Antlr Go runtime to be suspect. Clearly, Dictionary<ATNConfig, BitSet> in C#, FlexibleHashMap<ATNConfig,BitSet> in Java, and map in Go have different semantics. And Go's map, you can't override equals() or hash() like the other two targets.

This is very serious.

parrt commented 2 years ago

I would be surprised if GC was an issue these days. With a large grammar and a set of large programs, how many DFAstates are we talking around. 1000s? 100,000s, 1,000,000?

I can't remember any numbers off the top of my head, but the numbers do start to become significant for large grammars and large input. For a long running server this cache can actually grow without bound. Add some interval you have to whack the cache in all targets... This has not been a major problem as far as I know

parrt commented 2 years ago

I would like to volunteer to take on a bit of a revamp of the runtime, unless anyone already working on this runtime wants to take that on before me and has the time? I don't want to step on anyone's toes of course.

No one has been in the driver seat so if you are willing, please do take it on! @jcking started but no longer has the time and I don't think he had anything ready to push.

My impression is that the runtime could really use a rebuild, which might be easier than digging through looking for problems. I will warn you that the run time is vastly more complicated than v3 because it does dynamic grammar analysis on the fly instead of statically in the tool!

As a datapoint, when I tried to use the target to play with Go, it just didn't seem to work at all. For example when I gave my parser bad input it's simply terminated with no error message.

I was looking at the last time I was involved with ANTLR - yikes - how did that many years go by?

haha. Welcome back! What's a decade or two between friends?

parrt commented 2 years ago

I am beginning to think a rewrite using generics, idiomatic go, and some dedicated container objects is what needs to happen.

Yay!

parrt commented 2 years ago

Hi @kaby76 thank you so much for that analysis. Very helpful to get side-by-side information and the infrastructure to test other target.

This is very serious.

I agree. If @jimidle is up for it, let's rebuild using the existing target as useful information but focusing on the java target as a guide for "ground truth".

KvanTTT commented 2 years ago

How is it possible to test similar issues in all runtimes? Maybe use something like this? https://github.com/antlr/antlr4/pull/3775 It's possible to use arbitrary code for any target but with the same expected output.

kaby76 commented 2 years ago

How is it possible to test similar issues in all runtimes?

One way to go is to compare debug output for a parse. You're almost there anyway.

1) Ideally, it would be good to be able to turn on debug output dynamically. For C#, a change was made to compile out for "release" builds. I wouldn't have done that. And for what, to save a few thousand bytes for strings and calls to print routines? 2) Make a concerted effort to make the debug output look the same across targets.

a) Some of the output for the C# target is terrible, like ["LL altSubSets=System.Collections.Generic.Dictionary&#96;2+ValueCollection[Antlr4.Runtime.Atn.ATNConfig,Antlr4.Runtime.Sharpen.BitSet], predict=0, ResolvesToJustOneViableAlt=1"](https://github.com/kaby76/issue-3718/blob/055806acc3297769aa7f5f7336deada7781d9d74/original-grammar/csharp/out.txt#L3517). Over in Java, it's ["LL altSubSets=[{1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}], predict=0, resolvesToJustOneViableAlt=1"](https://github.com/kaby76/issue-3718/blob/055806acc3297769aa7f5f7336deada7781d9d74/original-grammar/java/out.txt#L2389) (NB: 7 elements). In golang, it looks like this (the divergence): `"LL altSubSets=[{2} {2} {2} {2} {1} {1} {1} {2} {2} {1} {1} {1} {1, 2}], predict=0, resolvesToJustOneViableAlt=0"`, Line 3711 (NB: 13 elements).

b) Sort "sets/dictionaries" so the output can be compared.

c) You have output over in Go [here](https://github.com/antlr/antlr4/blob/master/runtime/Go/antlr/parser_atn_simulator.go#L984) where there is no equivalent in Java [here](https://github.com/antlr/antlr4/blob/master/runtime/Java/src/org/antlr/v4/runtime/atn/ParserATNSimulator.java#L1462).

d) If code iterates over a set/dictionary, and element order diverges between targets, have an option to sort prior to iteration over the set/dictionary. Allow an option to have different orderings.

parrt commented 2 years ago

I love this idea. We just need a way to standardize the trace and it should be very easy to compare not only the decisions made but also the rules visited etc.

parrt commented 2 years ago

I think I should come up with a standard trace, which could be useful for all sorts of things including my new playground http://lab.antlr.org . The trace should include more or less what we do now with Parser listening (entry/exit from rules) but we should include match of tokens and then the decision making would be at and set of prediction results, similar to what we have now for Java. I guess first I would have to look to see what we generate now from Java to see how suitable it is.

@kaby76 @ericvergnaud in your experience, what are the key elements to verifying a new target against Java ("truth")? Is it just the lookahead computations and not the parser rule entry/exits and token matches?

Let me think about how intrusive these changes are before deciding on whether there is a cost at runtime or we do a debug build / option.

jimidle commented 2 years ago

I think I should come up with a standard trace, which could be useful for all sorts of things including my new playground http://lab.antlr.org . The trace should include more or less what we do now with Parser listening (entry/exit from rules) but we should include match of tokens and then the decision making would be at and set of prediction results, similar to what we have now for Java. I guess first I would have to look to see what we generate now from Java to see how suitable it is.

@kaby76 @ericvergnaud in your experience, what are the key elements to verifying a new target against Java ("truth")? Is it just the lookahead computations and not the parser rule entry/exits and token matches?

Let me think about how intrusive these changes are before deciding on whether there is a cost at runtime or we do a debug build / option.

I think that the debug mode has been more or less adapted by the authors of the runtimes for their own purposes - I did this myself with v3 C for instance - unless you mean the v3 type debug mode for the UI?

So, I suggest, if it seems to be useful for verifying runtime accuracy that there is a new mode for this: diagnostics or something like that.

One way to implement would be to produce a machine readable output that could be compared programmatically - if there were to be a good set of test grammars, the diagnostic output from Java could be stored in git and as runtimes implemented the diagnostic mode we could have a small script/program that could verify the outputs as part of testing. JSON or JSONL might be good enough assuming that order of operations can be coded in to that.

In Go, there is no conditional compilation, so there would be a small runtime effect in testing a flag. If this was dedicated to runtime verification, then it would be OK to compile it out for production, for those languages where this is possible.

My own experience is that which you suggest, that it is the construction of data that matters - token matching and so on seems more like it is useful for the debug mode.

Perhaps a discussion is the correct place to discuss this, rather than this issue thread?

I will also open a discussion about replacing the go runtime, assuming I have permissions to create a discussion.

jimidle commented 2 years ago

here

Yes - I agree - the original author did not think about hash conflicts. It will all have to be redone. I will take a quick stab at some hack fixes so that the current runtime is usable (hopefully), but we need a redo.

parrt commented 2 years ago

I will also open a discussion about replacing the go runtime, assuming I have permissions to create a discussion.

Created https://github.com/antlr/antlr4/discussions/3812

kaby76 commented 2 years ago

Adding notes here on debugging the current issue with the original SimpleBoolean.g4 grammar. Anything related to recommendations on fixing Go outside of just this particular bug, I'll add to the new Discussion thread.

Actually, the bug is more complicated than I thought and seems to also occur in config set computation.

The input to the 4th call to "PredictionMode.GetConflictingAltSubsets(reach.configs)" is not the same across targets. It's different for Go. I only thought it was the same because cursoring over the variables showed sets with the same size, and most of the values the same. But VSCode does not call "String()" for the Dart code. So, I missed a difference in the config set data structure.

In CSharp, input "config.reaches" is

[(62,1,[31 11 $]),
(64,1,[35 11 $]),
(38,1,[11 $]),
(41,1,[11 $]),
(44,1,[11 $]),
(51,1,[11 $]),
(11,1,[$]),
(62,2,[31 11 $]),
(64,2,[35 11 $]),
(38,2,[11 $]),
(41,2,[11 $]),
(44,2,[11 $]),
(51,2,[11 $]),
(11,2,[$])]

In Java, input is "config" (slightly different signature).

[(62,1,[31 11 $]),
(64,1,[35 11 $]),
(38,1,[11 $]),
(41,1,[11 $]),
(44,1,[11 $]),
(51,1,[11 $]),
(11,1,[$]),
(62,2,[31 11 $]),
(64,2,[35 11 $]),
(38,2,[11 $]),
(41,2,[11 $]),
(44,2,[11 $]),
(51,2,[11 $]),
(11,2,[$])]

In Go, input is

[(62,1,[31 [11 $, 32 36 11 $, 36 11 $]]),
(64,1,[35 [11 $, 32 36 11 $, 36 11 $]]),
(38,1,[[11 $, 32 36 11 $, 36 11 $]]),
(41,1,[[11 $, 32 36 11 $, 36 11 $]]),
(44,1,[[11 $, 32 36 11 $, 36 11 $]]),
(51,1,[[11 $, 32 36 11 $, 36 11 $]]),
(11,1,[$]),
(62,2,[31 [11 $, 32 11 $]]),
(64,2,[35 [11 $, 32 11 $]]),
(38,2,[[11 $, 32 11 $]]),
(41,2,[[11 $, 32 11 $]]),
(44,2,[[11 $, 32 11 $]]),
(51,2,[[11 $, 32 11 $]]),
(11,2,[$])]

jimidle commented 2 years ago

Yes it’s this function and an interaction with closure and precedence (which is why it goes away when you unfold). I have a fix using generics for this func and the general hashing thing. But neither are the source of this bigger bug. I know roughly what it is, but I’m having review the whole code basically. I think there are subtle bugs all over.

More over the weekend.

On Fri, Aug 12, 2022 at 20:00 Ken Domino @.***> wrote:

Adding notes here on debugging the current issue with the original SimpleBoolean.g4 grammar. Anything related to actual recommendations on fixing Go outside of just this particular bug, I'll add to the new Discussion thread.

Actually, the bug is more complicated than I thought and seems to also occur in config set computation.

The input to the 4th call to "PredictionMode.GetConflictingAltSubsets(reach.configs)" is not the same across targets. It's different for Go. I only thought it was the same because cursoring over the variables showed sets with the same size, and most of the values the same. But VSCode does not call "String()" for the Dart code. So, I missed a difference in the config set data structure.

In CSharp, input "config.reaches" is

[(62,1,[31 11 $]), (64,1,[35 11 $]), (38,1,[11 $]), (41,1,[11 $]), (44,1,[11 $]), (51,1,[11 $]), (11,1,[$]), (62,2,[31 11 $]), (64,2,[35 11 $]), (38,2,[11 $]), (41,2,[11 $]), (44,2,[11 $]), (51,2,[11 $]), (11,2,[$])]

In Java, input is "config" (slightly different signature).

[(62,1,[31 11 $]), (64,1,[35 11 $]), (38,1,[11 $]), (41,1,[11 $]), (44,1,[11 $]), (51,1,[11 $]), (11,1,[$]), (62,2,[31 11 $]), (64,2,[35 11 $]), (38,2,[11 $]), (41,2,[11 $]), (44,2,[11 $]), (51,2,[11 $]), (11,2,[$])]

In Go, input is

[(62,1,[31 [11 $, 32 36 11 $, 36 11 $]]), (64,1,[35 [11 $, 32 36 11 $, 36 11 $]]), (38,1,[[11 $, 32 36 11 $, 36 11 $]]), (41,1,[[11 $, 32 36 11 $, 36 11 $]]), (44,1,[[11 $, 32 36 11 $, 36 11 $]]), (51,1,[[11 $, 32 36 11 $, 36 11 $]]), (11,1,[$]), (62,2,[31 [11 $, 32 11 $]]), (64,2,[35 [11 $, 32 11 $]]), (38,2,[[11 $, 32 11 $]]), (41,2,[[11 $, 32 11 $]]), (44,2,[[11 $, 32 11 $]]), (51,2,[[11 $, 32 11 $]]), (11,2,[$])]

— Reply to this email directly, view it on GitHub https://github.com/antlr/antlr4/issues/3718#issuecomment-1213037496, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ7TMAWJ6W3NTLRA3MQXY3VYY4HDANCNFSM5WGGAXKQ . You are receiving this because you were mentioned.Message ID: @.***>

kaby76 commented 2 years ago

JFK. Off by one:

Good Java: here
Bad Go: here
Good CSharp: here
Good Cpp: here
Good Dart: here
Bad JavaScript: here
Good Python3: here
PHP--no debug output.

Also, numerous confusions of "symbolic name" vs. "literal name" in the Antlr Go runtime code.

parrt commented 2 years ago

I'm going to start digging into the debugging output now to see about creating a machine readable version to improve the overall testing rig across targets. I should've done this a long time ago.

kaby76 commented 2 years ago

Well, this accounts for a ton of divergence in decision-making.

TURN_OFF_LR_LOOP_ENTRY_BRANCH_OPT

In Java code here
In Go code here
Missing from C# code here
In Dart code here
In Cpp code here
Missing in JavaScript code here
Missing in Python3 code here
Missing in PHP code here

And, after eliminating this divergence, it seems that GetConflictingAltSubsets may be working. The lists returned by the function are wildly different order between targets, but seem to have the same elements in the list.

In any case, it would be nice if this was made consistent across targets.

parrt commented 2 years ago

Interesting. Yes I put in that switch in case the optimization we made was incorrect. I’m guessing people simply assume this is not turned off but maybe some people are assuming the opposite?

kaby76 commented 2 years ago

Well, whatever it does, I added it locally to the C# runtime code, and I'm restarting the debug output diffs. Still looking for some big diff along the way to see if there's some problem with the computation with the golang target. Most looking really good, but now I see something else that looks bad..... :)

jimidle commented 2 years ago

Well, this accounts for a ton of divergence in decision-making.

TURN_OFF_LR_LOOP_ENTRY_BRANCH_OPT

In Java code here

In Go code here

Missing from C# code here

In Dart code here

In Cpp code here

Missing in JavaScript code here

Missing in Python3 code here

Missing in PHP code here

And, after eliminating this divergence, it seems that GetConflictingAltSubsets may be working. The lists returned by the function are wildly different order between targets, but seem to have the same elements in the list.

In any case, it would be nice if this was made consistent across targets.

This is to do with the collections - go does not guarantee the same order between runs and of course order will be different between languages. GetConflictingAltSubsets does have a bug as it is written due to hashing, but I have written some generic based stores and that bug is fixed as well as all the other map[int]s - now I am working on the closure stuff, which is more tricky as there is something wrong with the assumptions in go vs java I think. Should get there though. After that, it is diminishing returns I think - if this bug is fixed, then I think call it good and put the effort in to a new version of the runtime.

antlr / antlr4

Terrible Golang performance #3718