kirides closed this issue 1 year ago.
The grammar has one big ambiguity, but it's probably not the problem. `statement` can derive `funcCall` and `expression`, but `expression` can derive `value`, which can also derive `funcCall`. The parser reads the entire statement just to decide between `funcCall` and `expression`.
Removing `funcCall` from `statement` and `nameNode` from `expression` didn't really do anything.
I added code to separate the parse from the tree walker, since this has nothing to do with tree walking, and added code to time the parse call. I then ran the code with all 19 files in data/ versus only one file in data/ (data/EngineAdr_G1.d), and compared the time for the one parse of that file against the parse of that file within an aggregate of 19 different parses. In the 19-file run, EngineAdr_G1.d parsed in 3.500 s. When run alone, it parsed in 0.564 s. (I also ran this against my template-generated driver with the grammar, which runs only one parse per process run, and the parse for EngineAdr_G1.d was 0.5 s--the same result.)

That's a pretty significant difference. If anything, cached data from previous parses should improve the performance of subsequent parses. This says that for Go, it's actually hurting. My conclusion is that either there is significant pressure on the memory allocator, or cached parse data alters the parse for the Go target when compared to another target. I haven't checked the actual closure computations side-by-side between Go and CSharp or Java, but I will check that next.

BTW folks, we really really need to get the debug output of ParserATNSimulator to look the same across targets. It is not currently. I mentioned this several times before. We need this fixed in order to trace identical operations across targets and to point out porting issues.
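A per-file timing wrapper along these lines is enough to separate the parse from the tree walk (a minimal sketch; `parseFile` is a hypothetical stand-in for the generated driver's parse call, not the driver's actual API):

```go
package main

import (
	"fmt"
	"time"
)

// timeParse measures only the parse call, excluding any tree
// walking done afterwards. parseFile is a hypothetical stand-in
// for the trgen-generated driver's parse of a single input file.
func timeParse(name string, parseFile func(string)) time.Duration {
	start := time.Now()
	parseFile(name)
	elapsed := time.Since(start)
	fmt.Printf("%s: %.3f s\n", name, elapsed.Seconds())
	return elapsed
}

func main() {
	// Dummy work in place of a real parse.
	timeParse("EngineAdr_G1.d", func(string) {})
}
```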
I modified the trgen templates to create drivers for CSharp and Go that parse multiple files, then tested how previous parses affect the timing of subsequent parses. For a parse of Buffs.d then EngineAdr_G1.d in CSharp, the parse of EngineAdr_G1.d is 5x faster than when parsing EngineAdr_G1.d alone. That is completely expected. But for a parse of Buffs.d then EngineAdr_G1.d in Go, the parse of EngineAdr_G1.d is 3x slower! That is completely unexpected. Somehow, warm-up caching is not working. This is serious.
I turned on debugging for both CSharp and Go targets for Antlr 4.11.1, rebuilt the trgen-generated apps, and ran the application on the sequence of files "Buffs.d EngineAdr_G1.d", capturing the output. After extracting key events from the output (such as "adaptivePredict ...", "testing ...", "DFA after predictATN ...") and editing out trivial strings (e.g., `sed 's/Const<34>/34/'`), there are no differences in how the algorithms run.
It's likely that the data structures associated with caching are to blame, e.g., DoubleDict/map, etc., or Go memory management is to blame.
Inside the issues.zip are two profiling results; they can be analyzed using `go tool pprof -http=":8000" profile_xyz`. It shows that a lot of the time is spent in equality checks. Sadly, I don't know if this behavior is to be expected.
That profiling data might come in handy. What tool did you use so I can understand what it means? [Never mind--I see now.]
I'm pretty confident the algorithms are now working the same between the Go target and CSharp for 4.11.1. That's great news because prior to 4.11.1, it was very very different. Many important bugs found.
So caching is working, but it's not working well.
The peak working memory usage for the Go target is roughly 2x the peak memory usage for the CSharp target. But, it's only ~50MB, which is basically nothing in a modest 16GB, AMD Ryzen 7 2700 system. So, resources are not the problem.
To collect the data I used the pprof endpoint of my language server; very simple: `main.go`, just copy it into the issues dir where `go.mod` is and run with `go run main.go`.
EDIT: or just modify the test / create a new one, and execute `go test -v -run=^TestParsing$ -timeout=30s -cpuprofile profile -count=1`
```go
package main

import (
	"net/http"
	_ "net/http/pprof"
)

func main() {
	// pprof url: http://127.0.0.1:5000/debug/pprof
	// direct url to profile: http://127.0.0.1:5000/debug/pprof/profile?seconds=10 (default is 30)
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Start your work here
		// start profile, then call http://127.0.0.1:5000/
	})
	http.ListenAndServe("127.0.0.1:5000", nil)
}
```
To analyze the profile data, run `go tool pprof -http=":8000" profile`
"To start tuning the Go program, we have to enable profiling. If the code used the Go testing package’s benchmarking support, we could use gotest’s standard -cpuprofile and -memprofile flags. In a standalone program like this one, we have to import runtime/pprof and add a few lines of code" https://go.dev/blog/pprof
Geezuz, doesn't Go have a stand-alone CPU sampling profiler of an executable program that doesn't require code modification of source code? I guess that's why this was written as a unit test application.
Geezuz, doesn't Go have a stand-alone CPU sampling profiler of an executable program that doesn't require code modification of source code? I guess that's why this was written as a unit test application
yes ^^'
EDIT: updated the post above and added "just modify the test / create a new one, and execute `go test -v -run=^TestParsing$ -timeout=30s -cpuprofile profile -count=1`"
I modified the trgen driver to create CPU profiling, and it works--but you cannot call Exit(), otherwise you get a zero-length profiling output file (the deferred call that flushes the profile never runs).
$ ./Test.exe ../examples/*
HI THERE
Time: 0.044 s
Parse succeeded.
Time: 0.070 s
Parse succeeded.
Time: 0.134 s
Parse succeeded.
Time: 0.588 s
Parse succeeded.
Time: 0.090 s
Parse succeeded.
Time: 0.271 s
Parse succeeded.
Time: 1.190 s
Parse succeeded.
Time: 0.436 s
Parse succeeded.
Time: 1.052 s
Parse succeeded.
Time: 0.170 s
Parse succeeded.
Time: 0.173 s
Parse succeeded.
Time: 1.050 s
Parse succeeded.
Time: 4.091 s
Parse succeeded.
Time: 2.649 s
Parse succeeded.
Time: 0.363 s
Parse succeeded.
Time: 0.789 s
Parse succeeded.
Time: 1.791 s
Parse succeeded.
Time: 0.339 s
Parse succeeded.
Time: 0.216 s
Parse succeeded.
Total Time: 15.517 s
Kenne@DESKTOP-DL44R7B MINGW64 ~/issue-3934/Go/Generated
$ ls -lts
total 186215
20 -rw-r--r-- 1 Kenne Kenne 18561 Oct 20 07:07 cpu.prof
5492 -rwxr-xr-x 1 Kenne Kenne 5623808 Oct 20 07:06 Test.exe*
8 -rw-r--r-- 1 Kenne Kenne 5751 Oct 20 07:06 Test.go
240 -rw-r--r-- 1 Kenne Kenne 242780 Oct 20 04:15 s3.txt
1608 -rw-r--r-- 1 Kenne Kenne 1645089 Oct 20 04:02 ss.txt
152 -rw-r--r-- 1 Kenne Kenne 153644 Oct 20 03:49 s.txt
129652 -rw-r--r-- 1 Kenne Kenne 132760394 Oct 20 03:38 sequenced.txt
49028 -rw-r--r-- 1 Kenne Kenne 50204596 Oct 20 03:37 single.txt
4 drwxr-xr-x 1 Kenne Kenne 0 Oct 19 14:06 parser/
1 -rw-r--r-- 1 Kenne Kenne 468 Oct 19 14:06 go.sum
1 -rw-r--r-- 1 Kenne Kenne 207 Oct 19 14:06 go.mod
4 -rw-r--r-- 1 Kenne Kenne 1587 Oct 19 14:06 tester.psm1
1 -rw-r--r-- 1 Kenne Kenne 618 Oct 19 14:06 test.sh
4 -rw-r--r-- 1 Kenne Kenne 1062 Oct 19 14:06 makefile
Kenne@DESKTOP-DL44R7B MINGW64 ~/issue-3934/Go/Generated
$ !go
go tool pprof cpu.prof
Type: cpu
Time: Oct 20, 2022 at 7:07am (EDT)
Duration: 15.64s, Total samples = 10.72s (68.53%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top100
Showing nodes accounting for 10s, 93.28% of 10.72s total
Dropped 117 nodes (cum <= 0.05s)
flat flat% sum% cum cum%
1.60s 14.93% 14.93% 2.32s 21.64% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*BaseATNConfig).Equals
1.57s 14.65% 29.57% 1.75s 16.32% runtime.(*itabTableType).find
1.21s 11.29% 40.86% 4.67s 43.56% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*LexerATNConfig).Equals
1.06s 9.89% 50.75% 7.42s 69.22% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*BaseATNConfigSet).Compare
0.74s 6.90% 57.65% 2.49s 23.23% runtime.getitab
0.53s 4.94% 62.59% 1.14s 10.63% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*LexerActionExecutor).Equals
0.48s 4.48% 67.07% 2.99s 27.89% runtime.convI2I
0.44s 4.10% 71.18% 9s 83.96% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*DFAState).Equals
0.43s 4.01% 75.19% 7.85s 73.23% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*BaseATNConfigSet).Equals
0.40s 3.73% 78.92% 0.61s 5.69% golang.org/x/exp/slices.EqualFunc[...]
0.22s 2.05% 80.97% 5.02s 46.83% github.com/antlr/antlr4/runtime/Go/antlr/v4..Get
0.17s 1.59% 82.56% 4.56s 42.54% github.com/antlr/antlr4/runtime/Go/antlr/v4..Put
0.16s 1.49% 84.05% 0.16s 1.49% runtime.stdcall1
0.15s 1.40% 85.45% 0.15s 1.40% runtime.add (inline)
0.14s 1.31% 86.75% 0.30s 2.80% runtime.scanobject
0.12s 1.12% 87.87% 0.12s 1.12% runtime.stdcall2
0.09s 0.84% 88.71% 0.21s 1.96% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*LexerActionExecutor).Equals.func1
0.09s 0.84% 89.55% 9.10s 84.89% github.com/antlr/antlr4/runtime/Go/antlr/v4..Equals2
0.07s 0.65% 90.21% 0.07s 0.65% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*BaseATNState).GetStateNumber
0.06s 0.56% 90.76% 0.06s 0.56% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*BaseLexerAction).Equals
0.06s 0.56% 91.32% 0.06s 0.56% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*LexerSkipAction).Equals
0.05s 0.47% 91.79% 0.07s 0.65% runtime.findObject
0.05s 0.47% 92.26% 0.50s 4.66% runtime.gcDrain
0.02s 0.19% 92.44% 0.20s 1.87% runtime.mallocgc
0.02s 0.19% 92.63% 0.06s 0.56% runtime.newstack
0.01s 0.093% 92.72% 9.60s 89.55% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*CommonTokenStream).fetch
0.01s 0.093% 92.82% 0.07s 0.65% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*LexerATNSimulator).getReachableConfigSet
0.01s 0.093% 92.91% 0.14s 1.31% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*ParserATNSimulator).closureCheckingStopState
0.01s 0.093% 93.00% 0.13s 1.21% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*ParserATNSimulator).closureWork
0.01s 0.093% 93.10% 0.17s 1.59% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*ParserATNSimulator).execATNWithFullContext
0.01s 0.093% 93.19% 0.06s 0.56% runtime.mapassign_fast64
0.01s 0.093% 93.28% 0.06s 0.56% runtime/pprof.(*profMap).lookup
0 0% 93.28% 5.57s 51.96% example.com/myparser/parser.(*DaedalusParser).BlockDef
0 0% 93.28% 0.32s 2.99% example.com/myparser/parser.(*DaedalusParser).ClassDef
0 0% 93.28% 9.81s 91.51% example.com/myparser/parser.(*DaedalusParser).DaedalusFile
0 0% 93.28% 0.32s 2.99% example.com/myparser/parser.(*DaedalusParser).ElseBlock
0 0% 93.28% 0.39s 3.64% example.com/myparser/parser.(*DaedalusParser).ElseIfBlock
0 0% 93.28% 0.09s 0.84% example.com/myparser/parser.(*DaedalusParser).ExpressionBlock
0 0% 93.28% 0.06s 0.56% example.com/myparser/parser.(*DaedalusParser).FuncArgExpression
0 0% 93.28% 0.07s 0.65% example.com/myparser/parser.(*DaedalusParser).FuncCall
0 0% 93.28% 3.02s 28.17% example.com/myparser/parser.(*DaedalusParser).FunctionDef
0 0% 93.28% 1.54s 14.37% example.com/myparser/parser.(*DaedalusParser).IfBlock
0 0% 93.28% 1.76s 16.42% example.com/myparser/parser.(*DaedalusParser).IfBlockStatement
0 0% 93.28% 4.05s 37.78% example.com/myparser/parser.(*DaedalusParser).InlineDef
0 0% 93.28% 0.22s 2.05% example.com/myparser/parser.(*DaedalusParser).Statement
0 0% 93.28% 3.02s 28.17% example.com/myparser/parser.(*DaedalusParser).StatementBlock
0 0% 93.28% 0.08s 0.75% example.com/myparser/parser.(*DaedalusParser).Value
0 0% 93.28% 0.09s 0.84% example.com/myparser/parser.(*DaedalusParser).expression
0 0% 93.28% 0.08s 0.75% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*BaseATNConfigSet).Add
0 0% 93.28% 9.59s 89.46% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*BaseLexer).NextToken
0 0% 93.28% 9.59s 89.46% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*BaseLexer).safeMatch
0 0% 93.28% 9.40s 87.69% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*BaseParser).Consume
0 0% 93.28% 0.20s 1.87% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*BaseParser).EnterRule
0 0% 93.28% 9.40s 87.69% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*BaseParser).Match
0 0% 93.28% 9.41s 87.78% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*CommonTokenStream).Consume
0 0% 93.28% 0.19s 1.77% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*CommonTokenStream).LT
0 0% 93.28% 9.60s 89.55% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*CommonTokenStream).Sync
0 0% 93.28% 0.19s 1.77% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*CommonTokenStream).lazyInit (inline)
0 0% 93.28% 0.19s 1.77% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*CommonTokenStream).setup
0 0% 93.28% 9.59s 89.46% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*LexerATNSimulator).Match
0 0% 93.28% 9.51s 88.71% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*LexerATNSimulator).addDFAEdge
0 0% 93.28% 9.48s 88.43% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*LexerATNSimulator).addDFAState
0 0% 93.28% 0.06s 0.56% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*LexerATNSimulator).closure
0 0% 93.28% 9.58s 89.37% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*LexerATNSimulator).computeTargetState
0 0% 93.28% 9.59s 89.46% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*LexerATNSimulator).execATN
0 0% 93.28% 0.21s 1.96% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*ParserATNSimulator).AdaptivePredict
0 0% 93.28% 0.14s 1.31% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*ParserATNSimulator).closure (inline)
0 0% 93.28% 0.13s 1.21% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*ParserATNSimulator).computeReachSet
0 0% 93.28% 0.21s 1.96% github.com/antlr/antlr4/runtime/Go/antlr/v4.(*ParserATNSimulator).execATN
0 0% 93.28% 9.81s 91.51% main.DoParse
0 0% 93.28% 9.81s 91.51% main.ParseFilename
0 0% 93.28% 9.81s 91.51% main.main
0 0% 93.28% 0.13s 1.21% runtime.(*gcControllerState).enlistWorker
0 0% 93.28% 0.14s 1.31% runtime.(*gcWork).balance
0 0% 93.28% 0.15s 1.40% runtime.findRunnable
0 0% 93.28% 0.37s 3.45% runtime.gcBgMarkWorker
0 0% 93.28% 0.50s 4.66% runtime.gcBgMarkWorker.func2
0 0% 93.28% 0.10s 0.93% runtime.lock (inline)
0 0% 93.28% 0.10s 0.93% runtime.lock2
0 0% 93.28% 0.10s 0.93% runtime.lockWithRank (inline)
0 0% 93.28% 9.81s 91.51% runtime.main
0 0% 93.28% 0.26s 2.43% runtime.mcall
0 0% 93.28% 0.11s 1.03% runtime.newobject (partial-inline)
0 0% 93.28% 0.14s 1.31% runtime.notewakeup
0 0% 93.28% 0.26s 2.43% runtime.park_m
0 0% 93.28% 0.13s 1.21% runtime.preemptM
0 0% 93.28% 0.13s 1.21% runtime.preemptone
0 0% 93.28% 0.12s 1.12% runtime.resetspinning
0 0% 93.28% 0.27s 2.52% runtime.schedule
0 0% 93.28% 0.10s 0.93% runtime.semasleep
0 0% 93.28% 0.14s 1.31% runtime.semawakeup
0 0% 93.28% 0.13s 1.21% runtime.startm
0 0% 93.28% 0.55s 5.13% runtime.systemstack
0 0% 93.28% 0.13s 1.21% runtime.wakep
0 0% 93.28% 0.06s 0.56% runtime/pprof.(*profileBuilder).addCPUData
0 0% 93.28% 0.07s 0.65% runtime/pprof.profileWriter
(pprof)
I think the `Hash1()` function for generics isn't quite right. I'm seeing that `Hash1()` for an ATNConfig is missing the addition of the hash value for "context" that is used in `Hash()` for a BaseATNConfig. I don't understand why it doesn't just `return o.Hash()`, as that will call the hash function for object `o`. A similar argument could be made for other `Hash1()` functions, e.g., this, and for `Equals2()`, e.g., this. Why is it duplicating these functions--and apparently not even correctly?
If a hash value collides excessively, it will have to be resolved by Equals().
Ah, it's a 2-level hash table. Not as simple as I thought.
I recall that preliminary versions of antlr4 were extremely slow precisely because the hash function wasn't distributing hash keys well enough. Since the # of calls to closure() is correct and a lot of time is spent in equals(), I'd certainly recommend monitoring the size of the hash table buckets. If they grow, it would show a defect in the hash function.
Thanks. I will keep an eye on hash table distributions. But, I need to convince myself first that JStore is really working. The Go target implements JStore[] as a replacement for a HashSet<> in Java in several places. I thought it was a two-level tree, but it is not, because there are no hash and equals functions for the second level. (The "1" and "2" in Hash1() and Equals2() appear to be just artifacts, without significance.)

When I replace the Equals2() and Hash1() functions with computations that use "context", the "configs" look different from previous runs of the Go target executable, and not just in order. (Note, there is no output of "configs" in the CSharp target, only in the Go target, so I can't compare with CSharp--which is why I've been saying we should get the debug traces between targets to look the same. And yes, sort sets so they appear identical when printing.) That shouldn't happen.

How can one say that a BaseATNConfig is equal to another BaseATNConfig if BaseATNConfig contains a "context", yet the comparison function Equals2() does not include "context" in the comparison? That does not make sense. Now, some of these JStore instances use the "object" comparison, and that defaults to the usual Hash() and Equals() methods.
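To make the objection concrete, here is a toy sketch (the struct and field names are illustrative stand-ins, not the runtime's actual types) of what happens when a comparator omits a field the objects actually carry: two configs with different contexts are judged equal, so a set keyed on that comparator silently drops one of them.

```go
package main

import "fmt"

// Hypothetical config loosely modeled on an ATNConfig tuple
// (state, alt, context); not the runtime's actual type.
type config struct {
	state   int
	alt     int
	context string
}

// equalsIgnoringContext mimics a comparator that omits the
// context field, as described above.
func equalsIgnoringContext(a, b config) bool {
	return a.state == b.state && a.alt == b.alt
}

func main() {
	a := config{state: 104, alt: 2, context: "121 113 $"}
	b := config{state: 104, alt: 2, context: "136 113 $"}
	// A set using this comparator would treat a and b as
	// duplicates and keep only one, even though the contexts
	// differ.
	fmt.Println(equalsIgnoringContext(a, b)) // true: contexts never compared
}
```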
Right. We've got a problem here. I'm printing out the second parameter to the closureCheckingStopState() function, at these points in the Java, CSharp, and Go targets. It's identical for Java and CSharp, but for Go there's a difference in the second parameter, which is an ATNConfigSet. And I think it's because JStore isn't defining Hash1() and Equals2() correctly. Here's where it diverges for the first time with input Buffs.d.
Java and CSharp
configs([(117,1,[$]),
(104,2,[[121 113 $, 136 113 $]]),
(147,2,[130 113 $]),
(169,2,[130 113 $]),
(183,2,[130 113 $]),
(190,2,[130 113 $])])
Go
configs([(117,1,[$]),
(104,2,[121 113 $]),
(147,2,[130 113 $]),
(169,2,[130 113 $]),
(183,2,[130 113 $]),
(190,2,[130 113 $]),
(104,2,[136 113 $])])
Right. We've got a problem here. I'm printing out the second parameter to the closureCheckingStopState() function, at these points in the Java, CSharp, and Go targets. It's identical for Java and CSharp, but for Go there's a difference in the second parameter, which is an ATNConfigSet. And I think it's because JStore isn't defining Hash1() and Equals2() correctly. Here's where it diverges for the first time with input Buffs.d.
Java and CSharp configs([(117,1,[$]), (104,2,[[121 113 $, 136 113 $]]), (147,2,[130 113 $]), (169,2,[130 113 $]), (183,2,[130 113 $]), (190,2,[130 113 $])]) Go configs([(117,1,[$]), (104,2,[121 113 $]), (147,2,[130 113 $]), (169,2,[130 113 $]), (183,2,[130 113 $]), (190,2,[130 113 $]), (104,2,[136 113 $])])
It is possible that the hash is slightly different in Go, I will take another look. When I redid these I had to trace every hash in the Java runtime to make sure that the same hash was being calculated in Go - prior to the generics implementation, the hashes had quite a few things wrong with them and in some cases were just incorrect.
I will take another look. The distribution could be different though, as I use a Go map as the underlying storage in most cases. By design, the hash into the map will be different every time the code runs, so don't get caught by that :)
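The point about Go maps is easy to demonstrate: iteration order over a map is deliberately randomized per run, so any debug dump that walks a map directly must sort first to be comparable across runs. A minimal sketch:

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns a map's keys in deterministic order.
// Ranging over a Go map directly yields a randomized order per
// run, so dumps that walk a map unsorted can differ between
// otherwise identical executions.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	m := map[string]int{"c": 3, "a": 1, "b": 2}
	fmt.Println(sortedKeys(m)) // always [a b c], regardless of map order
}
```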
I have time blocked out on my Sunday for ANTLR work, so I will also take a look at this. It is entirely possible that I misread one of the hash paths. However, I don't think that there should be any performance problems because of this, unless somehow I am hashing everything into the same bucket. This was the issue with one of the performance reports before I changed all the hashing. But I thought that these had all been fixed. Maybe not! :)
Can I also make sure that you are using the /v4 code? I think that tomorrow I will lose the v1 code, as we know that the v4 path is good now. This will avoid any confusion.
I am definitely using only /v4 code. But there seems to be a problem after reading prediction_context.go.
Some of the calls to Add() on ATNConfig sets seem to be returning wrong values, e.g., here. When I dive into Add() and merge(), I'm seeing code that was recently changed, but then changed again shortly afterwards. https://github.com/antlr/antlr4/commit/289ea358be9a7a446a3227bbda61c87e467937e1#diff-9b169f021623581756abb3a7e6163ac9cda1c7ef73c1a29b3ebe57bcb5fcefc1R382 But this code was not merged back to 4.11.1. Shouldn't it be merged to 4.11.1?
At this point, I'm going to move forward to the "dev" branch for Go. This seems partly responsible for the problems I am seeing.
@jimidle can chime in when he has time here.
FYI, the perf problem still occurs with the "dev" branch (`go get github.com/antlr/antlr4/runtime/Go/antlr/v4@dev`). I'll continue to look at the "configs" set, which I mentioned above, to see if--or why--the sets are different between Java, CSharp, and Go. It's probably unrelated, but who knows.
Yes, the "dev" branch contains important changes for the Go target. The "configs" sets are now in agreement between Java, CSharp, and Go. So, that difference has been resolved.
Is it one PR or commit? I'm in process of integrating into google repo at moment and I should include latest Go stuff.
Is it one PR or commit? I'm in process of integrating into google repo at moment and I should include latest Go stuff.
https://github.com/antlr/antlr4/commit/289ea358be9a7a446a3227bbda61c87e467937e1
These changes are important. Without it, some sets are not computed correctly (in 4.11.1 "master"). There may be other stuff--I've been testing against the "dev" branch.
Yes, the "dev" branch contains important changes for the Go target. The "configs" sets are now in agreement between Java, CSharp, and Go. So, that difference has been resolved.
I thought it did. However, I will still do some checking. I think I have missed one change somehow, maybe in a merge. There is also some work to be done in error recovery. Java changed a little bit on where it inserts tokens - ironically the bits that use "Jim Idle's magic recovery..." (copied from Ter's comments ;) - but Go wasn't updated. Then I can turn on the tests that are currently disabled for Go. I also know that I need to rework config set "inheritance" because it is causing way too many allocations in one area because comparisons fail, even though I fixed the comparison logic.
ok, now's a good time to get a clean patch file from 4.11.1 for Go as I'm trying to integrate it now into google's repo and it's a lot of work; don't want to repeat in 3 weeks haha.
I am working on this today. Let's see how far I can get. We should be advocating for Google to use modules really, this will help you with the integration.
They have a big mono repo so versions aren't an issue (I understand)
Yes - I am aware that Google use a massive mono repo as it has been written about many times of course. But I don’t see the value of putting third party code in to it, especially with Go, which is all about modularity etc. Actually, I think mono repos are just not a good idea - but Google can afford to have a huge team writing tools and so on to deal with it I guess.
On Oct 24, 2022 at 11:04:23 AM, Terence Parr @.***> wrote:
They have a big mono repo so versions aren't an issue (I understand)
Just a note: I have been updating trgen, which generates drivers. I noticed that the input files for this issue are encoded in ISO-8859-1, not UTF-8 or ASCII. The Antlr Cpp target does not have any way to pass the encoding of the file to ANTLRFileStream, unlike most other targets (e.g., Java), and the Cpp target is missing CharStreams. I don't know whether this is by design or an oversight. But without it, there is no direct comparison of the Cpp target against the other targets without first converting the file encoding to UTF-8.
@kaby76 In the actual source code in Go, the files are transformed in memory from ISO-8859-1 to UTF-8 for analysis purposes. It seems I forgot to convert them before zipping the example, though they mostly contain only ASCII anyway, except for some äöüÄÖÜ.
Another note: Daedalus.g4 has a symbol definition for "Class". Beyond the fact that we don't have a release of the PHP runtime for 4.11.1, it looks like "Class" is a reserved word in PHP. We get a symbol conflict when trying to execute the generated PHP code. It looks like the symbol conflict table isn't completely up to snuff for the tool. The usual workaround is to append an underscore ('_') to the name. There may be other symbol conflicts, so this would be a good test case.
Geez, even PHP is faster than Go at parsing all "*.d" files (~3.5s). But not as fast as Java, ~0.8s. Go really has "abysmal performance".
Actually, while PHP is faster than Go (!!), PHP isn't that fast compared to, say, Java. Passing the file AIFunction.d without conversion to UTF-8 (no BOM) lexed with one token: EOF. No errors. Everything has to be converted to UTF-8 because targets handle the encoding inconsistently (`for i in *.d; do echo $i; echo ${i%.*}; iconv -f iso-8859-1 -t utf-8 $i -o ${i%.*}.d; done`). After fixing that, the time for PHP is ~9 s.
@kaby76 It looks like @marcospassos just pushed out 4.11.1 for PHP :)
@kaby76, I'm not sure why you're so astonished about PHP's performance. We've been using it in production for two years now, and based on our benchmarks, it's one of the fastest ANTLR runtimes when properly used. PHP has a just-in-time compiler and can run with opcodes, skipping all the interpreting phases.
We have a pretty complex grammar, and the ANTLR-backed PHP parser can handle 5k files in < 15 seconds with zero optimization on a regular MacBook.
I plan to invest some time in identifying bottlenecks over the next few months to make the PHP runtime as fast as the Java runtime. PHP 8.1 introduced native support for murmur hash, which allows new performance improvements.
It took a bit of time, but I rewrote the templates for generating parsing drivers for Trash's trgen. The drivers output timing information in seconds in tabular form, completely standardized across targets so it can be analyzed with statistics software. With this change, I wanted to see how warm-up helps (or hurts) for Daedalus.g4, with the ".d" files in alphanumeric order of file name.
I tested the inputs against the grammar 10 times and collected the timing data. I then generated scatter plots of the run time versus the "*.d" files in alphanumeric order ("0" being the first file to parse, "15" the last). The graphs were generated using R and plot, converted to SVG via epstopdf/pdf2svg, and fixed and labeled in Inkscape. (Sorry for the crude plots, but I am quite rusty with R.)
Note, the Python3 graph is missing because at the time I was having a problem with a Python3 bug with global variables. I included testing of the several-year-old Antlr4cs fork runtime and tool.
Here are these graphs.
First, I'd like to point out that the best runtime performance is Dart's. It was 2 times faster than Java, which is very acceptable in terms of performance. The worst performance is Go's. But tied for that is the PHP runtime, which is generally ~10-15 times slower than Java.
I want to call attention to the general trend of faster run times as more files are parsed. This happens for all targets except Go, which is counterintuitive.
The Cpp run times are much slower than I would expect, but the build for it is "Debug". I expect it to be much faster for "Release". (This was run on Windows 11, VS2022 C++.)
Note, the downward trends in runtimes may be an artifact, e.g., earlier files are longer in length than later files. I should reverse the order of the input files.
Well, we have not had a lot of time yet on optimizing Go. But it is a lot better than it was ;). Needs some “object” revamping. I don’t think the hashing is an issue now.
On Oct 28, 2022 at 11:46:30 PM, Ken Domino @.***> wrote:
Geez, even PHP is faster at parsing all "*.d" files than Go (~3.5s). But, not as fast as Java, ~0.8s.
Well, @kaby76, timing across language targets is pretty tricky, though I know you are very experienced.
I have some new data and figures of how warm-up affects parse times. It's not as clear-cut as I thought.
I ran a parse of each input file by itself, and a separate run of the input file after a warm-up consisting of all but the input file. trgen outputs the parse time for each file, but I'm only interested in the time for parsing the file by itself and the time for the file after warm-up. I ran this ten times for each of the 9 targets for which I implement drivers. In the following bar charts, there are 9 groupings (one per target), with the first bar in each group (colored blue) representing the time for parsing the input file without warm-up, and the second bar (colored red) the time with warm-up. All targets are placed on one graph so you can easily compare the parse times across targets.
For all inputs, warm-up caused a performance degradation for the Go target. But we also see that warm-up does not help in some situations for other targets, such as PHP and Python3 with the input file focusnames.d. As before, the Dart target was usually the fastest, followed very closely by the Cpp target. The worst target was usually PHP, with Python3 running slightly faster.
(Graphs produced by Octave; octave.txt. Raw data: out2.txt)
@kaby76, can you share a repo with the benchmark script? This could help us to improve the targets.
@kaby76, can you share a repo with the benchmark script? This could help us to improve the targets.
The Bash script is here.
That script is not in any repo yet. It does require the latest changes in trgen. Templates here have been updated for testing with Windows. It should still work with Linux but I haven't checked yet. (trgen is used in grammars-v4 Github Actions for testing across all OSes and targets.) I am still updating trgen for other reasons, e.g., "scenarios" in antlr4test-maven-plugin, so there's no release of the recent changes.
I'm using the latest Dev branch code for the Go target and Master for CSharp. This code in Go does not seem to work: https://github.com/antlr/antlr4/blob/5a68af0109d911e3bcd2e2c0eaaaecf25dbee66c/runtime/Go/antlr/prediction_context.go#L94
I noticed that for EngineAdrG1.d, "p.cache" is huge. Debugging this, the first breakpoint at this line returns nil--correctly, because the p.cache map is empty. It then enters the "ctx" value, which is `113 $`. So far, so good. The next occurrence at this breakpoint tries to also enter the context `113 $`. But this time, p.cache has one entry. It should find the context, but it does not. It is entered again into "p.cache"--which is an error.
In C#, the cached value found here is not null. It never enters the same context twice, unlike for the Go target.
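The intended behavior of that cache, reduced to a sketch (the types and the content-based key are illustrative, not the runtime's actual ones): an equal context entered a second time must be found and returned, not re-added.

```go
package main

import "fmt"

// ctx stands in for a PredictionContext; key() stands in for a
// content-based hash so equal contexts map to the same bucket.
type ctx struct{ repr string }

func (c ctx) key() string { return c.repr }

type cache struct{ m map[string]ctx }

// add returns the already-stored instance for an equal context;
// only a genuinely new context is entered. If lookup fails for an
// equal context (the bug described above), the map grows without
// bound.
func (p *cache) add(c ctx) ctx {
	if existing, ok := p.m[c.key()]; ok {
		return existing // canonical instance, no duplicate entry
	}
	p.m[c.key()] = c
	return c
}

func main() {
	p := &cache{m: map[string]ctx{}}
	p.add(ctx{"113 $"})
	p.add(ctx{"113 $"}) // equal context: must be found, not re-entered
	fmt.Println(len(p.m)) // 1
}
```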
Thanks for your in-depth analysis, @kaby76 Yep, I started down the path of creating a standard ATN simulator dump so we could compare across targets. Or maybe all I did was create an issue. haha.
By the way when it comes to Java, warming up the JIT is also an issue not just warming up the ATN cache. I have some fairly end up performance monitoring stuff for Java if you want to take a look: https://github.com/antlr/performance. I would be surprised if Java didn't win overall, after warm up of the JIT, given the optimization Sam and I did years back. C++ should be able to win but non-Java targets were built as translations of the Java code which is not necessarily idiomatic for those languages.
Folks, if I read this code correctly, the data structure Dictionary<PredictionContext, PredictionContext>, along with the code that enters ctx into the cache, is just another name for a HashSet.
Yes, yes! Please implement ATN dump.
I thought that I did see warmup not helping for Java in one or two files with this grammar. But the time for the parse was so fast, it didn't seem to be that important.
I'm also wondering whether one can experimentally find the minimal set of warmup inputs beyond which additional files provide no further help. It doesn't seem that I need to parse very many input files to achieve good warmup.
Why in the world isn't this just a hash set?
That is an excellent question and I could've answered immediately over a decade ago but it's fuzzy now. Certainly we had to come up with a very special hash function. Hmm...It looks like there is a hashtable there:
public class PredictionContextCache {
protected final Map<PredictionContext, PredictionContext> cache =
new HashMap<PredictionContext, PredictionContext>();
...
Do you mean for a different cache? It's definitely not mapping 1-to-n. It is likely 1-to-1 and x to x.
cache.put(ctx, ctx);
A set wouldn’t let you get the existing PredictionContext for an equal PredictionContext i.e. we need the get function
You are right. For example, in C#, HashSet<> does not have a "Find()" function, only a membership test, "Contains()". But for Cpp, HashSet<> => std::unordered_set (https://github.com/antlr/antlr4/blob/aa1f1f12a846846fcb78544bdb2ad135471376db/runtime/Cpp/runtime/src/atn/PredictionContextCache.h#L58)--the only runtime that actually uses a hash set for the PredictionContext cache--and the hash set does have a find() (https://github.com/antlr/antlr4/blob/aa1f1f12a846846fcb78544bdb2ad135471376db/runtime/Cpp/runtime/src/atn/PredictionContextCache.cpp#L40), so it is okay. For Go, JStore does have a Get() (https://github.com/antlr/antlr4/blob/aa1f1f12a846846fcb78544bdb2ad135471376db/runtime/Go/antlr/jcollect.go#L79), which might be a better choice at this point. But the type hierarchy for PredictionContext in Go is very gnarly, and not in a good way, which makes modifications not easy.
Since most set implementations are backed by hash maps, I don’t foresee any benefit in tweaking the existing implementations
I agree, except for maybe the Go target, which doesn't seem to be working as it contains duplicate PredictionContext.
But, while we're on this code, I just found out that for the Cpp target, the javascript/javascript grammar fails precisely in the PredictionContext comparison! See https://github.com/antlr/grammars-v4/issues/2909#issuecomment-1303832945
go get -u github.com/antlr/antlr4/runtime/Go/antlr/v4@d5dd253
(same with plain /v4)
Smallest possible grammar and code that reproduces the behavior:
Expected
Parsing 19 files should take around 0.2s
Actual
Parsing 19 files takes more than 11s, and the time grows exponentially as more files are added
Happens as of https://github.com/antlr/antlr4/commit/f19dc0ea3ede67483b6aec6128d5dd44053778f9 (~5.5s). Got worse with 56df72ca6742 (~10s).
See example in attached zip: (includes sample files and grammar) issue.zip
To test old and new behaviour, just toggle the commented lines in go.mod and run:
go mod tidy && go test -timeout 30s -run ^TestParsing$ issue -count=1
EDIT: After removing all the leading conditional values (summary comments) and putting them into a hidden channel, performance was still a bit lower (~20%), but much, MUCH better than with whatever that issue is.