It's hard to tell what's going wrong without a way to reproduce or see what myprogram is doing. Are you able to share the code or some code that consistently reproduces the issue? FWIW, to my knowledge, a dramatic spike like this hasn't been reported before (in production or even a test), which makes me curious about what the program is actually doing.
Note that generally, it is possible for the program to overrun the heap goal by at least a little bit. However, that overrun is typically visible in the gctrace output (Y in X->Y->Z), whereas this appears to be a change in heap goal that comes from apparently nowhere, akin to if SetGCPercent was called in between GC 121 and GC 122. (Not saying that's what happened, just providing an example of how such output might show up.)
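As a purely illustrative sketch (not the reporter's program, and only an assumption about what such a discontinuity could look like), a mid-run SetGCPercent call while the live heap grows steadily produces exactly that kind of sudden jump in the reported heap goal under GODEBUG=gctrace=1:

package main

import "runtime/debug"

func main() {
	debug.SetGCPercent(3) // same effect as starting with GOGC=3

	// Retain 1 MiB per iteration so the live heap (and therefore the heap
	// goal) grows steadily; run with GODEBUG=gctrace=1 to watch the
	// X->Y->Z columns and the goal.
	var live [][]byte
	for i := 0; i < 200; i++ {
		live = append(live, make([]byte, 1<<20))
		if i == 100 {
			// Raising the percentage mid-run makes the next heap goal jump
			// discontinuously, rather than creeping up by 3% per cycle.
			debug.SetGCPercent(100)
		}
	}
	_ = live
}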
I can't share code but I can tell you exactly what I did. I used the go-git library to read the linux repository into memory (it has more than 1 million commits). Furthermore, this issue happened each time I did it. I tried it again several times when I noticed the spike, so it wasn't just a random occurrence. It should be said that it's totally unnecessary to set GOGC=3 to read the repo into memory; it actually performed much better using the default GOGC.
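For context, a minimal sketch of that kind of commit walk with go-git, assuming a pre-existing local clone at a hypothetical path; the reporter's actual program may differ (for example, it may clone into go-git's in-memory storage instead):

package main

import (
	"fmt"

	git "github.com/go-git/go-git/v5"
	"github.com/go-git/go-git/v5/plumbing/object"
)

func main() {
	// Open an existing local clone; the path is hypothetical.
	repo, err := git.PlainOpen("/path/to/linux")
	if err != nil {
		panic(err)
	}

	iter, err := repo.Log(&git.LogOptions{All: true})
	if err != nil {
		panic(err)
	}
	defer iter.Close()

	// Retain every commit so the live heap grows as the walk proceeds.
	var commits []*object.Commit
	if err := iter.ForEach(func(c *object.Commit) error {
		commits = append(commits, c)
		return nil
	}); err != nil {
		panic(err)
	}
	fmt.Println("commits held in memory:", len(commits))
}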
Here's the rest of the gctrace output from gc 123. You can see the heap goal continues incrementing by 3% after that one dramatic spike.
My computer has 8GiB of RAM and GOMAXPROCS was 4.
gc 123 @14.599s 1%: 0.085+4.7+0.045 ms clock, 0.34+0.44/3.7/7.6+0.18 ms cpu, 877->878->856 MB, 879 MB goal, 0 MB stacks, 0 MB globals, 4 P
pacer: assist ratio=+1.095070e+000 (scan 3 MB in 879->881 MB) workers=1++0.000000e+000
pacer: 28% CPU (25 exp.) for 2600392+15024+488544 B work (3035336 B exp.) in 922007824 B -> 922351632 B (∆goal -2428012, cons/mark +3.051929e-001)
gc 124 @14.905s 1%: 0.075+5.2+0.058 ms clock, 0.30+0.71/5.1/3.8+0.23 ms cpu, 879->879->857 MB, 881 MB goal, 0 MB stacks, 0 MB globals, 4 P
pacer: assist ratio=+1.095246e+000 (scan 3 MB in 881->883 MB) workers=1++0.000000e+000
pacer: 25% CPU (25 exp.) for 2668848+21304+488544 B work (3103960 B exp.) in 923814224 B -> 924136768 B (∆goal -2511486, cons/mark +3.051929e-001)
gc 125 @15.199s 1%: 0.083+9.1+0.10 ms clock, 0.33+0.12/8.9/8.3+0.42 ms cpu, 881->881->859 MB, 883 MB goal, 0 MB stacks, 0 MB globals, 4 P
pacer: assist ratio=+1.093189e+000 (scan 3 MB in 882->884 MB) workers=1++0.000000e+000
pacer: 25% CPU (25 exp.) for 2748424+15720+488544 B work (3178696 B exp.) in 924900032 B -> 925290176 B (∆goal -2517584, cons/mark +3.051929e-001)
gc 126 @15.429s 1%: 0.10+5.7+0.064 ms clock, 0.41+0.16/4.8/5.7+0.25 ms cpu, 882->882->860 MB, 884 MB goal, 0 MB stacks, 0 MB globals, 4 P
pacer: assist ratio=+1.094132e+000 (scan 3 MB in 883->886 MB) workers=1++0.000000e+000
pacer: 25% CPU (25 exp.) for 2930792+15736+488544 B work (3252688 B exp.) in 926862544 B -> 927727544 B (∆goal -2107847, cons/mark +3.051929e-001)
@seankhliao According to GitHub, it seems like the WaitingForInfo label was added at the same time as I provided the requested info.
Given the additional logs, I am skeptical that this is a GC bug. My current hypothesis is that your application is making a single ~500 MiB allocation (or maybe a handful of very large, ~100 MiB allocations concurrently). It does appear that the GC thinks your live heap is actually 856 MiB in size. If this was some kind of strange overrun bug, I would expect the following GC to realize that the vast majority of that memory isn't actually needed and drop it immediately. From that perspective, GOGC is completely working as intended. The jump appears to happen precisely because your application just needs that much more memory, period. The single large allocation would also explain why the heap doesn't grow smoothly and instead simply jumps.
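A minimal sketch of that hypothesis (the sizes are assumptions, not the reporter's code): under a very low GOGC, a single large allocation that stays reachable moves the live heap, and therefore the heap goal, in one step rather than in 3% increments:

package main

import "runtime/debug"

func main() {
	debug.SetGCPercent(3) // same effect as GOGC=3

	// Grow the live heap gradually; the heap goal tracks it in ~3% steps.
	var small [][]byte
	for i := 0; i < 350; i++ {
		small = append(small, make([]byte, 1<<20)) // retain 1 MiB each
	}

	// One ~500 MiB allocation that remains reachable: the next gctrace line
	// shows the goal jumping by roughly that amount, as in gc 121 -> gc 122.
	big := make([]byte, 500<<20)

	_ = small
	_ = big
}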
As a result, I suspect there isn't actually anything actionable here. I'd recommend taking a look at a heap profile (the inuse_space sample_index) to confirm this, but I think I'm going to leave the WaitingForInfo label up so that the issue automatically closes out unless we can either rule this out, or we get some more corroborating evidence that this is actually a runtime issue.
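One way to capture such a profile from inside the program (an assumption about tooling, not something the reporter necessarily did) is to write the heap profile to a file and open it with go tool pprof -sample_index=inuse_space heap.out:

package main

import (
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	// ... run the workload until the heap has grown ...

	// Write the heap profile to a file, then inspect it with:
	//   go tool pprof -sample_index=inuse_space heap.out
	f, err := os.Create("heap.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	runtime.GC() // flush recent allocation statistics into the profile
	if err := pprof.Lookup("heap").WriteTo(f, 0); err != nil {
		panic(err)
	}
}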
I took a profile. Here are the inuse_space and alloc_space profiles.
That profile appears to have been taken at the end of program execution, since the inuse_space profile has very little in it. You'd need to acquire a profile when the heap actually grows.
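A sketch of one way to grab the profile while the heap is actually growing (the port and structure here are assumptions, not the reporter's setup) is to expose the net/http/pprof endpoints for the lifetime of the run and fetch the heap profile at the moment of the spike:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
)

func main() {
	// Serve profiling endpoints in the background so a heap profile can be
	// fetched mid-run, e.g.:
	//   go tool pprof -sample_index=inuse_space http://localhost:6060/debug/pprof/heap
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the rest of the program (the repository walk) would run here ...
	select {} // placeholder to keep the process alive in this sketch
}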
Here you go. I think it just confirms everything you've said thus far. The dramatic spike occurs at the beginning of the application.
gc 119 @3.304s 1%: 0.097+1.9+0.006 ms clock, 0.38+0.095/1.8/2.8+0.026 ms cpu, 350->351->343 MB, 352 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 120 @3.362s 1%: 0.062+1.6+0.018 ms clock, 0.24+0.092/1.5/3.0+0.072 ms cpu, 350->352->343 MB, 353 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 121 @3.419s 1%: 0.11+2.3+0.005 ms clock, 0.44+0.37/2.1/3.1+0.021 ms cpu, 351->351->343 MB, 354 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 122 @3.451s 1%: 0.091+1.7+0.017 ms clock, 0.36+0.096/1.5/2.6+0.069 ms cpu, 859->859->853 MB, 859 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 123 @8.516s 0%: 0.32+6.5+0.023 ms clock, 1.2+1.6/4.3/8.1+0.093 ms cpu, 875->876->856 MB, 879 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 124 @8.744s 0%: 0.12+20+0.006 ms clock, 0.49+1.6/19/16+0.025 ms cpu, 876->876->857 MB, 881 MB goal, 0 MB stacks, 0 MB globals, 4 P
sounds like gc is working correctly.
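For reference, those numbers line up with the heap goal formula documented in the Go GC guide, roughly live heap + (live heap + GC roots) * GOGC/100. With GOGC=3 and ~0 MB of stacks and globals, a 343 MB live heap yields a goal of about 353 MB (gc 120), and once the live heap jumps to ~853 MB the goal follows to about 879 MB (gc 123). The goal only climbs by 3% per cycle while the live heap itself grows smoothly, so a single large retained allocation moves both at once.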
Go version
go version go1.22.0 darwin/amd64
Output of go env in your module/workspace:

What did you do?
GODEBUG=gctrace=1,gcpacer=1 GOGC=3 ./myprogram
What did you see happen?
The heap target steadily increased by 3% and then suddenly jumped by more than 100% at gc 122.
What did you expect to see?
The heap goal to consistently increase by 3%.