golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

cmd/compile: out of memory compiling cmd/compile/internal/ssa with 1GB RAM #27739

Closed philhofer closed 5 years ago

philhofer commented 6 years ago

Building tip on Ubuntu 18.04 on a Digital Ocean VM with 1GB of RAM at commit 83dfc3b0:

phil@spare0:~/go/src$ uname -a
Linux spare0 4.15.0-30-generic #32-Ubuntu SMP Thu Jul 26 17:42:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

I can reproduce this out-of-memory condition 100% of the time (in the prove pass in SSA):

Building Go cmd/dist using /home/phil/go-bootstrap.
Building Go toolchain1 using /home/phil/go-bootstrap.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
# cmd/compile/internal/ssa
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x9f727f, 0x16)
        /home/phil/go/src/runtime/panic.go:608 +0x72
runtime.sysMap(0xc02c000000, 0x4000000, 0xe7d258)
        /home/phil/go/src/runtime/mem_linux.go:156 +0xc7
runtime.(*mheap).sysAlloc(0xe5a580, 0x4000000, 0xe5a598, 0x7f0d87065308)
        /home/phil/go/src/runtime/malloc.go:619 +0x1c7
runtime.(*mheap).grow(0xe5a580, 0x3, 0x0)
        /home/phil/go/src/runtime/mheap.go:920 +0x42
runtime.(*mheap).allocSpanLocked(0xe5a580, 0x3, 0xe7d268, 0x400)
        /home/phil/go/src/runtime/mheap.go:848 +0x337
runtime.(*mheap).alloc_m(0xe5a580, 0x3, 0x50, 0x7f0d87351fff)
        /home/phil/go/src/runtime/mheap.go:692 +0x119
runtime.(*mheap).alloc.func1()
        /home/phil/go/src/runtime/mheap.go:759 +0x4c
runtime.(*mheap).alloc(0xe5a580, 0x3, 0x7f0d87010050, 0x7f0d87065270)
        /home/phil/go/src/runtime/mheap.go:758 +0x8a
runtime.(*mcentral).grow(0xe5ccb8, 0x0)
        /home/phil/go/src/runtime/mcentral.go:232 +0x94
runtime.(*mcentral).cacheSpan(0xe5ccb8, 0x1fe)
        /home/phil/go/src/runtime/mcentral.go:106 +0x2f8
runtime.(*mcache).refill(0x7f0d8b82a000, 0xc000020050)
        /home/phil/go/src/runtime/mcache.go:122 +0x95
runtime.(*mcache).nextFree.func1()
        /home/phil/go/src/runtime/malloc.go:749 +0x32
runtime.systemstack(0x455ad9)
        /home/phil/go/src/runtime/asm_amd64.s:351 +0x66
runtime.mstart()
        /home/phil/go/src/runtime/proc.go:1229

1GB of memory has been more than enough to build the toolchain in the past.

Barring any clever ideas about how to debug this, I'll try to bisect and hope that there was only one commit that reliably introduced this regression.

agnivade commented 6 years ago

Likely a duplicate of https://github.com/golang/go/issues/26523. ~Can you try this patch as suggested https://github.com/golang/go/issues/26523#issuecomment-407206248 ?~

Sorry, the CL is already merged. I didn't see that.

philhofer commented 6 years ago

Yes, I'll do that.

philhofer commented 6 years ago

That patch is part of the go1.11 release, and building go1.11 still fails.

philhofer commented 6 years ago

I'm going to try to bisect 74b56022a1f834b3edce5c3eca0570323ac90cd7...e0faedbb5344eb6f8f704005fe88961cdc6cf5f8 and see if that produces interesting results.

cherrymui commented 6 years ago

What version of Go is the bootstrap compiler? If I understand correctly, it is toolchain1 that OOM'd, and toolchain1 is built by the bootstrap compiler with the bootstrap runtime.

philhofer commented 6 years ago

Ah, fair point. I was using go1.11 as my bootstrap toolchain. (Interestingly, though, building go1.10.4 with go1.11 as the bootstrap toolchain works fine...)

philhofer commented 6 years ago

My first round of bisecting blames commit e9137299bf74e1bcac358b569f86aef73c7c2ea6, but I'm going to bisect again with 1.10.4 as my bootstrap toolchain and see if that produces different results. (That commit doesn't make much sense as the source of the regression.)

cherrymui commented 6 years ago

The Go 1.11 compiler does more work, and of course it also has more code. So there is a double factor here when using the Go 1.11 compiler to compile the Go 1.11 compiler: it (may) use more memory (even when compiling the same code), and it compiles more code.

Maybe a workaround is to use an older (or newer) bootstrap compiler?

philhofer commented 6 years ago

make.bash on go1.11 using go1.10.4 as a bootstrap still fails. I'm bisecting the same commit range.

philhofer commented 6 years ago

When bootstrapping with 1.10.4, bisect blames cc09212f59ee215cae5345dc1ffcd1ed81664e1b.

# bad: [e0faedbb5344eb6f8f704005fe88961cdc6cf5f8] cmd/go: add missing newlines in printf formats
# good: [74b56022a1f834b3edce5c3eca0570323ac90cd7] doc: note that x509 cert parsing rejects some more certs now
git bisect start 'e0faedbb' '74b56022a'
# good: [62adf6fc2d70d9270b4213218e622c15504966be] cmd/internal/obj: convert unicode C to ASCII C
git bisect good 62adf6fc2d70d9270b4213218e622c15504966be
# bad: [4eb1c84752b8d3171be930abf4281080d639f634] cmd/link: fix name section of WebAssembly binary
git bisect bad 4eb1c84752b8d3171be930abf4281080d639f634
# bad: [31ef3846a792012b0588d92251f3976596c0b1b1] cmd/compile: add rulegen diagnostic
git bisect bad 31ef3846a792012b0588d92251f3976596c0b1b1
# good: [cc0aaff40e02192356ccb65d8acf571d12f74a95] cmd/compile: fix Wasm rule file name
git bisect good cc0aaff40e02192356ccb65d8acf571d12f74a95
# good: [3080b7d0af65858400b13134c1c471e2cb35e647] runtime: unify fetching of locals and arguments maps
git bisect good 3080b7d0af65858400b13134c1c471e2cb35e647
# good: [8a16c71067ca2cfd09281a82ee150a408095f0bc] cmd/vet: -composites only checks imported types
git bisect good 8a16c71067ca2cfd09281a82ee150a408095f0bc
# bad: [7d61ad25f8b10c0a656ef709fb30c08f5974594b] crypto/x509: check EKUs like 1.9.
git bisect bad 7d61ad25f8b10c0a656ef709fb30c08f5974594b
# bad: [f2cde55cd60993e948dada9187d25211ec150a5e] runtime: use Go function signatures for memclr and memmove comments
git bisect bad f2cde55cd60993e948dada9187d25211ec150a5e
# good: [e9137299bf74e1bcac358b569f86aef73c7c2ea6] debug/pe: parse the import directory correctly
git bisect good e9137299bf74e1bcac358b569f86aef73c7c2ea6
# bad: [cc09212f59ee215cae5345dc1ffcd1ed81664e1b] runtime: use libc for nanotime on Darwin
git bisect bad cc09212f59ee215cae5345dc1ffcd1ed81664e1b
# good: [e86c26789dbc11c50c4c49bee55ea015847a97b7] runtime: fix darwin 386/amd64 stack switches
git bisect good e86c26789dbc11c50c4c49bee55ea015847a97b7
# first bad commit: [cc09212f59ee215cae5345dc1ffcd1ed81664e1b] runtime: use libc for nanotime on Darwin

That commit doesn't make much sense as the culprit either, but both bisects point to a regression introduced somewhere in or around May of this year.
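
For readers unfamiliar with what the bisect log above is doing, the search can be sketched as a binary search over an ordered commit list. This is only an illustration, not how git implements it: the commit names and the `isBad` predicate are hypothetical, and the sketch assumes the first commit is good, the last is bad, and that every commit after the first bad one is also bad.

```go
package main

import "fmt"

// firstBad returns the index of the first bad commit. It assumes commits are
// ordered oldest to newest, commits[0] is good, the last commit is bad, and
// badness is monotonic (once bad, always bad). len(commits) must be >= 2.
func firstBad(commits []string, isBad func(string) bool) int {
	lo, hi := 0, len(commits)-1 // invariant: commits[lo] is good, commits[hi] is bad
	for hi-lo > 1 {
		mid := lo + (hi-lo)/2
		if isBad(commits[mid]) {
			hi = mid // first bad commit is at mid or earlier
		} else {
			lo = mid // first bad commit is after mid
		}
	}
	return hi
}

func main() {
	// Hypothetical history; in the thread each step is a full make.bash run.
	commits := []string{"c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"}
	// Hypothetical predicate: pretend the build runs out of memory from c5 on.
	isBad := func(c string) bool { return c >= "c5" }

	i := firstBad(commits, isBad)
	fmt.Println("first bad commit:", commits[i]) // prints "first bad commit: c5"
}
```

The result is only as reliable as the monotonicity assumption: if a build near the memory limit passes or fails depending on factors other than the commit under test, a search like this can settle on an unrelated commit.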

davecheney commented 6 years ago

What are the stats of the vm you are using? How many cores? Is any swap configured?

From my informal testing, you need ~768MB of RAM per core on a 64-bit machine to complete ./all.bash.
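
As a rough sketch of how that rule of thumb could be applied, the helper below turns total memory and core count into a suggested parallelism level. The function name `maxParallel` and the constant are hypothetical, and the 768MB figure is the informal estimate from this comment, not a documented requirement.

```go
package main

import "fmt"

// memPerCoreMB is the informal ~768MB-per-core estimate quoted in the thread.
// It is a rule of thumb, not a guarantee.
const memPerCoreMB = 768

// maxParallel suggests how many parallel build jobs to allow: at most one per
// core, at most one per memPerCoreMB of memory, and never fewer than one.
func maxParallel(totalMemMB, cores int) int {
	n := totalMemMB / memPerCoreMB
	if n > cores {
		n = cores
	}
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	fmt.Println(maxParallel(1024, 1)) // 1GB, single core: prints 1
	fmt.Println(maxParallel(4096, 8)) // 4GB, eight cores: prints 5
}
```

A value computed this way could be passed to the go command's `-p` build flag to cap the number of parallel jobs. It cannot help when a single compiler process alone exceeds the available memory.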

philhofer commented 6 years ago

It's the cheapest Digital Ocean VM. 1 vCPU, 1GB memory, no swap:

phil@spare0:~$ cat /proc/swaps
Filename                                Type            Size    Used    Priority

It's less important to me how many resources one needs to build the toolchain, and more important that things are moving in the wrong direction. Shouldn't peak resource consumption by the compiler be determined by either the largest package (in the compiler front-end) or the largest function (in the back-end)? The out-of-memory condition doesn't occur in the linker, where I would expect resource consumption to grow in concert with the repo growing ~10% more code.

philhofer commented 6 years ago

Additional anecdotal evidence of a memory use regression, though only for vmsize, not rss, which would make sense.

mvdan commented 6 years ago

I think we should have a builder that has a relatively small amount of memory. Somewhat related, in https://github.com/golang/go/issues/26867 I reported how go test net OOM'd with a few gigabytes of available memory.

We can assume that Linux is fairly common on machines with less memory (small computers, routers, VMs, etc), and that a 64-bit architecture like amd64 should stress test the memory more than a 32-bit architecture would.

We already have special builders like linux-amd64-noopt, so I propose adding a linux-amd64-small. It could start at a limit of 2GB of memory, but we could lower that to 1GB or even lower once it's in place. I presume that we could also add extra limits to it, such as:

/cc @dmitshur

ghost commented 6 years ago

I suggest adding memory swap on/off @mvdan

bradfitz commented 5 years ago

@mvdan, let's not combine two bugs into one. It's hard to label & track that way.

Could you file a separate builder bug about a small config? (but perhaps we could just make an existing builder (cgo, noopt?) be the small one... you could float that in the bug, or I could reply there later)

I'm going to remove the "Builders" label from this bug.

mvdan commented 5 years ago

@bradfitz you're right - see the issue above.

ianlancetaylor commented 5 years ago

I just tried building cmd/compile/internal/ssa with tip and it took just over 1G.

josharian commented 5 years ago

https://github.com/golang/go/issues/20104 is one big fix here

philhofer commented 5 years ago

@josharian I agree that #20104 would reduce memory pressure when compiling cmd/compile/internal/ssa. However, you'll notice the compiler fails in the second bootstrap phase rather than the first, which means compiling the current code with the old compiler succeeds, but compiling the same code with the current compiler fails. In other words, the regression is in the compiler's memory use, rather than the size of the code to be compiled. The regression in the compiler's performance seems higher-priority to me, since it impacts more than just folks working on the Go compiler itself.

4a6f656c commented 5 years ago

We've also started seeing this failure regularly on the openbsd/arm builder since around the 6th of March 2019 - a few examples:

https://build.golang.org/log/6c0247ef60f5ab34aae85aee05be681c7e4383cc
https://build.golang.org/log/7e3a4e361538049cb66dbd4a8f0f0b0b7a3b5955
https://build.golang.org/log/6ab6975603d179e1c684074fc23c89533f13f58e
https://build.golang.org/log/1df442b95529367697c1a58348201fc4d607bd13

(although this is compiling ssa.test, rather than just ssa)

gopherbot commented 5 years ago

Change https://golang.org/cl/176221 mentions this issue: cmd/compile: re-use regalloc's []valState
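
The general idea behind re-using a slice across compilations can be sketched as follows. This is not the code from the CL: the `valState` fields and the `cache` type here are hypothetical stand-ins, and the sketch only shows the pattern of keeping one backing array alive and clearing it instead of allocating a fresh slice per function.

```go
package main

import "fmt"

// valState is a stand-in for per-value register allocation state.
// The fields are hypothetical.
type valState struct {
	regs  uint64
	spill bool
}

// cache holds scratch storage that outlives a single function compilation.
type cache struct {
	values []valState
}

// valStates returns a zeroed slice of length n. It reuses the cached backing
// array when it is large enough and allocates a new one only when it is not.
func (c *cache) valStates(n int) []valState {
	if cap(c.values) < n {
		c.values = make([]valState, n) // freshly allocated, already zeroed
		return c.values
	}
	s := c.values[:n]
	for i := range s {
		s[i] = valState{} // clear stale state left by the previous user
	}
	return s
}

func main() {
	var c cache
	a := c.valStates(1000) // first function: allocates
	a[0].spill = true
	b := c.valStates(500) // second function: reuses the same array
	fmt.Println(len(b), cap(b) >= 1000, b[0].spill) // prints "500 true false"
}
```

The trade-off is that the cached array stays as large as the biggest request seen so far, so this reduces total allocation (what alloc_space measures) rather than the footprint of the largest function.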

josharian commented 5 years ago

pprof alloc_space for compiling package ssa with tip:

      flat  flat%   sum%        cum   cum%
  163.02MB  7.87%  7.87%   163.02MB  7.87%  cmd/compile/internal/gc.nodl
  139.42MB  6.73% 14.59%   214.49MB 10.35%  cmd/compile/internal/ssa.(*regAllocState).init
  128.29MB  6.19% 20.78%   135.81MB  6.55%  cmd/compile/internal/ssa.numberLines
   82.46MB  3.98% 24.76%   124.48MB  6.01%  cmd/compile/internal/ssa.cse
   81.94MB  3.95% 28.71%    82.44MB  3.98%  cmd/compile/internal/ssa.schedule
   79.78MB  3.85% 32.56%    79.78MB  3.85%  cmd/compile/internal/gc.scopePCs
   79.02MB  3.81% 36.37%    79.02MB  3.81%  cmd/compile/internal/ssa.(*Func).newValue
   63.01MB  3.04% 39.41%    67.01MB  3.23%  cmd/compile/internal/ssa.(*regAllocState).computeLive
   53.33MB  2.57% 41.99%    53.33MB  2.57%  cmd/compile/internal/gc.(*state).addNamedValue
   50.04MB  2.41% 44.40%    66.14MB  3.19%  cmd/internal/obj.(*LSym).writeAddr

My reactions to this:

This is milestoned 1.13.

If we want to fix it for 1.13, and the little chipping-away-at-the-edges work above doesn't do it, the only available fix I see is #20104. I'm happy to work on that, but there's a fair amount of code churn associated with that, more than I'd normally feel comfortable with during the freeze, so making that judgment call is above my pay grade.

dr2chase commented 5 years ago

It's possible we can do better in scopePCs, I'd need to think about it, but I think we can avoid the huge intermediate array of PCs-with-same-XPos (including column!)

gopherbot commented 5 years ago

Change https://golang.org/cl/154617 mentions this issue: cmd/compile: index line number tables by source file to improve sparsity

gopherbot commented 5 years ago

Change https://golang.org/cl/176577 mentions this issue: cmd/compile: remove large intermediate slice from gc.scopePCs

josharian commented 5 years ago

@4a6f656c observed in #30981 (closed as dup of this issue):

[It is] worth noting that Go built fine on the same builder around Go 1.12 and the issue has gradually gotten worse (first the odd test time build failure, now regular build time failures) - in other words it has regressed between Go 1.12 and current.

@bradfitz @ianlancetaylor unless David's two CLs (outstanding) and mine (merged) get the builder green again, we'll need to make a call about how to proceed here. See https://github.com/golang/go/issues/27739#issuecomment-491036822.

josharian commented 5 years ago

David's first CL is in, and the openbsd/arm builder is currently pending, as opposed to failing quickly, so that is a good sign. Attempts to measure maxrss locally indicate that 1.12 and tip are now similar for package ssa. And compilecmp shows a nice overall memory usage reduction from 1.12 to tip:

name        old alloc/op      new alloc/op      delta
Template         38.3MB ± 0%       37.1MB ± 0%   -3.21%  (p=0.008 n=5+5)
Unicode          28.6MB ± 0%       28.1MB ± 0%   -1.84%  (p=0.008 n=5+5)
GoTypes           136MB ± 0%        125MB ± 0%   -8.41%  (p=0.008 n=5+5)
Compiler          638MB ± 0%        575MB ± 0%   -9.89%  (p=0.008 n=5+5)
SSA              2.19GB ± 0%       1.97GB ± 0%   -9.81%  (p=0.008 n=5+5)
Flate            25.2MB ± 0%       22.9MB ± 0%   -9.13%  (p=0.008 n=5+5)
GoParser         30.0MB ± 0%       27.6MB ± 0%   -7.92%  (p=0.008 n=5+5)
Reflect          84.8MB ± 0%       80.6MB ± 0%   -4.96%  (p=0.008 n=5+5)
Tar              37.4MB ± 0%       34.9MB ± 0%   -6.61%  (p=0.008 n=5+5)
XML              51.3MB ± 0%       45.9MB ± 0%  -10.67%  (p=0.008 n=5+5)
[Geo mean]       87.3MB            81.0MB        -7.29%

So pending some solid green from openbsd/arm, I think we might be good enough for 1.13 here. And then I'll undertake to fix this once and for all with #20104 for 1.14.
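
A maxrss measurement like the one mentioned above can be approximated with a small wrapper that runs a command and reads the resource usage the kernel reports for it. This is a minimal sketch under stated assumptions, not the tooling used in this thread: the helper names are hypothetical, and it assumes a Linux system, where `Maxrss` is reported in kilobytes (other Unix systems may use different units).

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// peakRSSKB runs the command to completion and returns the peak resident set
// size reported for it. On Linux the unit is kilobytes.
func peakRSSKB(name string, args ...string) (int64, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	runErr := cmd.Run()
	if cmd.ProcessState == nil {
		return 0, runErr // the command never started
	}
	ru, ok := cmd.ProcessState.SysUsage().(*syscall.Rusage)
	if !ok {
		return 0, fmt.Errorf("resource usage not available on this platform")
	}
	// Report the peak even if the command exited with a non-zero status.
	return int64(ru.Maxrss), runErr
}

// kbToMB converts kilobytes to megabytes.
func kbToMB(kb int64) float64 {
	return float64(kb) / 1024
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: peakrss command [args...]")
		return
	}
	kb, err := peakRSSKB(os.Args[1], os.Args[2:]...)
	if err != nil {
		fmt.Fprintln(os.Stderr, "command error:", err)
	}
	fmt.Printf("peak RSS: %.1f MB\n", kbToMB(kb))
}
```

Saved as, say, `peakrss.go`, it could be invoked as `go run peakrss.go go build -a cmd/compile/internal/ssa`. Peak RSS and pprof's alloc_space answer different questions: the first is the high-water mark that determines whether a 1GB machine survives, while the second is the total allocated over the whole run.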

lyda commented 5 years ago

Attempting to compile github.com/hashicorp/packer via ports (sysutils/packer) in an AWS t2.small running FreeBSD 11.2. It seems to fail at the linking step with:

*** Error code 1

Stop.
make: stopped in /usr/ports/sysutils/packer

josharian commented 5 years ago

@lyda please file a new issue, and we'll investigate there. And please include a complete log and setup information. Thanks!

lyda commented 5 years ago

Will when I get a chance.

gopherbot commented 5 years ago

Change https://golang.org/cl/177917 mentions this issue: cmd/compile: optimize postorder

4a6f656c commented 5 years ago

FTR the openbsd/arm builder has been passing again since 6081a9f landed - thanks.

dr2chase commented 5 years ago

Do we consider this closed?

josharian commented 5 years ago

Do we consider this closed?

Yes, I think so.

josharian commented 4 years ago

CL 213703 should help, if it goes in.