golang / go

The Go programming language
https://go.dev

x/tools/gopls: high memory consumption #45457

Closed: mikeauclair closed this issue 1 year ago

mikeauclair commented 3 years ago

What version of Go are you using (go version)?

$ go version
go version go1.16.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

Ubuntu 20.04.2 LTS

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/mauclair/.cache/go-build"
GOENV="/home/mauclair/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/mauclair/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/mauclair/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.16.2"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/home/mauclair/tmp/go-build3185508688=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Using emacs with lsp-mode + gopls, gopls used 10GiB of memory (if it's helpful, this is a monorepo with a high volume of generated gRPC code).

What did you expect to see?

Lower memory consumption

What did you see instead?

10GiB memory consumed

Attached gopls memory profile

gopls.1480857-10GiB-nonames.zip

stamblerre commented 3 years ago

Thank you for reporting this issue and attaching the zip.

One common issue with high memory usage in gopls is caused by code that contains large byte slice literals. It looks like generated gRPC code does contain those (https://github.com/grpc/grpc-go/blob/master/examples/route_guide/routeguide/route_guide.pb.go#L360, for example) so that might explain this issue. @heschi is currently looking into a way to address this.

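For reference, the kind of declaration being discussed looks roughly like this in generated protobuf code (a heavily shortened sketch; the variable name is illustrative, and the real literals run to tens of thousands of elements):

package routeguide

// A shortened sketch of a generated descriptor literal; in real
// protoc-gen-go output this slice has tens of thousands of elements,
// each of which is parsed and type-checked as a separate node.
var file_route_guide_proto_rawDesc = []byte{
	0x0a, 0x11, 0x72, 0x6f, 0x75, 0x74, 0x65, 0x5f,
	0x67, 0x75, 0x69, 0x64, 0x65, 0x2e, 0x70, 0x72,
	0x6f, 0x74, 0x6f, 0x12, 0x0a, 0x72, 0x6f, 0x75,
	0x74, 0x65, 0x67, 0x75, 0x69, 0x64, 0x65,
}
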
mikeauclair commented 3 years ago

> Thank you for reporting this issue and attaching the zip.
>
> One common issue with high memory usage in gopls is caused by code that contains large byte slice literals. It looks like generated gRPC code does contain those (https://github.com/grpc/grpc-go/blob/master/examples/route_guide/routeguide/route_guide.pb.go#L360, for example) so that might explain this issue. @heschi is currently looking into a way to address this.

Thanks for that suggestion! Do you have any sense of what kinds of thresholds for byte slices you need to cross before you start to see problematic memory usage?

heschi commented 3 years ago

There's no critical threshold, it's just that they're very disproportionately expensive, so a byte slice with 1M elements might take hundreds of megabytes of memory in gopls.

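To make that cost concrete, here is a minimal sketch (independent of gopls, not from this CL) that parses a synthetic file with a 1M-element byte slice literal and counts the resulting AST nodes; every element becomes its own *ast.BasicLit, which is why such literals are so expensive to hold:

package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

func main() {
	// Build a synthetic file with a 1M-element byte slice literal,
	// similar in shape to generated protobuf descriptors.
	var src strings.Builder
	src.WriteString("package p\n\nvar blob = []byte{")
	for i := 0; i < 1_000_000; i++ {
		src.WriteString("0x2a,")
	}
	src.WriteString("}\n")

	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "blob.go", src.String(), 0)
	if err != nil {
		panic(err)
	}

	// Count AST nodes: each element is a separate *ast.BasicLit, so the
	// literal alone contributes roughly a million nodes, and parsing it
	// already allocates hundreds of megabytes.
	nodes := 0
	ast.Inspect(f, func(n ast.Node) bool {
		if n != nil {
			nodes++
		}
		return true
	})
	fmt.Println("AST nodes:", nodes)
}
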
I just mailed https://golang.org/cl/308730. It would be interesting to hear if it helps you at all. You can click the "Download" button to get clone instructions, then do go install in the gopls directory to install that version.

mikeauclair commented 3 years ago

Great! My teammate, @nik-andreev, is planning to try this CL out and see what it does for our memory usage

nik-andreev commented 3 years ago

> There's no critical threshold, it's just that they're very disproportionately expensive, so a byte slice with 1M elements might take hundreds of megabytes of memory in gopls.
>
> I just mailed https://golang.org/cl/308730. It would be interesting to hear if it helps you at all. You can click the "Download" button to get clone instructions, then do go install in the gopls directory to install that version.

I've just tried it out and still see the high memory consumption. gopls.2096622-8GiB-nonames.zip

nik-andreev commented 3 years ago

I also hit a nil pointer panic after a few hours of using the 308730 version of gopls. @mikeauclair and I use the same server (and version of go), but I run gopls via VSCode with remote ssh. Here is the stack trace in case it is useful:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xb8825e]

goroutine 40047 [running]:
golang.org/x/tools/internal/lsp/cache.trimTypesInfo.func1(0xf5c850, 0xc120f4f580, 0xf5c500)
    /gopls/tools/internal/lsp/cache/check.go:817 +0x9e
go/ast.inspector.Visit(0xc0d04c9820, 0xf5c850, 0xc120f4f580, 0x0, 0x0)
    /usr/local/go/src/go/ast/walk.go:373 +0x3a
go/ast.Walk(0xf59c80, 0xc0d04c9820, 0xf5c850, 0xc120f4f580)
    /usr/local/go/src/go/ast/walk.go:52 +0x63
go/ast.Walk(0xf59c80, 0xc0d04c9820, 0xf5cbe8, 0xc120f49020)
    /usr/local/go/src/go/ast/walk.go:151 +0x31e
go/ast.walkExprList(0xf59c80, 0xc0d04c9820, 0xc120f51620, 0x2, 0x2)
    /usr/local/go/src/go/ast/walk.go:26 +0x9e
go/ast.Walk(0xf59c80, 0xc0d04c9820, 0xf5c850, 0xc120f4f5c0)
    /usr/local/go/src/go/ast/walk.go:104 +0x406
go/ast.Walk(0xf59c80, 0xc0d04c9820, 0xf5cf30, 0xc120f51640)
    /usr/local/go/src/go/ast/walk.go:143 +0x14b4
go/ast.walkExprList(0xf59c80, 0xc0d04c9820, 0xc120f51660, 0x2, 0x2)
    /usr/local/go/src/go/ast/walk.go:26 +0x9e
go/ast.Walk(0xf59c80, 0xc0d04c9820, 0xf5c6e8, 0xc120f4f600)
    /usr/local/go/src/go/ast/walk.go:137 +0xfc5
go/ast.walkExprList(0xf59c80, 0xc0d04c9820, 0xc0ccc7abb0, 0x1, 0x1)
    /usr/local/go/src/go/ast/walk.go:26 +0x9e
go/ast.Walk(0xf59c80, 0xc0d04c9820, 0xf5c530, 0xc120f4f640)
    /usr/local/go/src/go/ast/walk.go:207 +0x1f97
go/ast.walkStmtList(0xf59c80, 0xc0d04c9820, 0xc120f4f680, 0x3, 0x4)
    /usr/local/go/src/go/ast/walk.go:32 +0x9e
go/ast.Walk(0xf59c80, 0xc0d04c9820, 0xf5c698, 0xc120f49050)
    /usr/local/go/src/go/ast/walk.go:224 +0x1968
go/ast.Walk(0xf59c80, 0xc0d04c9820, 0xf5ca30, 0xc0ccc7abe0)
    /usr/local/go/src/go/ast/walk.go:98 +0x29f2
go/ast.walkExprList(0xf59c80, 0xc0d04c9820, 0xc11b21bf80, 0x6, 0x8)
    /usr/local/go/src/go/ast/walk.go:26 +0x9e
go/ast.Walk(0xf59c80, 0xc0d04c9820, 0xf5c6e8, 0xc120f4f6c0)
    /usr/local/go/src/go/ast/walk.go:137 +0xfc5
go/ast.walkExprList(0xf59c80, 0xc0d04c9820, 0xc0ccc7abf0, 0x1, 0x1)
    /usr/local/go/src/go/ast/walk.go:26 +0x9e
go/ast.Walk(0xf59c80, 0xc0d04c9820, 0xf5c530, 0xc120f4f700)
    /usr/local/go/src/go/ast/walk.go:207 +0x1f97
go/ast.walkStmtList(0xf59c80, 0xc0d04c9820, 0xc11971d800, 0x1e, 0x20)
    /usr/local/go/src/go/ast/walk.go:32 +0x9e
go/ast.Walk(0xf59c80, 0xc0d04c9820, 0xf5c698, 0xc120f491d0)
    /usr/local/go/src/go/ast/walk.go:224 +0x1968
go/ast.Walk(0xf59c80, 0xc0d04c9820, 0xf5ca08, 0xc120f49200)
    /usr/local/go/src/go/ast/walk.go:344 +0xce6
go/ast.walkDeclList(0xf59c80, 0xc0d04c9820, 0xc120f5e400, 0x2e, 0x40)
    /usr/local/go/src/go/ast/walk.go:38 +0x9e
go/ast.Walk(0xf59c80, 0xc0d04c9820, 0xf5c9b8, 0xc120f56500)
    /usr/local/go/src/go/ast/walk.go:353 +0x2446
go/ast.Inspect(...)
    /usr/local/go/src/go/ast/walk.go:385
golang.org/x/tools/internal/lsp/cache.trimTypesInfo(0xc0cf83fba0, 0x2, 0x2, 0xc1203c2be0)
    /gopls/tools/internal/lsp/cache/check.go:802 +0x50
golang.org/x/tools/internal/lsp/cache.typeCheck(0xf6a398, 0xc1203f5000, 0xc000336d00, 0xc00b2cd260, 0x2, 0xc003964bd0, 0x0, 0x0, 0x0)
    /gopls/tools/internal/lsp/cache/check.go:457 +0x11d2
golang.org/x/tools/internal/lsp/cache.(*snapshot).buildPackageHandle.func1(0xf6a398, 0xc1203f5000, 0xf58d80, 0xc000336d00, 0xc000f532c0, 0x0)
    /gopls/tools/internal/lsp/cache/check.go:121 +0x1bc
golang.org/x/tools/internal/memoize.(*Handle).run.func1(0xc0cf83fb00, 0xf6a398, 0xc1203f5000, 0xc003a8e400, 0xf58d80, 0xc000336d00, 0xc00b546e40)
    /gopls/tools/internal/memoize/memoize.go:322 +0xab
created by golang.org/x/tools/internal/memoize.(*Handle).run
    /gopls/tools/internal/memoize/memoize.go:315 +0x185

mikeauclair commented 3 years ago

Small update here: I was able to slice ~8% off of heap utilization by moving some of our byte slice literals to //go:embed (we generate a bunch ourselves instead of using protoc-gen-go, so it was fairly easy to switch), so there's something to the theory that part of this is due to those literals. I'm going to take a look at what our new top offenders are and see if there are any common threads.

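A minimal sketch of that kind of change (the file name and variable are illustrative, not taken from the actual codebase): the large literal becomes a file embedded at build time, so the Go source that gopls parses stays small.

package routepb

import _ "embed"

// rawDescriptor used to be a generated []byte literal with tens of
// thousands of elements; embedding the same bytes from a sidecar file
// keeps the Go source tiny, so gopls has far less syntax and type
// information to keep in memory.
//
//go:embed route_guide_descriptor.bin
var rawDescriptor []byte
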
heschi commented 3 years ago

Sorry I didn't mention here: the latest patch set of https://golang.org/cl/308730 should no longer panic. You can try that out if you want; I'm going to try to get it merged today.

mikeauclair commented 3 years ago

👍 I'll check it out

gopherbot commented 3 years ago

Change https://golang.org/cl/311549 mentions this issue: internal/lsp/cache: remove type info trimming

gopherbot commented 3 years ago

Change https://golang.org/cl/310170 mentions this issue: internal/lsp: introduce DegradedLowMemoryMode

drocha87 commented 3 years ago

> Sorry I didn't mention here: the latest patch set of https://golang.org/cl/308730 should no longer panic. You can try that out if you want; I'm going to try to get it merged today.

I checked out this patch, but the problem with high memory consumption still persists. My computer freezes completely after trying to go to a definition inside aws-sdk-go.

system

5.8.0-50-generic #56~20.04.1-Ubuntu SMP Mon Apr 12 21:46:35 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

go version

 go version go1.16.3 linux/amd64

gopls version

golang.org/x/tools/gopls master
    golang.org/x/tools/gopls@(devel)

I stopped the process because, as I said, my computer freezes completely, so the attached profile only captures about 2GiB, but memory usage grows far beyond that.

I had this problem using both neovim and vim, with vim-go and coc-go.

If you need more information, I'll be glad to provide it.

gopls.8453-2GiB-nonames.zip

heschi commented 3 years ago

@drocha87 Please file a new issue. If you're working in an open source repository, full repro instructions would be helpful.

drocha87 commented 3 years ago

I'm not sure about opening a new issue, since this one is still open and I'm reporting feedback on the patch proposed in this discussion.

The repo I'm working on is private, but the high memory consumption happens when I try to go to a definition inside the aws-sdk-go repo, so I think it is pretty straightforward to repro this issue.

Right now I don't have time to set up a repro, but I'll be glad to do that in my free time.

TerminalFi commented 3 years ago

Any update on this? I am working in a monorepo (private) and I can't even use gopls. Within 25 minutes of usage, memory climbs to 16+ GiB between go and gopls, and the workspace never finishes loading. I get context deadline exceeded errors and then it stops.

findleyr commented 1 year ago

gopls@v0.12.0-pre.3 significantly reduces gopls' memory usage, up to an order of magnitude on larger repositories.

findleyr commented 1 year ago

(please open a new issue if you are still struggling with memory usage)

mikeauclair commented 1 year ago

Thanks! We've updated to 0.12.0 and are seeing roughly the order-of-magnitude decrease in memory footprint reported in the release notes

findleyr commented 1 year ago

@mikeauclair thanks for following up! That's great to hear.

folago commented 1 year ago

This is still happening

[screenshot: memory usage]

$ gopls version
golang.org/x/tools/gopls v0.12.4
    golang.org/x/tools/gopls@v0.12.4 h1:nce5etAamR46d9oNGxop1aRK5rDQ0NqcY/SHIcyfEKY=
$ go version
go version go1.20.5 linux/amd64

findleyr commented 1 year ago

Hi @folago

Are you working in open source? Are you willing to help us understand your workspace?

To start, could you share the output of gopls stats -anon? That will give us a hint of where to look. Then we may want to grab some memory profiles.

folago commented 1 year ago

Not open source, unfortunately.

I currently use v0.11.0, which uses between 7 and 10 GB.

I switched to v0.12.4 and it just got OOM-killed.

[screenshot: memory usage]

$ gopls stats -anon
Initializing workspace...     done (5.788011924s)
Gathering bug reports...      done (984.377211ms)
Querying memstats...          done (62.381751ms)
Querying workspace stats...   done (42.190485ms)
Collecting directory info...  done (77.805032ms)
{
  "DirStats": {
    "Files": 29310,
    "TestdataFiles": 250,
    "GoFiles": 22818,
    "ModFiles": 2,
    "Dirs": 4245
  },
  "GOARCH": "amd64",
  "GOOS": "linux",
  "GoVersion": "go1.20.5",
  "GoplsVersion": "v0.12.4",
  "InitialWorkspaceLoadDuration": "5.788011924s",
  "MemStats": {
    "HeapAlloc": 484451936,
    "HeapInUse": 617086976,
    "TotalAlloc": 869349072
  },
  "WorkspaceStats": {
    "Files": {
      "Total": 22928,
      "Largest": 11691838,
      "Errs": 0
    },
    "Views": [
      {
        "GoCommandVersion": "go1.20.5",
        "AllPackages": {
          "Packages": 3118,
          "LargestPackage": 572,
          "CompiledGoFiles": 23528,
          "Modules": 320
        },
        "WorkspacePackages": {
          "Packages": 483,
          "LargestPackage": 266,
          "CompiledGoFiles": 3436,
          "Modules": 1
        },
        "Diagnostics": 0
      }
    ]
  }
}

Anything else I can provide?

findleyr commented 1 year ago

That output shows gopls using only 617 MB after its initial workspace load, which indicates that most of the memory is spent holding data related to open packages.

I'll also note that your largest package has 572 files, which is quite large. Elsewhere, we have seen similar problems related to large packages.

Can you do an experiment? Start by opening a low-level package that does not import this large package transitively (something like a "util" package that has few imports). Does gopls stabilize at a low memory usage?

Then open a package that has a large import graph, and which imports this large package transitively (something like a "main" package). Does this cause your memory to blow up?

As a next step, we can try looking at a memory profile. I am confident we can fix this problem, if you are willing to help us investigate. I will open a new issue to track this fix.

folago commented 1 year ago

> That output shows gopls using only 617 MB after its initial workspace load, which indicates that most of the memory is spent holding data related to open packages.
>
> I'll also note that your largest package has 572 files, which is quite large. Elsewhere, we have seen similar problems related to large packages.
>
> Can you do an experiment? Start by opening a low-level package that does not import this large package transitively (something like a "util" package that has few imports). Does gopls stabilize at a low memory usage?

Yes, this is what I observed. If I stay away from the packages with the accursed large dependency, memory stays quite low with v0.12.4:

[screenshot: memory usage]

> Then open a package that has a large import graph, and which imports this large package transitively (something like a "main" package). Does this cause your memory to blow up?

Yes, this is what I have experienced: gopls starts consuming all the available memory and does not stop until it gets OOM-killed (if I'm lucky) or my machine freezes and I have to force a reboot.

> As a next step, we can try looking at a memory profile. I am confident we can fix this problem, if you are willing to help us investigate. I will open a new issue to track this fix.

You mean starting gopls with the -profile.mem=string flag?

folago commented 1 year ago

This is the profile I get by opening a package without the huge dep. gopls.zip

I can open the bad one and try to stop gopls before it gets OOM-killed or my machine freezes.

folago commented 1 year ago

I cannot manage to get a profile when I open the huge dependency: once gopls starts eating all the memory, it does not react if I stop neovim or call :LspStop, and now it seems to just freeze. I can kill it, but then it leaves the profile file empty. Does it handle any signal better than SIGTERM?

findleyr commented 1 year ago

Thanks @folago -- the profile after opening the huge dep is what we need.

Here's how you can get it: start gopls with, for example, -debug=localhost:8080, and while gopls is churning on the large dep, run: go tool pprof http://localhost:8080/debug/pprof/heap.

That should allow you to grab a snapshot of the gopls heap in action. If you can share the entire result, that would be great. But we would also learn a lot simply from the output of top 50 and top 50 -cum (in the pprof REPL).

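If curl is inconvenient, a small Go helper can do the same thing. This is a sketch assuming gopls was started with -debug=localhost:8080 as above (adjust the address and output names as needed):

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	// Snapshot the gopls heap profile once per second while it is
	// churning on the large dependency; inspect the files later with
	// go tool pprof heap.N.out.
	for i := 0; ; i++ {
		resp, err := http.Get("http://localhost:8080/debug/pprof/heap")
		if err != nil {
			fmt.Fprintln(os.Stderr, "fetch failed:", err)
			return
		}
		out, err := os.Create(fmt.Sprintf("heap.%d.out", i))
		if err != nil {
			resp.Body.Close()
			fmt.Fprintln(os.Stderr, err)
			return
		}
		if _, err := io.Copy(out, resp.Body); err != nil {
			fmt.Fprintln(os.Stderr, "write failed:", err)
		}
		resp.Body.Close()
		out.Close()
		time.Sleep(time.Second)
	}
}
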
folago commented 1 year ago

Ah nice!

I made a small script to curl the heap profile to a file every second, since it was difficult to catch it before my machine became unusable.

This is the second-to-last profile; the last one was empty. heap.90.zip

If the other profile would be useful for comparison, let me know.

I'm also posting the script in case there is something wrong with it that I missed:

#!/bin/bash
# Grab a heap profile from gopls' debug endpoint once per second.
for i in {1..1000}; do
    curl http://localhost:6060/debug/pprof/heap > heap.$i.out
    sleep 1
done

folago commented 1 year ago

Also, these are my settings for gopls in neovim:

    settings = {
        gopls = {
            analyses = {
                unusedparams = true,
                unusedwrite = true,
                fieldalignment = true,
                nilness = true,
            },
            staticcheck = true,
            -- gofumpt = true,
            templateExtensions = { 'tpl', 'tmpl' },
        },
    },

Just in case this is helpful, since I have a few analyses turned on.

findleyr commented 1 year ago

Thanks so much for helping us investigate!

I have not yet looked at your profile, but I already have a theory: both the nilness analyzer and staticcheck use a form of SSA/SSI analysis that can be quite expensive. That could explain the disproportionate cost of analysis in certain packages.

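For anyone who wants to gauge the SSA-based analysis cost outside of gopls, a minimal standalone driver is sketched below, using the public x/tools analysis packages (run it against the problematic package pattern; the paths in the usage comment are illustrative):

package main

import (
	"golang.org/x/tools/go/analysis/passes/nilness"
	"golang.org/x/tools/go/analysis/singlechecker"
)

// Run the nilness analyzer on its own, e.g.:
//
//	go run . ./path/to/large/package/...
//
// Timing this against different packages gives a rough sense of how
// expensive SSA construction is for them, independent of gopls.
func main() {
	singlechecker.Main(nilness.Analyzer)
}
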
folago commented 1 year ago

I can try to disable those two and send another profile.

folago commented 1 year ago

Oh yeah, so much better:

[screenshot: memory usage]

It is weird, though, that it reads so much less from disk; before, it was in the GB range, and I think I once saw it go to ~12GB.

This is the gopls config in neovim:

            analyses = {
                unusedparams = true,
                unusedwrite = true,
                fieldalignment = true,
                -- nilness = true,
            },
            -- staticcheck = true,
            -- gofumpt = true,
            templateExtensions = { 'tpl', 'tmpl' },

And here is the second-to-last profile:

heap.nochecks.229.zip

findleyr commented 1 year ago

Nice. I think we have a plausible theory. Now for a fix...

For some background: analyzers that do SSA analysis are essentially broken in v0.11.0, because they do not have generated SSA for non-workspace packages. We "fixed" this in v0.12.0, but clearly that fix is not viable in some cases.

> It is weird, though, that it reads so much less from disk; before, it was in the GB range, and I think I once saw it go to ~12GB.

That's a useful datapoint. Most likely this is serialization of SSA structures. In particular, if they take more than 1GB, you'll be churning through the fixed-size cache at a high rate, which could lead to additional performance problems.

folago commented 1 year ago

I see, thanks for the explanation and for the workaround.

I guess I can live without those checks for now; looking forward to the next release.

And let me know if there is anything else that I can do to speed up the diagnosis.

findleyr commented 1 year ago

@folago you might try enabling just nilness. I would be curious to know if that performs OK. If so, it indicates we're doing something wrong with staticcheck's analysis facts. I would expect nilness and staticcheck to have similar costs, as both are constructing an IR for the source being analyzed.

folago commented 1 year ago

Yep, nilness works!

[screenshot: memory usage]

    settings = {
        gopls = {
            analyses = {
                unusedparams = true,
                unusedwrite = true,
                fieldalignment = true,
                nilness = true,
            },
            -- staticcheck = true,
            -- gofumpt = true,
            templateExtensions = { 'tpl', 'tmpl' },
        },
    },

heap.nilness.142.zip

folago commented 1 year ago

Staticcheck, meanwhile, was getting out of hand pretty fast. I killed gopls before the freeze; I hope the profile is still useful. heap.static.10.zip

findleyr commented 1 year ago

Thanks. As noted in #61178, the profile definitely indicates that the majority of allocation is related to staticcheck.

Now that we have an approximate cause, let's continue the discussion on that issue. My guess is that we will be able to reproduce this by enabling staticcheck on some repositories we know to be similarly problematic for analysis.