x/tools/gopls: completions taking 20x as long since https://github.com/golang/tools/commit/41e4e5654988

myitcv commented 1 year ago

What version of Go are you using (`go version`)?

$ go version
go version go1.20.6 linux/arm64
$ go list -m golang.org/x/tools
golang.org/x/tools v0.13.1-0.20230915141654-e2393aba883a
$ go list -m golang.org/x/tools/gopls
golang.org/x/tools/gopls v0.0.0-20230915141654-e2393aba883a

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

go env Output

$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN="/home/myitcv/dev/cuelang/bin"
GOCACHE="/home/myitcv/.cache/go-build"
GOENV="/home/myitcv/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/myitcv/gostuff/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/myitcv/gostuff"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/myitcv/gos"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/myitcv/gos/pkg/tool/linux_arm64"
GOVCS=""
GOVERSION="go1.20.6"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/myitcv/dev/cuelang/cue/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2724720507=/tmp/go-build -gno-record-gcc-switches"

What did you do?

git clone https://review.gerrithub.io/cue-lang/cue
git checkout 42ae8e4594a203000488dbaa1d143574fb8588c6
vi cue/types.go:146:10

The line starts as:

func (o *structValue) marshalJSON() (b []byte, err errors.Error) {

Change that to:

func (o *) marshalJSON() (b []byte, err errors.Error) {

Then:

func (o *struc_) marshalJSON() (b []byte, err errors.Error) {

(with the cursor in the _ position, then attempt completion.

What did you expect to see?

Fast results.

What did you see instead?

Slow results.

The following table shows some pretty accurate wall time numbers for the completion call returning. For reference, my machine is an M1 Max 64GB memory, running Linux via VMWare Fusion.

`golang.org/x/tools` Commit	Cold start, no daemon	Same completion, repeated after cold start	Fresh session with daemon running from previous session
https://github.com/golang/tools/commit/ac2946029	102ms	102ms	102ms
https://github.com/golang/tools/commit/41e4e5654988	2032ms	243ms	2069ms
https://github.com/golang/tools/commit/e2393aba883a (current tip)	2172ms	241ms	2213ms

Log files for each result from the first column of timings:

I have bisected this slowdown to https://github.com/golang/tools/commit/41e4e5654988.

This appears to highlight a couple of things to me:

Type checking has slowed down at least in this use case (assuming that is the main factor influencing completion speed).
Something possibly isn't working as expected with a pre-existing daemon instance, which is more easily seen given the previous point.

cc @findleyr

findleyr commented 1 year ago

Thanks for the report. Possibly related: https://github.com/golang/vscode-go/issues/2889.

The CL you bisected to wouldn't have slowed down type checking. However, it would have resulted in many more completion candidates being considered. Before that CL it was possible that gopls miss some struct fields in the current type being selected (!). After that CL we should have no such omissions, but may also pull in additional candidates. The 20x slowdown you observe is likely due to parsing cost for building this candidate list; that cost is mitigated the second time around due to the parse cache.

This was on my radar due to https://github.com/golang/vscode-go/issues/2889, and is among our highest priorities to address.

Based on my current understanding, you should only experience this in certain packages, and should only experience the 20x slowdown at startup, after which subsequent completions should be faster (though perhaps not as fast as before). Can you confirm if that is the case?

CC @adonovan

myitcv commented 1 year ago

Thanks.

Do the numbers in the table above not answer your question re subsequent completions?

Note also the slow speeds even with a daemon running which suggests another bug?

findleyr commented 1 year ago

Do the numbers in the table above not answer your question re subsequent completions?

Sorry, I mean subsequent completions for other things, not just that one specific source location; during editing do you generally experience a 20x slowdown in completion, or 2.5x slowdown? The table suggests 2.5x, but I wanted to be sure.

Note also the slow speeds even with a daemon running which suggests another bug?

Not a bug, but an oversight: the parse-cache is per-session, not per instance -- the parse cache is meant to be a minor optimization, not the massive optimization that is experienced here. This is actually strong evidence that the parse cache is involved.

Interestingly, the bug does not immediately reproduce for me in neovim or vs code (I see completions around 100ms), though it does reproduce for Alan. We're investigating why.

myitcv commented 1 year ago

Sorry, I mean subsequent completions for other things, not just that one specific source location; during editing do you generally experience a 20x slowdown in completion, or 2.5x slowdown? The table suggests 2.5x, but I wanted to be sure.

Truth be told I had to revert to the fast version because I was getting too frustrated :)

So unfortunately I don't have any evidence or examples beyond the analysis presented here.

findleyr commented 1 year ago

So, we have new evidence, but we lost our reproducer and could use your help.

On Alan's machine, we started gopls with -debug=localhost:8099 and visited http://localhost:8099/trace. There, we observed (by clicking on textDocument/completion) that the longest completion was ~2.6s. Scrolling down to look at the trace, we saw that ~2.5s of that was in "runProcessEnvFunc".

This is the unimportedCompletion logic scraping the module cache. Unfortunately, we deleted our backup of the huge mod cache, and can no longer reproduce :facepalm:.

Could you do the following, using gopls@tip (or at the culprit)?

start gopls with -debug=localhost:8099
reproduce, and look at http://localhost:8099/trace. Please share the "Longest" trace for completion, which you can see by clicking on textDocument/completion and scrolling down
try setting "completionBudget": "1ms" to see if it makes a difference (EDIT: not 0s, as 0 => infinity)
try setting "completeUnimported": false to see if it makes a difference

Sorry to enlist you, but absent a reproducer this is hard to track down. The steps above will help confirm our theory.

myitcv commented 1 year ago

So, we have new evidence, but we lost our reproducer and could use your help.

Great, happy to help.

All of my response is based on analysis using https://github.com/golang/tools/commit/866a6b0ff32a.

start gopls with -debug=localhost:8099

reproduce, and look at http://localhost:8099/trace. Please share the "Longest" trace for completion, which you can see by clicking on textDocument/completion and scrolling down

This is what I get. Although for some reason I don't have the "runProcessEnvFunc" span details:

    05:40:11.400 end textDocument/completion (+2.073471033s) method="textDocument/completion" direction="in" id="#2"
        05:40:11.400 event (+2.073470075s) label= status.code="OK" 
        05:40:09.327 start queued
        05:40:09.327 end queued (+51.833µs)
        05:40:09.327 start textDocument/completion method="textDocument/completion" direction="out" id="#3"
        05:40:11.400 end textDocument/completion (+2.072909241s) method="textDocument/completion" direction="out" id="#3"
            05:40:11.400 event (+2.073332116s) label= status.code="OK"

try setting "completionBudget": "1ms" to see if it makes a difference (EDIT: not 0s, as 0 => infinity)

Setting to "1ms" did not influence the completion time:

[Trace - 05:43:19.927 AM] Received response 'textDocument/completion - (2)' in 2026ms.

This feels like a separate bug?

try setting "completeUnimported": false to see if it makes a difference

Bingo (note I reverted the completionBudget setting, i.e. no value set in my config):

[Trace - 05:44:32.173 AM] Received response 'textDocument/completion - (2)' in 24ms.

Which triggers an interesting thought for me.

I would absolutely not have expected the edit and completion that I'm performing in this situation to have triggered unimported completions. Because they are so expensive, I would only have expected them to trigger in the following scenario:

blah._

with the cursor at the position _, where blah is calculated to be unresolved in the current scope and hence it is likely that blah. is an attempt to start and now complete a qualified identifier.

Otherwise, a trivial completion attempt in a "hello world" example:

package main

func main() {
    str_
}

takes ~2000ms from cold and ~200ms warm (same session, ignoring the daemon point for now).

Re questions about the VM setup, last time I checked my disk performance through VMWare Fusion was faster than native macOS. Incidentally, you might find that macOS (and windows?) users have been disproportionately reporting more speed problems if unimported completions are dominating as we think they are, especially where those users have large module caches. Might also be worth asking people to report the size of their mod cache for reference, unless you could also lazily compute and report this in log files.

Here is mine:

$ du -sh $(go env GOMODCACHE)
36G     /home/myitcv/gostuff/pkg/mod
$ du -sh $(go env GOMODCACHE)/cache
7.9G    /home/myitcv/gostuff/pkg/mod/cache

(because as I understand it the cache subdirectory does not participate in the unimported scan, because it contains zip files and bare git trees).

In summary from my perspective:

Unimported completions should only be triggered when there is probably a qualified identifier where the package name cannot be resolved. If this goes against the grain of what most people want (for whatever reason) then I think it's reasonable to stick this behind a config option.
Where unimported completions are triggered, then ideally the cache walk is computed ahead of time async and persists with remote gopls instances: https://github.com/golang/go/issues/47615. This lack of caching between sessions in the presence of a daemon is, I guess, also the likely cause of the timings I've seen above?
There appears to be a bug where the completionBudget is not honoured, per above.

findleyr commented 1 year ago

Thanks for your response. The bug is understood now. Because of the change to how completion budget is implemented, a check for context cancellation that would have prevented unimported completion from running was eliminated.

The unfortunate/problematic thing is that this existed for multiple months without being caught. We do have several benchmarks for completion, but they were insensitive to this change because they don't capture the initial cost and because they operate in a fresh module cache. We've seen a few reports of slow completion our survey results and issues, and I had it on my list to revisit before the next release, but it would have been hard to hone in on this without your clear reproducer. Thank you again.

I'll note that this is a perfect example of where telemetry would have highlighted the regression -- we would have seen the regression in the release, and may even have root-caused it quickly by observing that it affected darwin more than linux.

A few comments:

That 1ms completion budget doesn't cause completion to immediately return is half bug, but also half WAI. The bug is what we've discovered here: certain operations that were previously skipped are no longer being skipped. However, even once these are fixed, setting completionBudget to 1ms won't cause completion to immediately return with empty results. AFAIK, the budget never worked that way when set to a target faster than type checking, which is typically 10-100ms, and I think that's OK. It is a soft latency target, not a hard deadline.
This would all be much less of an issue if we could stream completion results, but VS Code does not support this client-side (see also https://github.com/microsoft/vscode/issues/105870#issuecomment-1668683922).
Unimported completions runs when completing identifiers because it completes to package names. This is moderately useful, but not worth making completion 10x slower.
We could of course avoid running unimported completion in receiver context, but see the next bullet point.
Unimported completion should not be this slow. There's no compelling reason for this to be the case; unimported packages changes infrequently if at all, and since we only care about exported names we can be much less precise.

You can expect that this will be fixed, and soon. In addition to simply fixing the cancellation logic, there appears to be a lot of low-hanging fruit that can make completion significantly faster.

The complicated part is not the fix itself, but rather the process that led to it.

myitcv commented 1 year ago

Thank you for such a wonderfully detailed, comprehensive response. Delighted to help in whatever small way I can, even if that is just trying different scenarios etc.

gopherbot commented 1 year ago

Change https://go.dev/cl/530015 mentions this issue: gopls/internal/regtest/bench: add additional completion benchmarks

gopherbot commented 1 year ago

Change https://go.dev/cl/530016 mentions this issue: gopls/internal/lsp/source/completion: start timing before type checking

gopherbot commented 1 year ago

Change https://go.dev/cl/530215 mentions this issue: gopls: instrument telemetry for latency of important operations

gopherbot commented 1 year ago

Change https://go.dev/cl/530017 mentions this issue: gopls/internal/lsp/source/completion: fixes for completion budget

findleyr commented 1 year ago

@myitcv I believe gopls@master should now have similar latency to what you observed in April. Please let me know if it doesn't.

However, I'm not closing this issue because there appeared to be a lot of low-hanging fruit to further optimize significantly. For example, I analyzed one of our completion benchmarks on Kubernetes, and observed that type checking was only 65ms, while completion took 170ms. I think we can get completion latency much closer to type checking latency.

Coincidentally, the potential optimizations I see are almost identical to what we did for workspace/symbol requests:

Add concurrency: completion is single-threaded at present.
Avoid pre-building strings, by performing fuzzy match before assembling the fully qualified symbol path.
Use a faster fuzzy matcher (and fix ranking edge cases)
Speed up type inference, somehow. (Not sure if this is possible, but often there are ways to pre-filter candidates)

myitcv commented 1 year ago

I believe gopls@master should now have similar latency to what you observed in April. Please let me know if it doesn't.

Thanks, just took a look.

Still seeing similar numbers to before (https://github.com/golang/tools/commit/e2393aba883a) actually. ~20x slow down for the initial completion, and ~2.5x slow down for "warm" completions:

`golang.org/x/tools` Commit	Cold start, no daemon	Same completion, repeated after cold start	Fresh session with daemon running from previous session
https://github.com/golang/tools/commit/ac2946029	102ms	102ms	102ms
https://github.com/golang/tools/commit/fb7463ac1de5	2032ms	243ms	2069ms

Do I need specific settings to make this "work"?

(I tried setting completionBudget = 100ms but that did not affect things)

One of the big pain points here is that ~2s at startup, with or without a daemon already running. Vim is just one of those tools that you open and close the whole time, so the daemon-based setup is critical.

That said, the ~230ms completion is very noticeable vs ~100ms too.

Coincidentally, the potential optimizations I see are almost identical to what we did for workspace/symbol requests:

Did my suggestions in https://github.com/golang/go/issues/62665#issuecomment-1722142899 also make sense with regards to optimisations? Because whilst a different kind of optimisation, it feels like we want to avoid unimported completions as much as possible, and when "forced" to include them ensure that we have warm caches in the daemon (as you previously observed).

gopherbot commented 1 year ago

Change https://go.dev/cl/530599 mentions this issue: gopls: set a context deadline after minimal completion results

findleyr commented 1 year ago

@myitcv can you try again at tip? Sorry to make you keep trying, but I can't reproduce and want to make sure we've successfully rolled back the regression before including additional optimizations.

Did my suggestions in https://github.com/golang/go/issues/62665#issuecomment-1722142899 also make sense with regards to optimisations? Because whilst a different kind of optimisation, it feels like we want to avoid unimported completions as much as possible, and when "forced" to include them ensure that we have warm caches in the daemon (as you previously observed).

You mean the suggestion not to run unimported completions in various contexts where it doesn't make sense? That's definitely a good idea, but I've concluded that it's not at the heart of the problem you observe (see below).

Upon staring at the code and experimenting, I've drawn the following conclusions:

"normal" completion (meaning when "deepCompletion": false is set) is generally pretty fast following type checking.
There are a lot of places where deep completion could be optimized, as cited in https://github.com/golang/go/issues/62665#issuecomment-1734437430, however, given that deep completion only adds marginally improved results, the user-facing benefit of these optimizations would not be as large as I'd originally hoped.
The logic of "normal" completions and "deep" completions is a little entangled, so it's hard to change one without changing the other. It would be nice to refactor and make this delineation clearer.

Therefore, I think we should close this bug once we've verified a fix for the regression, and open new bugs for various completion logic improvements, which would constitute a larger project.

myitcv commented 1 year ago

Good news! Tested against https://github.com/golang/tools/commit/4b34fbf5f10d and we appear to be back in business!

`golang.org/x/tools` Commit	Cold start, no daemon	Same completion, repeated after cold start	Fresh session with daemon running from previous session
https://github.com/golang/tools/commit/ac2946029	102ms	102ms	102ms
https://github.com/golang/tools/commit/fb7463ac1de5	2032ms	243ms	2069ms
https://github.com/golang/tools/commit/4b34fbf5f10d	102ms	102ms	102ms

Thanks very much for digging into this, @findleyr.

I'll leave you to close the issue or keep open as you see fit.

findleyr commented 1 year ago

Fantastic, thanks for confirming. I've opened #63263 to follow up on deep completion performance.

mvdan commented 1 year ago

I was sat next to @myitcv earlier when he upgraded gopls and gave it a try - you should have seen his grin from ear to ear!

golang / go