golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.98k stars 17.67k forks source link

x/tools/gopls: race-like removal of first line by codeAction #35114

Closed myitcv closed 4 years ago

myitcv commented 5 years ago

What version of Go are you using (go version)?

$ go version
go version devel +03bb3e9ad1 Wed Oct 16 06:29:51 2019 +0000 linux/amd64
$ go list -m golang.org/x/tools
golang.org/x/tools v0.0.0-20191023143423-ff611c50cd12
$ go list -m golang.org/x/tools/gopls
golang.org/x/tools/gopls v0.1.8-0.20191023143423-ff611c50cd12

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/myitcv/.cache/go-build"
GOENV="/home/myitcv/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/myitcv/gostuff"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/myitcv/gos"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/myitcv/gos/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/myitcv/gostuff/src/github.com/myitcv/govim/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build521776351=/tmp/go-build -gno-record-gcc-switches"

What did you do?

In govim we have a test that effectively "pastes" the following contents into a new, unsaved buffer called main.go (hence does not exist on disk) and then saves the file:

package main

func main() {
}

In the test however we sometimes see the first line, the package clause, being removed by the codeAction that fires on save. Here is the relevant line from the gopls log:

[Trace - 17:14:28.146 PM] Received response 'textDocument/codeAction - (2)' in 73ms.
Result: [{"title":"Organize Imports","kind":"source.organizeImports","edit":{"changes":{"file:///tmp/go-test-script040847720/script-format_on_save_new_file/main.go":[{"range":{"start":{"line":0,"character":0},"end":{"line":1,"character":0}},"newText":""}]}}}]

What did you expect to see?

The file contents preserved by the codeAction.

What did you see instead?

The package clause being removed 1 time in 5 this test runs.


cc @stamblerre

stamblerre commented 5 years ago

Thanks for reporting. Does govim use the opt-in go-diff setting for the diff algorithm?

myitcv commented 5 years ago

No we do not

Apologies, I meant to link to the entire gopls log file for the test. That's now here:

https://gist.github.com/myitcv/f8d3bd52e3a31c30b2befe7909eae285

I'm not sure this can be diff related though can it? Because it doesn't fail every time, only 1 in 5 times (roughly). Hence my assumption this is a race-like error?

stamblerre commented 5 years ago

Do you have any sense of which CL caused this to become an issue? It does sound like a race, but the behavior here is very strange. My best guess is that there is some sort of race where it tries to do goimports on an empty file, but I'm not convinced that goimports would even work in that case.

Also, why is the didChange split into 2 parts here?

[Trace - 20:58:48.415 PM] Sending notification 'textDocument/didChange'.
Params: {"textDocument":{"version":1,"uri":"file:///tmp/go-test-script389776774/script-format_on_save_new_file/main.go"},"contentChanges":[{"range":{"start":{"line":1,"character":0},"end":{"line":1,"character":0}},"text":"package main\n\nfunc main() {\n}\n"}]}

[Trace - 20:58:48.416 PM] Sending notification 'textDocument/didChange'.
Params: {"textDocument":{"version":2,"uri":"file:///tmp/go-test-script389776774/script-format_on_save_new_file/main.go"},"contentChanges":[{"range":{"start":{"line":0,"character":0},"end":{"line":1,"character":0}},"text":""}]}
myitcv commented 5 years ago

Also, why is the didChange split into 2 parts here?

Ooh, I had totally missed this, good spot. The second of those changes sent by govim accounts for the first line being removed.

{"textDocument":{"version":2,"uri":"file:///tmp/go-test-script389776774/script-format_on_save_new_file/main.go"},"contentChanges":[{"range":{"start":{"line":0,"character":0},"end":{"line":1,"character":0}},"text":""}]}

This is actually a programming error in the govim test itself. Now fixed.

But here's the interesting thing: pre this fix both didChange notifications have always been sent to gopls. i.e. this govim test error has been there for some time, even when this test passes in govim. They are always sent in exactly this order as well.

So this I think reveals quite a nasty race bug somewhere in gopls. Because the codeAction that happens subsequently is, for some reason, not acting on the code that results from did(Open|Change) notifications that happened before.

stamblerre commented 5 years ago

Does passing for this test mean that the code remained unchanged? Or that the code action responds with the expected edits? There may be a weird interaction here with the goimports library actually. If it detects a missing package declaration, it will try to fill one in and run goimports with it.

myitcv commented 5 years ago

The situation is somewhat complicated by the fact there is an error in the govim test, and a further bug in govim. But it does still seem to indicate a problem in gopls. Let me try and describe the scenario:

package main

func main() {
}
package main

func main() {
}

and

<delete line 1>

Hence gopls believes the file contents are:


func main() {
}
[]protocol.CodeAction{
    {
        Title:       "Organize Imports",
        Kind:        "source.organizeImports",
        Diagnostics: nil,
        IsPreferred: false,
        Edit:        &protocol.WorkspaceEdit{
            Changes: &map[string][]protocol.TextEdit{
                "file:///tmp/go-test-script863275998/script-format_on_save_new_file/main.go": {
                    {
                        Range: protocol.Range{
                            Start: protocol.Position{},
                            End:   protocol.Position{Line:1, Character:0},
                        },
                        NewText: "",
                    },
                },
            },
            DocumentChanges: nil,
        },
        Command: (*protocol.Command)(nil),
    },
}

func main() {
}

But this then breaks the test from govim's perspective because it's expecting the content to be:

package main

func main() {
}

which is how we were alerted to this

no parsed files for package mod.com

So I think the question is, why does gopls behave randomly? Because in both cases (test passing or failing) it always sees the same changes from govim.

myitcv commented 5 years ago

Ok, some further analysis, which invalidates much of what is above. TL;DR version

The scenario we have is actually as follows:

package main

func main() {
}

This leaves the Vim buffer with the following, note the first blank line:


package main

func main() {
}

The test then adjusts for this to remove the first blank line, meaning the buffer is in the following state:

package main

func main() {
}
{"textDocument":{"version":1,"uri":"file:///tmp/go-test-script121089769/script-format_on_save_new_file/main.go"},"contentChanges":[{"range":{"start":{"line":1,"character":0},"end":{"line":1,"character":0}},"text":"package main\n\nfunc main() {\n}\n"}]}
{"textDocument":{"version":2,"uri":"file:///tmp/go-test-script121089769/script-format_on_save_new_file/main.go"},"contentChanges":[{"range":{"start":{"line":0,"character":0},"end":{"line":1,"character":0}},"text":""}]

Note the first change references LSP line 1.

So the net effect of these two changes leaves gopls in the same state as Vim:

package main

func main() {
}

This is confirmed with some rudimentary logging in gopls

[]protocol.CodeAction{
    {
        Title:       "Organize Imports",
        Kind:        "source.organizeImports",
        Diagnostics: nil,
        IsPreferred: false,
        Edit:        &protocol.WorkspaceEdit{
            Changes: &map[string][]protocol.TextEdit{
                "file:///tmp/go-test-script863275998/script-format_on_save_new_file/main.go": {
                    {
                        Range: protocol.Range{
                            Start: protocol.Position{},
                            End:   protocol.Position{Line:1, Character:0},
                        },
                        NewText: "",
                    },
                },
            },
            DocumentChanges: nil,
        },
        Command: (*protocol.Command)(nil),
    },
}

func main() {
}

This change is then relayed to gopls.

But this then breaks the test from govim's perspective because it's expecting the content to be:

package main

func main() {
}

which is how we were alerted to this

no parsed files for package mod.com

So I think the question are:

Because in both cases (test passing or failing) it always sees the same changes from govim and those changes leave it with the same contents each time.

myitcv commented 5 years ago

Here are two gopls logs from when the govim test passes/fails in case this helps to give any clues:

stamblerre commented 5 years ago

Thanks for the logs. Took a look at them, but nothing stands out yet. I put up a very basic CL with some additional logging as a starting point (see https://golang.org/cl/204457). It would be helpful to know what the imports code action thinks it is doing in this case.

I think that this bug might be a good application for @pjweinb's work on replaying gopls logs.

stamblerre commented 5 years ago

To summarize some offline discussion - there seem to be 2 errors here. The first is the "no parsed files" error for the "passing" case. The second is the modification of the file in the racey case. I am able to reproduce the first scenario with the log replayer, so I will focus on debugging it first.

It also seems like the cause of the race is gopls operating on stale file contents. @myitcv has bisected this race as stemming from https://golang.org/cl/201221.

gopherbot commented 5 years ago

Change https://golang.org/cl/204557 mentions this issue: internal/lsp: do not format the file on import organization

gopherbot commented 5 years ago

Change https://golang.org/cl/204457 mentions this issue: internal/lsp: debugging CL for golang/go#35114

myitcv commented 5 years ago

We are seeing another instance of the error:

no parsed files for package mod.com

when calling codeAction to get suggested fixes.

Separately, assuming that the error is indeed an error, this returns us to the question of whether codeAction and Formatting should return errors in the case of parse failures. Because part of the issue with this having gone unnoticed for so long is that, per https://github.com/golang/go/issues/35114#issuecomment-547743510, the error returned by codeAction is "swallowed" by govim.

Why do we swallow these errors in govim? If we didn't swallow errors, the the user would be unable to save a file that has parse errors (because most users have format-on-save turned on).

Unless there is some way that we determine that a file has parse errors and hence avoid the formatting/codeAction call on save?

myitcv commented 5 years ago

Further update, with x/tools and gopls 97ad0ed33101 I get the following panic with the same test:

panic: runtime error: index out of range [0] with length 0

goroutine 41 [running]:
golang.org/x/tools/internal/lsp/cache.(*session).bestView(0xc00025e580, 0xc0004da0a0, 0x4a, 0x0, 0x0)
   /home/myitcv/gostuff/pkg/mod/golang.org/x/tools@v0.0.0-20191109212701-97ad0ed33101/internal/lsp/cache/session.go:218 +0x19b
golang.org/x/tools/internal/lsp/cache.(*session).DidOpen(0xc00025e580, 0xd55d20, 0xc0002aa510, 0xc0004da0a0, 0x4a, 0x0, 0xc00037e390, 0x1, 0x8, 0xadef20, ...)
   /home/myitcv/gostuff/pkg/mod/golang.org/x/tools@v0.0.0-20191109212701-97ad0ed33101/internal/lsp/cache/session.go:255 +0x2b8
golang.org/x/tools/internal/lsp.(*Server).didOpen(0xc0002922a0, 0xd55d20, 0xc0002aa510, 0xc000368340, 0x0, 0xc0004dc000)
   /home/myitcv/gostuff/pkg/mod/golang.org/x/tools@v0.0.0-20191109212701-97ad0ed33101/internal/lsp/text_synchronization.go:29 +0x15e
golang.org/x/tools/internal/lsp.(*Server).DidOpen(0xc0002922a0, 0xd55d20, 0xc0002aa510, 0xc000368340, 0xc000368340, 0x0)
   /home/myitcv/gostuff/pkg/mod/golang.org/x/tools@v0.0.0-20191109212701-97ad0ed33101/internal/lsp/server.go:135 +0x49
golang.org/x/tools/internal/lsp/protocol.serverHandler.Deliver(0xd6ae80, 0xc0002922a0, 0xd55d20, 0xc0002aa510, 0xc0002909c0, 0xc0002aa500, 0xc000260280)
   /home/myitcv/gostuff/pkg/mod/golang.org/x/tools@v0.0.0-20191109212701-97ad0ed33101/internal/lsp/protocol/tsserver.go:111 +0x1e32
golang.org/x/tools/internal/jsonrpc2.(*Conn).Run.func1(0xc0002663c0, 0xc0002909c0, 0xc000292300, 0xd55d20, 0xc0002aa510, 0x0, 0x0, 0xc00017f120)
   /home/myitcv/gostuff/pkg/mod/golang.org/x/tools@v0.0.0-20191109212701-97ad0ed33101/internal/jsonrpc2/jsonrpc2.go:370 +0x170
created by golang.org/x/tools/internal/jsonrpc2.(*Conn).Run
   /home/myitcv/gostuff/pkg/mod/golang.org/x/tools@v0.0.0-20191109212701-97ad0ed33101/internal/jsonrpc2/jsonrpc2.go:354 +0x874
myitcv commented 5 years ago

That panic appears to have been introduced in https://go-review.googlesource.com/c/tools/+/204822

myitcv commented 5 years ago

cc @matloob

gopherbot commented 5 years ago

Change https://golang.org/cl/206160 mentions this issue: internal/lsp: debugging for golang/go#35114

myitcv commented 5 years ago

I've managed to make a bit of progress on this by comparing the gopls logs of a "pass" and "fail" in the govim test. Thanks to @ianthehat for providing more context on slack:

cache ≈ disk (the files on the file system) session ≈ disk + memory (the cache plus the overlays) view ≈ disk + memory + config (the config controls how the files are understood, this is the first layer at which you can determine packages) snapshot ≈ view at a specific point in time (every change to either the disk, memory or config causes a new snapshot)

so for consistent contents you need a snapshot not a view (no token needed, it’s inherent in the snapshot object itself)

originally the snapshot layer was integrated into the view, we only recently split it out, so there are lots of things that take a view that should probably take a snapshot instead, diagnostics is probably one of those things

Just as a refresher (more details in the comments above):


package main

func main() {
}

Then the first line is removed (state C):

package main

func main() {
}

Then we run codeAction, then (assuming no error from the previous step) we run Formatting.

no parsed files for package mod.com

This does not appear to be correct, because if we create a file in state C then open Vim we do not see this error in the gopls logs.

Analysis

In https://go-review.googlesource.com/c/tools/+/206160 I've pushed up some backtrace logging to help with debugging.

Here are two gopls log files from a "passing" and failing run:

In the "passing" test the sequence of backtrace logs are, at a high level, as follows:

Notice the interleaving of the diagnostic runs between the content updates. This results in codeAction returning no parsed files for package mod.com and hence does not call Formatting.

The failing sequence of backtrace logs are, at a high level, as follows:

Notice the additional invalidation of content as part of the codeAction call.

Rough conclusions

matloob commented 5 years ago

The panic you're seeing looks like #35468. I'll send a fix for that soon.

myitcv commented 4 years ago

With the recent versioned text document changes (https://go-review.googlesource.com/c/tools/+/206882 and https://go-review.googlesource.com/c/tools/+/205863) we're still seeing this issue and now more explicitly so in the same test:

--- FAIL: TestScripts/scenario_default/format_on_save_new_file (1.09s)
    testscript.go:316: 
        # Test completions for a file that does not yet exist on disk (0.342s)
        > vim ex 'e main.go'
        > vim ex 'r main.txt | 0d_'
        > vim ex 'w main.go'
        [stderr]
        unexpected command error: failed to ex(w main.go) in Vim: Caught 'got error whilst handling autocommand:2: got edits for version 2, but current buffer version is 3' in function <SNR>22_callbackAutoCommand[18]..<SNR>22_ch_evalexpr, line 4
        [exit status 255]
        FAIL: testdata/scenario_default/format_on_save_new_file.txt:5: unexpected command failure
heschi commented 4 years ago

@myitcv: I tried to reproduce this and failed. I checked out govim, changed the replace statements to point to my local copy of x/tools, commented out the skip in format_on_save_new_file.txt, and ran VIM_FLAVOR=vim go test ./cmd/govim -v -run TestScripts/scenario_default/format_on_save_new_file -count 100 successfully. Am I missing a step, or is it that environment-specific?

heschi commented 4 years ago

I spent some time staring at the logs in https://github.com/golang/go/issues/35114#issuecomment-552230773 and I'm quite confused. In the passing log, the code action never replied, so how can we know the test passed? And in the failing log, there is a code action reply, but it's empty, so why did it fail?

myitcv commented 4 years ago

I spent some time staring at the logs in #35114 (comment) and I'm quite confused. In the passing log, the code action never replied, so how can we know the test passed?

Because as explained in https://github.com/golang/go/issues/35694 we are forced to ignore errors returned by either CodeAction or Formatting. What we instead do is log the error and return early by doing nothing, i.e. we do not make the subsequent Formatting call.

And in the failing log, there is a code action reply, but it's empty, so why did it fail?

Quoting the comment you linked to:

the test fails when, ~1 time in 10, the code action does not return an error but then Formatting returns edits that appear to have been calculated with respect to the previous version of the buffer. The fact that on some occassions codeAction does not return an error further suggests that it's an error when it does.

The test again runs to completion, but in the case of CodeAction not replying with an error (or an edits) the subsequent Formatting call is made, and this is where we get the bad edits.

heschi commented 4 years ago

Ah. Sorry, I missed the formatting reply, was too fixated on codeAction because of the issue title. And now I see the Error for request 2 in the passing log.

Okay, I'm starting to put the pieces together. Here's my understanding, sorry to keep making you repeat yourself.

Overlays: An overlay is, by definition, a diff between gopls' state and the state on disk. We don't currently implement willSaveWaitUntil, so there's no synchronization between those two states. Without that there will always be some raciness. Note that packages.Load is used only to get metadata like package name and imports, so this cannot possibly be the cause of bad formatting responses. Also, we we should be cancelling packages.Load calls for old snapshots, and triggering reloads if the results are stale, but we know there was at least one bug here, #35951, and I could easily believe there are more. (Incidentally: does govim not send didSaves? I see that this test does a write, but there's no didSave even in the passing logs.)

The codeAction failure: Very reminiscent of #35951. I tried running the tests back before the fix and the failure still didn't reproduce, but maybe it's a timing thing on my machine. The fact that the "passing" run has a Load result showing a file for mod.com and then immediately complains that there aren't any files in it is strongly indicative of metadata corruption of some kind. Without a repro I can run I will need a lot of help to investigate, I'm afraid. I can start posting CLs with extra logging if you're up for it.

Then we come to the bad formatting, which I think has to be a separate issue. On the one hand, the fact that you're getting edits for the wrong snapshot version is somewhat reassuring, since it means things aren't going totally haywire. On the other hand, I don't know why it's working off the old snapshot. I'll stare for a while.

heschi commented 4 years ago

The other thing is that codeAction no longer does formatting, and formatting doesn't have file versions in the response, so the old snapshot error won't happen any more either.

I think I really need a new set of log files to make any progress.

myitcv commented 4 years ago

The other thing is that codeAction no longer does formatting, and formatting doesn't have file versions in the response, so the old snapshot error won't happen any more either.

Indeed, which is why we make a separate Formatting call after CodeAction (assuming the latter returns no error).

I think I really need a new set of log files to make any progress.

I'll work on this.

One of the other critical issues preventing us from moving forward to a later x/tools and gopls version is https://github.com/golang/go/issues/35638 in case you're able to take a look.

myitcv commented 4 years ago

FWIW we're still seeing this in 95cb2a1a7eae2d74d8211d4930f76e8ccd5e124f (with 80313e1ba7181ff88576941b25b4beb004e348db and 61fa4dffed09fb8cad42837dbc5fc1275d09624a cherry-picked on top):

https://travis-ci.org/govim/govim/jobs/621836927?utm_medium=notification&utm_source=github_status

This failure is part of the PR I've pushed up as a test:

https://travis-ci.org/govim/govim/builds/621836926?utm_source=github_status&utm_medium=notification

At the time of writing, 4 failures during the race test section.

This is hard to test against the latest x/tools and gopls because of other test failures we're seeing (see previous comment), but I'll give it a try.

myitcv commented 4 years ago

@stamblerre @heschik

I think I really need a new set of log files to make any progress.

Agreed. This issue is now too old to be useful.

As I mention above, we've been stuck between a bit of a rock and a hard place because upgrading to the latest master of x/tools/gopls (which would be necessary to produce new logs, assuming the problem remains) has not been possible because of a number of outstanding issues. Those other issues cause a number of tests to fail/exit early, meaning the conditions under which race conditions are most likely (high load etc) aren't realised.

But in https://github.com/govim/govim/pull/584 we've narrowed that x/tools/gopls upgrade down to two issues:

@stamblerre is working on both so I think we're close.

So my plan is to close this issue and re-open/create new issues as necessary once we're "green" on https://github.com/govim/govim/pull/584.

Thanks for you help looking into this in any case.