Closed myitcv closed 4 years ago
Thanks for reporting. Does govim
use the opt-in go-diff
setting for the diff algorithm?
No we do not
Apologies, I meant to link to the entire gopls
log file for the test. That's now here:
https://gist.github.com/myitcv/f8d3bd52e3a31c30b2befe7909eae285
I'm not sure this can be diff related though can it? Because it doesn't fail every time, only 1 in 5 times (roughly). Hence my assumption this is a race-like error?
Do you have any sense of which CL caused this to become an issue? It does sound like a race, but the behavior here is very strange. My best guess is that there is some sort of race where it tries to do goimports
on an empty file, but I'm not convinced that goimports
would even work in that case.
Also, why is the didChange
split into 2 parts here?
[Trace - 20:58:48.415 PM] Sending notification 'textDocument/didChange'.
Params: {"textDocument":{"version":1,"uri":"file:///tmp/go-test-script389776774/script-format_on_save_new_file/main.go"},"contentChanges":[{"range":{"start":{"line":1,"character":0},"end":{"line":1,"character":0}},"text":"package main\n\nfunc main() {\n}\n"}]}
[Trace - 20:58:48.416 PM] Sending notification 'textDocument/didChange'.
Params: {"textDocument":{"version":2,"uri":"file:///tmp/go-test-script389776774/script-format_on_save_new_file/main.go"},"contentChanges":[{"range":{"start":{"line":0,"character":0},"end":{"line":1,"character":0}},"text":""}]}
Also, why is the didChange split into 2 parts here?
Ooh, I had totally missed this, good spot. The second of those changes sent by govim
accounts for the first line being removed.
{"textDocument":{"version":2,"uri":"file:///tmp/go-test-script389776774/script-format_on_save_new_file/main.go"},"contentChanges":[{"range":{"start":{"line":0,"character":0},"end":{"line":1,"character":0}},"text":""}]}
This is actually a programming error in the govim
test itself. Now fixed.
But here's the interesting thing: pre this fix both didChange
notifications have always been sent to gopls
. i.e. this govim
test error has been there for some time, even when this test passes in govim
. They are always sent in exactly this order as well.
So this I think reveals quite a nasty race bug somewhere in gopls
. Because the codeAction
that happens subsequently is, for some reason, not acting on the code that results from did(Open|Change)
notifications that happened before.
Does passing for this test mean that the code remained unchanged? Or that the code action responds with the expected edits? There may be a weird interaction here with the goimports
library actually. If it detects a missing package
declaration, it will try to fill one in and run goimports
with it.
The situation is somewhat complicated by the fact there is an error in the govim
test, and a further bug in govim
. But it does still seem to indicate a problem in gopls
. Let me try and describe the scenario:
main.go
and pastes in the following contents:package main
func main() {
}
govim
, the two actions that represent the "paste" are relayed to gopls
as:package main
func main() {
}
and
<delete line 1>
Hence gopls
believes the file contents are:
func main() {
}
[]protocol.CodeAction{
{
Title: "Organize Imports",
Kind: "source.organizeImports",
Diagnostics: nil,
IsPreferred: false,
Edit: &protocol.WorkspaceEdit{
Changes: &map[string][]protocol.TextEdit{
"file:///tmp/go-test-script863275998/script-format_on_save_new_file/main.go": {
{
Range: protocol.Range{
Start: protocol.Position{},
End: protocol.Position{Line:1, Character:0},
},
NewText: "",
},
},
},
DocumentChanges: nil,
},
Command: (*protocol.Command)(nil),
},
}
govim
to apply the change, meaning it changes the Vim buffer to:
func main() {
}
But this then breaks the test from govim
's perspective because it's expecting the content to be:
package main
func main() {
}
which is how we were alerted to this
no parsed files for package mod.com
So I think the question is, why does gopls
behave randomly? Because in both cases (test passing or failing) it always sees the same changes from govim
.
Ok, some further analysis, which invalidates much of what is above. TL;DR version
govim
The scenario we have is actually as follows:
go.mod
but no other filesmain.go
and reads the following contents:package main
func main() {
}
This leaves the Vim buffer with the following, note the first blank line:
package main
func main() {
}
The test then adjusts for this to remove the first blank line, meaning the buffer is in the following state:
package main
func main() {
}
govim
sends these two changes (because they are processed as two changes in Vim) to gopls
:{"textDocument":{"version":1,"uri":"file:///tmp/go-test-script121089769/script-format_on_save_new_file/main.go"},"contentChanges":[{"range":{"start":{"line":1,"character":0},"end":{"line":1,"character":0}},"text":"package main\n\nfunc main() {\n}\n"}]}
{"textDocument":{"version":2,"uri":"file:///tmp/go-test-script121089769/script-format_on_save_new_file/main.go"},"contentChanges":[{"range":{"start":{"line":0,"character":0},"end":{"line":1,"character":0}},"text":""}]
Note the first change references LSP line 1.
So the net effect of these two changes leaves gopls
in the same state as Vim:
package main
func main() {
}
This is confirmed with some rudimentary logging in gopls
[]protocol.CodeAction{
{
Title: "Organize Imports",
Kind: "source.organizeImports",
Diagnostics: nil,
IsPreferred: false,
Edit: &protocol.WorkspaceEdit{
Changes: &map[string][]protocol.TextEdit{
"file:///tmp/go-test-script863275998/script-format_on_save_new_file/main.go": {
{
Range: protocol.Range{
Start: protocol.Position{},
End: protocol.Position{Line:1, Character:0},
},
NewText: "",
},
},
},
DocumentChanges: nil,
},
Command: (*protocol.Command)(nil),
},
}
govim
to apply the change, meaning it changes the Vim buffer to:
func main() {
}
This change is then relayed to gopls
.
But this then breaks the test from govim
's perspective because it's expecting the content to be:
package main
func main() {
}
which is how we were alerted to this
no parsed files for package mod.com
So I think the question are:
gopls
behave randomlygopls
return an error when in the ~4 in 5 cases when it doesn't return any changes?Because in both cases (test passing or failing) it always sees the same changes from govim
and those changes leave it with the same contents each time.
Thanks for the logs. Took a look at them, but nothing stands out yet. I put up a very basic CL with some additional logging as a starting point (see https://golang.org/cl/204457). It would be helpful to know what the imports code action thinks it is doing in this case.
I think that this bug might be a good application for @pjweinb's work on replaying gopls
logs.
To summarize some offline discussion - there seem to be 2 errors here. The first is the "no parsed files" error for the "passing" case. The second is the modification of the file in the racey case. I am able to reproduce the first scenario with the log replayer, so I will focus on debugging it first.
It also seems like the cause of the race is gopls
operating on stale file contents. @myitcv has bisected this race as stemming from https://golang.org/cl/201221.
Change https://golang.org/cl/204557 mentions this issue: internal/lsp: do not format the file on import organization
Change https://golang.org/cl/204457 mentions this issue: internal/lsp: debugging CL for golang/go#35114
We are seeing another instance of the error:
no parsed files for package mod.com
when calling codeAction
to get suggested fixes.
Separately, assuming that the error is indeed an error, this returns us to the question of whether codeAction
and Formatting
should return errors in the case of parse failures. Because part of the issue with this having gone unnoticed for so long is that, per https://github.com/golang/go/issues/35114#issuecomment-547743510, the error returned by codeAction
is "swallowed" by govim
.
Why do we swallow these errors in govim
? If we didn't swallow errors, the the user would be unable to save a file that has parse errors (because most users have format-on-save turned on).
Unless there is some way that we determine that a file has parse errors and hence avoid the formatting/codeAction call on save?
Further update, with x/tools
and gopls
97ad0ed33101 I get the following panic with the same test:
panic: runtime error: index out of range [0] with length 0
goroutine 41 [running]:
golang.org/x/tools/internal/lsp/cache.(*session).bestView(0xc00025e580, 0xc0004da0a0, 0x4a, 0x0, 0x0)
/home/myitcv/gostuff/pkg/mod/golang.org/x/tools@v0.0.0-20191109212701-97ad0ed33101/internal/lsp/cache/session.go:218 +0x19b
golang.org/x/tools/internal/lsp/cache.(*session).DidOpen(0xc00025e580, 0xd55d20, 0xc0002aa510, 0xc0004da0a0, 0x4a, 0x0, 0xc00037e390, 0x1, 0x8, 0xadef20, ...)
/home/myitcv/gostuff/pkg/mod/golang.org/x/tools@v0.0.0-20191109212701-97ad0ed33101/internal/lsp/cache/session.go:255 +0x2b8
golang.org/x/tools/internal/lsp.(*Server).didOpen(0xc0002922a0, 0xd55d20, 0xc0002aa510, 0xc000368340, 0x0, 0xc0004dc000)
/home/myitcv/gostuff/pkg/mod/golang.org/x/tools@v0.0.0-20191109212701-97ad0ed33101/internal/lsp/text_synchronization.go:29 +0x15e
golang.org/x/tools/internal/lsp.(*Server).DidOpen(0xc0002922a0, 0xd55d20, 0xc0002aa510, 0xc000368340, 0xc000368340, 0x0)
/home/myitcv/gostuff/pkg/mod/golang.org/x/tools@v0.0.0-20191109212701-97ad0ed33101/internal/lsp/server.go:135 +0x49
golang.org/x/tools/internal/lsp/protocol.serverHandler.Deliver(0xd6ae80, 0xc0002922a0, 0xd55d20, 0xc0002aa510, 0xc0002909c0, 0xc0002aa500, 0xc000260280)
/home/myitcv/gostuff/pkg/mod/golang.org/x/tools@v0.0.0-20191109212701-97ad0ed33101/internal/lsp/protocol/tsserver.go:111 +0x1e32
golang.org/x/tools/internal/jsonrpc2.(*Conn).Run.func1(0xc0002663c0, 0xc0002909c0, 0xc000292300, 0xd55d20, 0xc0002aa510, 0x0, 0x0, 0xc00017f120)
/home/myitcv/gostuff/pkg/mod/golang.org/x/tools@v0.0.0-20191109212701-97ad0ed33101/internal/jsonrpc2/jsonrpc2.go:370 +0x170
created by golang.org/x/tools/internal/jsonrpc2.(*Conn).Run
/home/myitcv/gostuff/pkg/mod/golang.org/x/tools@v0.0.0-20191109212701-97ad0ed33101/internal/jsonrpc2/jsonrpc2.go:354 +0x874
That panic appears to have been introduced in https://go-review.googlesource.com/c/tools/+/204822
cc @matloob
Change https://golang.org/cl/206160 mentions this issue: internal/lsp: debugging for golang/go#35114
I've managed to make a bit of progress on this by comparing the gopls
logs of a "pass" and "fail" in the govim
test. Thanks to @ianthehat for providing more context on slack:
cache ≈ disk (the files on the file system) session ≈ disk + memory (the cache plus the overlays) view ≈ disk + memory + config (the config controls how the files are understood, this is the first layer at which you can determine packages) snapshot ≈ view at a specific point in time (every change to either the disk, memory or config causes a new snapshot)
so for consistent contents you need a snapshot not a view (no token needed, it’s inherent in the snapshot object itself)
originally the snapshot layer was integrated into the view, we only recently split it out, so there are lots of things that take a view that should probably take a snapshot instead, diagnostics is probably one of those things
Just as a refresher (more details in the comments above):
package main
func main() {
}
Then the first line is removed (state C):
package main
func main() {
}
Then we run codeAction, then (assuming no error from the previous step) we run Formatting.
no parsed files for package mod.com
This does not appear to be correct, because if we create a file in state C then open Vim we do not see this error in the gopls
logs.
In https://go-review.googlesource.com/c/tools/+/206160 I've pushed up some backtrace logging to help with debugging.
Here are two gopls
log files from a "passing" and failing run:
In the "passing" test the sequence of backtrace logs are, at a high level, as follows:
golang.org/x/tools/internal/lsp/cache.(*view).invalidateContent
via didOpen
golang.org/x/tools/internal/lsp/cache.(*snapshot).load
via golang.org/x/tools/internal/lsp.(*Server).diagnostics
golang.org/x/tools/internal/lsp/cache.(*view).invalidateContent
via didChange
golang.org/x/tools/internal/lsp/cache.(*snapshot).load
via golang.org/x/tools/internal/lsp.(*Server).diagnostics
golang.org/x/tools/internal/lsp/cache.(*view).invalidateContent
via didChange
golang.org/x/tools/internal/lsp/cache.(*snapshot).load
via golang.org/x/tools/internal/lsp.(*Server).diagnostics
Notice the interleaving of the diagnostic runs between the content updates. This results in codeAction returning no parsed files for package mod.com
and hence does not call Formatting.
The failing sequence of backtrace logs are, at a high level, as follows:
golang.org/x/tools/internal/lsp/cache.(*view).invalidateContent
via didOpen
golang.org/x/tools/internal/lsp/cache.(*view).invalidateContent
via didChange
golang.org/x/tools/internal/lsp/cache.(*snapshot).load
via golang.org/x/tools/internal/lsp.(*Server).diagnostics
golang.org/x/tools/internal/lsp/cache.(*snapshot).load
via golang.org/x/tools/internal/lsp.(*Server).diagnostics
golang.org/x/tools/internal/lsp/cache.(*view).invalidateContent
via didChange
golang.org/x/tools/internal/lsp/cache.(*snapshot).load
via golang.org/x/tools/internal/lsp.(*Server).diagnostics
golang.org/x/tools/internal/lsp/cache.(*snapshot).load
via golang.org/x/tools/internal/lsp.(*Server).codeAction
Notice the additional invalidation of content as part of the codeAction call.
no parsed files for package mod.com
) 9 times in 10 seems to suggest that codeAction is not using the current content for the file (at the time of calling codeAction)packages.Load
call appears to be built from the session and not a snapshot (https://github.com/golang/tools/blob/97ad0ed33101b39103eaf8aeab62b59dd1cfb89e/internal/lsp/cache/view.go#L126). For any code calling this async from the request thread (diagnostics at least falls into this category) I think this will be racey with respect to the expected file contents in the overlay.The panic you're seeing looks like #35468. I'll send a fix for that soon.
With the recent versioned text document changes (https://go-review.googlesource.com/c/tools/+/206882 and https://go-review.googlesource.com/c/tools/+/205863) we're still seeing this issue and now more explicitly so in the same test:
--- FAIL: TestScripts/scenario_default/format_on_save_new_file (1.09s)
testscript.go:316:
# Test completions for a file that does not yet exist on disk (0.342s)
> vim ex 'e main.go'
> vim ex 'r main.txt | 0d_'
> vim ex 'w main.go'
[stderr]
unexpected command error: failed to ex(w main.go) in Vim: Caught 'got error whilst handling autocommand:2: got edits for version 2, but current buffer version is 3' in function <SNR>22_callbackAutoCommand[18]..<SNR>22_ch_evalexpr, line 4
[exit status 255]
FAIL: testdata/scenario_default/format_on_save_new_file.txt:5: unexpected command failure
@myitcv: I tried to reproduce this and failed. I checked out govim, changed the replace statements to point to my local copy of x/tools, commented out the skip in format_on_save_new_file.txt, and ran VIM_FLAVOR=vim go test ./cmd/govim -v -run TestScripts/scenario_default/format_on_save_new_file -count 100
successfully. Am I missing a step, or is it that environment-specific?
I spent some time staring at the logs in https://github.com/golang/go/issues/35114#issuecomment-552230773 and I'm quite confused. In the passing log, the code action never replied, so how can we know the test passed? And in the failing log, there is a code action reply, but it's empty, so why did it fail?
I spent some time staring at the logs in #35114 (comment) and I'm quite confused. In the passing log, the code action never replied, so how can we know the test passed?
Because as explained in https://github.com/golang/go/issues/35694 we are forced to ignore errors returned by either CodeAction
or Formatting
. What we instead do is log the error and return early by doing nothing, i.e. we do not make the subsequent Formatting
call.
And in the failing log, there is a code action reply, but it's empty, so why did it fail?
Quoting the comment you linked to:
the test fails when, ~1 time in 10, the code action does not return an error but then Formatting returns edits that appear to have been calculated with respect to the previous version of the buffer. The fact that on some occassions codeAction does not return an error further suggests that it's an error when it does.
The test again runs to completion, but in the case of CodeAction
not replying with an error (or an edits) the subsequent Formatting
call is made, and this is where we get the bad edits.
Ah. Sorry, I missed the formatting reply, was too fixated on codeAction because of the issue title. And now I see the Error for request 2 in the passing log.
Okay, I'm starting to put the pieces together. Here's my understanding, sorry to keep making you repeat yourself.
Overlays: An overlay is, by definition, a diff between gopls' state and the state on disk. We don't currently implement willSaveWaitUntil
, so there's no synchronization between those two states. Without that there will always be some raciness. Note that packages.Load
is used only to get metadata like package name and imports, so this cannot possibly be the cause of bad formatting responses. Also, we we should be cancelling packages.Load
calls for old snapshots, and triggering reloads if the results are stale, but we know there was at least one bug here, #35951, and I could easily believe there are more.
(Incidentally: does govim not send didSave
s? I see that this test does a write, but there's no didSave
even in the passing logs.)
The codeAction
failure: Very reminiscent of #35951. I tried running the tests back before the fix and the failure still didn't reproduce, but maybe it's a timing thing on my machine. The fact that the "passing" run has a Load result showing a file for mod.com
and then immediately complains that there aren't any files in it is strongly indicative of metadata corruption of some kind. Without a repro I can run I will need a lot of help to investigate, I'm afraid. I can start posting CLs with extra logging if you're up for it.
Then we come to the bad formatting, which I think has to be a separate issue. On the one hand, the fact that you're getting edits for the wrong snapshot version is somewhat reassuring, since it means things aren't going totally haywire. On the other hand, I don't know why it's working off the old snapshot. I'll stare for a while.
The other thing is that codeAction
no longer does formatting, and formatting doesn't have file versions in the response, so the old snapshot error won't happen any more either.
I think I really need a new set of log files to make any progress.
The other thing is that codeAction no longer does formatting, and formatting doesn't have file versions in the response, so the old snapshot error won't happen any more either.
Indeed, which is why we make a separate Formatting
call after CodeAction
(assuming the latter returns no error).
I think I really need a new set of log files to make any progress.
I'll work on this.
One of the other critical issues preventing us from moving forward to a later x/tools
and gopls
version is https://github.com/golang/go/issues/35638 in case you're able to take a look.
FWIW we're still seeing this in 95cb2a1a7eae2d74d8211d4930f76e8ccd5e124f (with 80313e1ba7181ff88576941b25b4beb004e348db and 61fa4dffed09fb8cad42837dbc5fc1275d09624a cherry-picked on top):
https://travis-ci.org/govim/govim/jobs/621836927?utm_medium=notification&utm_source=github_status
This failure is part of the PR I've pushed up as a test:
https://travis-ci.org/govim/govim/builds/621836926?utm_source=github_status&utm_medium=notification
At the time of writing, 4 failures during the race test section.
This is hard to test against the latest x/tools
and gopls
because of other test failures we're seeing (see previous comment), but I'll give it a try.
@stamblerre @heschik
I think I really need a new set of log files to make any progress.
Agreed. This issue is now too old to be useful.
As I mention above, we've been stuck between a bit of a rock and a hard place because upgrading to the latest master
of x/tools
/gopls
(which would be necessary to produce new logs, assuming the problem remains) has not been possible because of a number of outstanding issues. Those other issues cause a number of tests to fail/exit early, meaning the conditions under which race conditions are most likely (high load etc) aren't realised.
But in https://github.com/govim/govim/pull/584 we've narrowed that x/tools
/gopls
upgrade down to two issues:
@stamblerre is working on both so I think we're close.
So my plan is to close this issue and re-open/create new issues as necessary once we're "green" on https://github.com/govim/govim/pull/584.
Thanks for you help looking into this in any case.
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
In
govim
we have a test that effectively "pastes" the following contents into a new, unsaved buffer calledmain.go
(hence does not exist on disk) and then saves the file:In the test however we sometimes see the first line, the package clause, being removed by the codeAction that fires on save. Here is the relevant line from the
gopls
log:What did you expect to see?
The file contents preserved by the codeAction.
What did you see instead?
The package clause being removed 1 time in 5 this test runs.
cc @stamblerre