x/tools/gopls: add support for multiple concurrent clients

leitzler commented 5 years ago

As per discussion on Slack with @ianthehat & @myitcv this is a feature request for gopls to support multiple concurrent clients.

Vim users often have multiple instances of vim running at the same time, and starting/exiting is a natural part of the workflow. Currently there is a 1 to 1 mapping between vim and gopls (at least when using govim as plugin). It would be really useful to be able to share the same cache and avoid waiting for warmup each time.

See https://github.com/myitcv/govim/issues/491 as well as https://github.com/golang/go/issues/32750#issuecomment-528314462 for details.

gopherbot commented 5 years ago

Thank you for filing a gopls issue! Please take a look at the Troubleshooting section of the gopls Wiki page, and make sure that you have provided all of the relevant information here.

leitzler commented 5 years ago

@gopherbot, please add label FeatureRequest

ianthehat commented 5 years ago

In its default state gopls supports a specific header based JSON stream of the LSP on stdin /stdout. In this mode it only supports a single client as stdin/stdout cannot be multiplexed.

It also has the flags -listen and -remote, which are designed to allow it to communicate using sockets. -listen only applies to the serve verb, and causes it to listen for incoming connections rather than stdin/stdout. In this mode it supports multiple clients. The -remote flag can be supplied to any verb, and tells it how to talk to a gopls that was started with serve -listen. If you use it with serve then you have a gopls that listens on stdin/stdout and forwards commands to the remote gopls. The intent is that you use this mode from within an editor to talk to a gopls server.

The protocol spoken between a -remote and -listen gopls is not defined, and never will be, we only support it as a way of intercommunication, not as an API surface. This is because to achieve some of its goals it will have to have significant extensions to the protocol, and may mutate some of the data on the way through. Part of the reason for this is that it should be feasible to have the server run on a separate machine, and it may not have access to the same file system, or it may know the files by different absolute paths etc. These features require a reliable way of translating paths, and also the remote file extension so the true server can ask the intermediary gopls for file contents. It may also be necessary to have some forms of caching and cancelling within the intermediary.

The current state is that we use this mode only for debugging. It only gets fixed when we need to use it to debug a problem, and even then it does not get properly fixed. It does mostly work, but there are things like straight proxying of shutdown message causes the shared server to quit when any of clients does.

There are also design issues still to fix, things like should we support some kind of "discovery" protocol, should we have a mode where we start a server if one is not running but connect to it if it is, when all the clients go away should the server shut down again, how do we manage the cache so we dont explode because of lots of clients over a very long time, how do we prevent one client from starving the others, how do we manage the config and environment of the server correctly etc

myitcv commented 5 years ago

Thanks for this detail.

The current state is that we use this mode only for debugging

Understood.

For editors like Vim, Emacs, etc, where users end up starting multiple instances on the same machine having a single instance of gopls will be a big win when it avoiding waiting for (pre-)warming of the imports cache (https://github.com/golang/go/issues/34115).

Given that, do you have plans to fully support this mode?

ianthehat commented 5 years ago

Yes, but I have no time to do anything about it right now. It is also valuable for people that want to run gopls from the command line if it could reuse the cached results already sitting behind their editor, or from a previous command. Doing this well you really want to have a discovery protocol though, having to specify -remote on every command would be painful. Ideally I would like to delete the -remote flag in the future and only support a discovery protocol!

myitcv commented 5 years ago

Yes, but I have no time to do anything about it right now.

Ok, understood. My question was more geared towards ensuring this is something that is being considered as opposed to not.

ianthehat commented 5 years ago

I am just being really careful to make sure people do not think I will get round to it any time soon, I don't want someone waiting on it and getting frustrated that I am not doing it!

This is also an area where contributions would be welcome, although it would be a very high touch contribution as I have a lot to say about how it is done :)

myitcv commented 5 years ago

Just to add another motivating factor behind this change: having a single instance will amortise the additional cost of running staticcheck on large code bases each time gopls is opened.

findleyr commented 4 years ago

Just to update, since this was discussed in slack: I'm working on this now, and should have some progress soon.

gopherbot commented 4 years ago

Change https://golang.org/cl/214803 mentions this issue: internal/lsp: add server instance to debug info

myitcv commented 4 years ago

@findleyr is this a v1.0.0 target of v0.3.0? Not chasing... just noting that it being worked on now probably contradicts the current "unplanned" milestone 😄

findleyr commented 4 years ago

Oh, thanks. Yes, I think this can be added to the v1.0.0 milestone (the v0.3.0 feature set is pretty locked-down at the moment). I'll confirm with @stamblerre.

gopherbot commented 4 years ago

Change https://golang.org/cl/215739 mentions this issue: internal/lsp/cmd: add a test for client logging

gopherbot commented 4 years ago

Change https://golang.org/cl/215740 mentions this issue: internal/lsp: remove the Context argument from NewSession

findleyr commented 4 years ago

Update: I've got a number of things working locally, but some of them required significant refactoring that needs to be evaluated before merging. I thought it would be helpful to comment here describing what I'm working on, in case anyone has opinions, questions, or advice.

Goals

Here's what I’ve identified as the primary goals for the shared gopls implementation:

Ease of use: from the perspective of an LSP client, using a shared gopls instance should appear identical to using a singleton gopls instance.
Debuggability: each client session should be isolated in logs and metrics.
Testability: gopls tests should cover both in-process and remote instances of gopls.

Ease of use

To connect to a shared gopls instance, we'll use the existing -remote flag as specified below. This will cause the gopls process invoked by the editor plugin to operate as a thin client for a separate gopls server process, forwarding LSP messages and recording metrics and logs.

-remote="<addr>:<port>": connect to a specific shared gopls at that address. If it’s not running, fail.
-remote=auto: connect to the default shared gopls. If it's not running, start it and then connect.
-remote=auto:<port>: connect to a shared gopls instance at localhost:<port>, or start it and then connect.
-remote=auto:<id>: discover a shared gopls instance identified by a non-numeric string <id>, or start it and then connect

This will allow the following usage patterns:

Manual management of a shared gopls instance.
Automatic management of a shared gopls instance.
Automatic management of a set of shared gopls instances identified by specific ports or IDs. This is a minor extension of the previous use cases, and will allow for scenarios where developers are using different process/mount namespaces in their workflow (thanks @myitcv for the suggestion!)

I'm not married the the -remote=auto syntax, but I think the semantics are correct: using a shared instance should be no harder than using a singleton instance.

For future discussion, I'll refer to the thin client gopls process (the one started with the -remote flag) as the forwarder gopls, and the server gopls process (the one started with the -listen flag), as the shared gopls. This is to avoid confusion around the words ‘client’ and ‘server’, which now become relative terms.

The Server Shutdown Problem

One major problem with starting a shared gopls process automatically is server shutdown: the shared gopls will be a child process of whichever forwarder gopls process started it, and will die when that forwarder process exits. For certain workflows this might be a big problem, for example users who use only short-lived vim processes. I can think of four potential solutions for this:

Simply crash all forwarder gopls when the shared gopls exits. This will force the editor plugin to restart the language server, but most editor plugins already do this. This is crude but is the easiest solution to implement.
Daemonize the shared gopls process, and have it shut down when there are no more forwarder gopls connected. There are various libraries that could potentially help with this (example), but it is shaping up to be a tricky solution. Not all editors will have the necessary privileges to daemonize a process.
Track session information in the forwarder gopls (essentially editor buffer state), and when the shared gopls exits, have the remaining forwarder gopls’ start another one and initializes its session state.
The nonsolution: don't allow --remote=auto and just force users to start the daemon themselves.

Of these, I don't think (1) or (4) are reasonable solutions in isolation. We can't expect every user to manage their own gopls daemon, and we can't expect every LSP plugin to gracefully handle the LSP server process crashing. Notably VS Code gives up if the language server crashes five times, so if a shared gopls instance is to be used by VS Code, we shouldn’t be intentionally crashing the forwarder.

(1) and (3) both result in the loss of the gopls cache, so after a restart users would have to again pay the initial price of warming the cache. On large projects this can be painful, and since users won't be aware of which forwarder owned the shared gopls, it will be confusing when this happens. However, I will note that so far while working in x/tools with a shared gopls instance, I hardly notice when it restarts.

(2) would be the ideal solution as it results in the least amount of lost state, but I think it simply won't be possible in many executing environments. I could be wrong though: I need to do more research on daemonization.

My current plan is to start by supporting (1) and (4) so that we can all begin experimenting with using a shared gopls instance, and then work on (2) or (3) (or both, or <a better idea>) toward the end of this project.

Debuggability

I'm lifting the LSP forwarding to the jsonrpc2 layer. What is currently TCP forwarding will instead be two jsonrpc2 connections talking to each other. This is done both so that we can instrument the forwarder gopls the same way we instrument the shared gopls, and so that we can insert a handshake across the jsonrpc2 stream connecting the forwarder to shared gopls, before starting to forward the LSP. In doing so, we allow the forwarder and shared gopls to exchange information that can be used in debugging. For example, the forwarder gopls can know the location of shared logs or the shared gopls debug port.

Doing this will require some refactoring of the jsonrpc2 API.

Testability

I'm going to do a bit of refactoring of internal/lsp/tests to make it easier to run tests against a remote gopls instance. Once this is done, we should be able to run both the lsp tests and cmd tests against a remote (CLI commands will also support -remote=auto).

findleyr commented 4 years ago

@leitzler pointed out on slack: it would be good to also support unix domain sockets as an IPC mechanism between forwarder and shared gopls instance (thanks for the suggestion!). I agree, but I think we will always need to support TCP as well. One use case that has been discussed is running gopls in docker, in which case exposing a TCP listener is simplest.

For now, I'm going to focus on TCP. I can add support for unix domain sockets later, or perhaps it would be a good opportunity for others to contribute.

myitcv commented 4 years ago

@findleyr thanks for awesome write up.

Regarding "loss of the gopls cache"

In addition to the cost of re-warming the diagnostics/analysis cache, another pain point is the re-warming of the unimported cache. This can be particularly costly is you have a large module cache.

Regarding option 2

Daemonize the shared gopls process, and have it shut down when there are no more forwarder gopls connected.

Do we really want to shut it down when there are no more forwarded gopls connected? Because if I open Vim, do some work then quit, the shared gopls will be shutdown. Meaning if I re-open Vim it will need to start from cold again. Which I think defeats the point of what we're trying to solve here, unless I misunderstood? I'd say keep the shared gopls instance running forever and provide a means via a forwarder (a flag, like -stop used in conjunction with -remote) to stop the shared gopls instance.

The option space

Per our Slack chat I completely agree it's worth getting something landed so we can start playing/experimenting. I'm minded to think that option 2 is really the only solution in the medium-long term: if a user is working in an environment where we can't daemonize then I think it's probably fair to start fallback to the current, i.e. non-remote, behaviour. I don't think we'd want to not do something with daemonization simply because we can't support 100% of cases because we will get a huge return for those cases where we can (he says, selfishly 😄)

findleyr commented 4 years ago

@myitcv thanks for the feedback. I think in your case the better solution might be to explicitly manage the daemon (option 4). Even if it were possible for gopls to automatically start a daemon (which won't be the case in most environments), it would be bad to silently leave behind a process that consumes so much memory.

It will always be possible to manage the daemon yourself, but it would be great if this isn't necessary for most users to get the benefit of shared state.

cunderwood-va commented 4 years ago

Could always have the server shut itself down if no clients connect for some (configurable?) amount of time. If it hangs around idle, odds are the os will slowly page it out in effectively a random order, which easily results in the existing server taking longer to respond (and continuing to be laggy even after the first response) than it would take to fire up a fresh server and reparse the source. Random seeks on-demand are far slower than the sort of streaming (and potentially cached) reads of the source files, despite the solid state drive that most devs will be using; alas, os's mostly don't even try to prefetch swap-in beyond a tiny amount of read-ahead (which won't necessarily help anyway, given that the process image will have been paged out in an effectively random order).

I'm definitely interested in helping benchmark / trace the sorts of io patterns you'd see; I routinely see similar behaviour just from having several vim windows open to a project that I don't touch for a couple hours while I work on something else. :)

findleyr commented 4 years ago

@cunderwood-va that's a really interesting point, that I hadn't considered -- thank you!

When run as a daemon, we could expose the following options:

An idle connection timeout. The forwarder gopls should reconnect if it gets activity after its connection has been dropped.
An option to either shutdown or drop the cache after some time with no connected clients. The cache could be recreated when a client connects.

bhcleek commented 4 years ago

I prefer the option of a gopls server that shuts down automatically after some period after the last client disconnects. That's how gocode worked to good effect.

gopherbot commented 4 years ago

Change https://golang.org/cl/215742 mentions this issue: internal/lsp: refactor LSP server instantiation.

gopherbot commented 4 years ago

Change https://golang.org/cl/217598 mentions this issue: internal/lsp/fake: add fakes for testing editor interaction

gopherbot commented 4 years ago

Change https://golang.org/cl/218698 mentions this issue: internal/lsp/lsprpc: add an LSP forwarder and regtest environment

gopherbot commented 4 years ago

Change https://golang.org/cl/218778 mentions this issue: internal/lsp/lsprpc: add a forwarder handler

gopherbot commented 4 years ago

Change https://golang.org/cl/218839 mentions this issue: internal/jsonrpc2: support serving over unix domain sockets

gopherbot commented 4 years ago

Change https://golang.org/cl/220077 mentions this issue: internal/lsp/cache: return concrete types where possible

gopherbot commented 4 years ago

Change https://golang.org/cl/220078 mentions this issue: internal/lsp/debug: move all debug state onto the Instance

gopherbot commented 4 years ago

Change https://golang.org/cl/220081 mentions this issue: internal/lsp/lsprpc: add an handshake between forwarder and remote

gopherbot commented 4 years ago

Change https://golang.org/cl/220137 mentions this issue: internal/lsp/lsprpc: automatically resolve and start the remote gopls

gopherbot commented 4 years ago

Change https://golang.org/cl/220281 mentions this issue: internal/jsonrpc2: add an idle timeout for stream serving

gopherbot commented 4 years ago

Change https://golang.org/cl/220519 mentions this issue: internal/lsp/lsprpc: clean up client session on disconnection

gopherbot commented 4 years ago

Change https://golang.org/cl/220521 mentions this issue: internal/lsp/lsprpc: add a 1m timeout to auto-started gopls

findleyr commented 4 years ago

Thanks again for all the ideas above! Many of them have made it into the implementation.

Quick update: once the above stack of changes goes in (perhaps tomorrow), this should be ready to start testing. It's pretty feature-complete but I'm sure there are rough edges, particularly on Mac or Windows. I'll add an additional CL with documentation and follow-up here, but wanted to call out an implementation detail for plugin authors (@myitcv and @leitzler, I know this applies to govim, but can imagine that it applies to many other plugins and editors).

In the new implementation (on posix), you'll be able to do the following:

$ gopls -remote="auto"

Which will run an LSP forwarder and auto-start (if needed) a shared gopls hosting over a unix domain socket. That shared gopls instance will then automatically shut down after a minute with no connected clients. Seems ideal, but unfortunately vim's job control by default sends SIGTERM on the plugin process group when exiting (the "stoponexit" configuration value), which causes the auto-started shared gopls to terminate. https://github.com/vim/vim/blob/master/runtime/doc/channel.txt#L967

This is configurable, but then it is incumbent on plugin authors to observe whatever convention we decide upon.

I don't want to ignore SIGTERM, SIGINT, or SIGKILL on the shared gopls instance, but perhaps we could ignore SIGHUP. Then if e.g. govim were to set "stoponexit": "hup", exiting vim should terminate govim and the gopls forwarder, but leave the shared gopls instance intact. Any objection to this, or alternative ideas?

myitcv commented 4 years ago

Thanks @findleyr. As previously discussed, the -remote=auto mode sounds like a really good API.

That shared gopls instance will then automatically shut down after a minute with no connected clients

Is this timeout period configurable in some way, e.g. 0 meaning infinity? Because I would want the shared instance to stay around indefinitely (perhaps killable via a special flag to gopls that would kill a single or all remote instances).

Seems ideal, but unfortunately vim's job control by default sends SIGTERM on the plugin process group when exiting (the "stoponexit" configuration value), which causes the auto-started shared gopls to terminate.

Apologies, I've probably missed something in the CL chain because I haven't been following particularly closely, but does this imply we are not creating the remote gopls instance as a daemon? Because if it is a daemon, doesn't that mean it is detached from the process group of which Vim is the parent, and hence immune from this problem?

findleyr commented 4 years ago

Is this timeout period configurable in some way, e.g. 0 meaning infinity?

It is, and 0 means infinity. However, when auto-starting the daemon a timeout of 1m is used: https://golang.org/cl/220521 The assumption is that it would be bad behavior to leave around a gopls process consuming a large amount of memory -- see @cunderwood-va's comment above for a good argument against this.

Apologies, I've probably missed something in the CL chain because I haven't been following particularly closely, but does this imply we are not creating the remote gopls instance as a daemon?

Well, we are starting a daemon in some sense of the word. I what I didn't clearly explain was that I was relying on the daemon process being reparented after vim terminates. This is, admittedly, a little hacky.

But (for linux at least), we can probably avoid this hack using setsid/setpgid. I'll experiment with this.

myitcv commented 4 years ago

@cunderwood-va - I have zero swap on my machine, so presumably the effect in my case would be limited/zero?

gopherbot commented 4 years ago

Change https://golang.org/cl/221138 mentions this issue: gopls/doc: add documentation for running gopls as a daemon

gopherbot commented 4 years ago

Change https://golang.org/cl/221220 mentions this issue: internal/lsp/lsprpc: use setpgid on posix GOOSes, to avoid SIGTERMs

findleyr commented 4 years ago

I've pretty much gotten to the end of the features I wanted to implement, but I don't want to close this issue until I've done more testing in different environments and on different editors.

If anyone is interested in helping to test, there is now some documentation at https://github.com/golang/tools/blob/master/gopls/doc/daemon.md

I've been using it for a while with Vim (govim) on Linux.

If you run into problems, please feel free to post here or ping me on Slack, and I'll try to fix them ASAP.

zikaeroh commented 4 years ago

This is very cool. I use VS Code and plan to test this (looking good so far), but one immediate feedback I can give is that this mode makes it more difficult to restart gopls when something strange happens. I have to restart gopls somewhat often when I hit a bug (creating new files, renaming packages, desyncs, etc -- known issues that are already reported). To do this, I can either do a full VS Code reload (which is akin to restarting the editor in-place), or use the vscode-go extensions "restart Go language server" option (preferred)

When running normally, this is effective and does restart the process. But with -remote=auto, it doesn't actually restart gopls to the same level, which is expected given now all the real work is done elsewhere. I don't know of a solution to this (other than introducing some new signal the editor knows to send to restart something else), though I'll say that it's possible restarting this client will end up being effective anyway since most of the oddity is the non-shared stuff.

Where this gets weird is when I manually try to do this myself by killing the actual gopls server; the client instance of gopls crashes on the next request. Is there any way to detect the fact that the actual server disappeared and attempt to restart it? Or would this be a bad idea if many clients saw their server disappear and all try to create one?

findleyr commented 4 years ago

Thanks for testing!

When you restart the client gopls you'll get a new session on the daemon. My guess is that this will fix most of the problems you describe. If it doesn't that would be a bug (of course), and a very interesting one at that. The shared state should just be idempotent, cacheable computations, and should be unlikely to get corrupted.

But your point stands about the lack of an easy solution for restarting everything. Unfortunately having the sidecar (client) gopls seamlessly restart the daemon won't work, at least not without some significant engineering: when the daemon crashes its loses all of its session state: overlays, file versions, etc. We need the editor itself to recreate that state.

My hope is that simply restarting the sidecar gopls is sufficient. If it's not, I'll look into implementing a restart protocol. Any such protocol would inevitably end up crashing all connected clients, though, so ideally this would only be needed in rare cases.

zikaeroh commented 4 years ago

My guess is that this will fix most of the problems you describe.

And arguably, this is better than a full restart anyway (if it works), since a lot of the old data is there, versus the full restart throwing away the bad session data and the definitely-okay memoized stuff. So, I'm happier now than before!

when the daemon crashes its loses all of its session state: overlays, file versions, etc. We need the editor itself to recreate that state.

That is an excellent point I completely glossed over.

For reference, VS Code's LSP client will gracefully handle its client exiting unexpectedly, but only up to 5 times before it hits a limit and says "I'm not going to start this again, sorry", so there's still some leeway if it comes to killing the true server. Of course, the best case is that there aren't any bugs and I don't have to do this at all. :slightly_smiling_face:

findleyr commented 4 years ago

The shared state should just be idempotent, cacheable computations, and should be unlikely to get corrupted.

Sigh, I spoke too soon. As soon as I wrote that, I started auditing our cache keys to make sure this was the case, and found at least one place where caching might be broken if restarting the client and continuing to edit the same file. I don't even think it will be that uncommon if you are restarting the client frequently.

I think I never encountered this because my usage generally falls into the pattern of having several long running vim sessions open to different directories, not restarting sessions within the same directory.

I'll try to actually manifest this in my editor tomorrow, write a test, and fix it. In the meantime, beware that there is at least one known bug!

zikaeroh commented 4 years ago

Another case I thought of is doing gopls updates; you have to do a restart to load the new binary, so you do have to go through some shenanigans to make sure it's dead (not to mention the impact of potential version mismatches). Can't say I know the right thing there either.

leitzler commented 4 years ago

Super excited about this, great work!

An initial thought regarding the unix sockets and security. It looks like gopls uses whatever unix socket file that happens to match the expected name. Another user on the same system could create one with the expected name and gopls will happily communicate with it.

findleyr commented 4 years ago

@leitzler that's a great point that I had overlooked, thank you.

My initial thought is just to check that the socket owner is the same as the current user, but I can imagine that not everyone will want this restriction. We could make it configurable, and perhaps also expose configuration for the socket file permissions. Do you have any alternative suggestions?

findleyr commented 4 years ago

@zikaeroh when using -remote=auto on unix, the build id of the gopls binary is hashed into the socket path, so if you restart the sidecar with a new binary, the default socket path for that new binary version will be different, and a new daemon will be started.

The old daemon will still run as long as there are connected clients, but in the simple case where you have only one VS Code session, the old daemon will shut down after a minute.

I don't have a good solution on Windows yet, unfortunately.

leitzler commented 4 years ago

Yeah, checking owner & permission was my initial thought as well, and I agree that it probably won't be for everyone. On the other hand, I can't really figure out why you would like someone else owning the socket (or even let anyone else have access to it) so maybe add configuration as a new feature if someone requests it? Then we would have a use case as base for how the configuration should look like.

Also, we probably need to keep in mind how Windows handles owner & permissions of unix socket files. I don't have access to any Windows machines but I wouldn't be surprise if there are differences.

gopherbot commented 4 years ago

Change https://golang.org/cl/221697 mentions this issue: internal/lsp/lsprpc: use localhost for remote gopls debug interface

golang / go