Open ebkalderon opened 3 years ago
I have found some interesting information regarding the Language Server Protocol's consistency requirements that could help inform our future design moving forward.
First, the specification describes the correct server behavior of textDocument/didChange
notifications as follows:
\<snip> To mirror the content of a document using change events use the following approach:
- start with the same initial content
- apply the `textDocument/didChange` notifications in the order you receive them.
- apply the `TextDocumentContentChangeEvent`s in a single notification in the order you receive them.
This is obvious and makes intuitive sense. We must process text document change notifications strictly in the order in which they are received to ensure state consistency on the server side.
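The mirroring rule above can be sketched in plain Rust. Note that `ContentChange` below is a simplified, hypothetical stand-in for the spec's `TextDocumentContentChangeEvent` (byte offsets instead of line/character positions):

```rust
// Sketch of mirroring a document's content from change events, per the
// spec's ordering rules. `ContentChange` is an illustrative stand-in for
// `TextDocumentContentChangeEvent`.
pub struct ContentChange {
    /// Byte range to replace; `None` means "replace the whole document".
    pub range: Option<(usize, usize)>,
    pub text: String,
}

/// Applies the changes of a single `didChange` notification in order.
pub fn apply_changes(doc: &mut String, changes: &[ContentChange]) {
    for change in changes {
        match change.range {
            Some((start, end)) => doc.replace_range(start..end, &change.text),
            None => *doc = change.text.clone(),
        }
    }
}

fn main() {
    let mut doc = String::from("hello world");
    apply_changes(
        &mut doc,
        &[ContentChange { range: Some((0, 5)), text: "goodbye".into() }],
    );
    assert_eq!(doc, "goodbye world");
    // Applying the same events in a different order would generally yield a
    // different document, which is why arrival order must be preserved.
}
```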
We do indeed process these events in order, thanks to `Stream::buffered`, but there is a risk that multiple notifications may be processed concurrently (because `Buffered<St>` is implemented in terms of `FuturesOrdered<T>`, which races futures to completion in parallel but yields results in the original order), and as a generic LSP framework, we cannot guarantee that all downstream `LanguageServer` trait implementers will behave sanely when multiple changes are processed concurrently. As such, we should consider processing all incoming JSON-RPC notifications sequentially (but still asynchronously), or at least make the concurrency level configurable.
For reference, `rust-analyzer` appears to always process notifications sequentially (see `handle_event()` and `on_notification()` in `main_loop.rs`), while request processing is largely performed in parallel, with the exception of only two standard LSP requests: `shutdown` and `textDocument/selectionRange`. This, along with the Notification Message section of the LSP specification, lends credence to the idea that all notification messages should be handled sequentially and in the order in which they are received, while requests may be freely handled concurrently.
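This policy can be sketched with plain threads and channels: notifications mutate state inline, strictly in arrival order, while requests run concurrently on an immutable snapshot. All names here are illustrative, not tower-lsp or rust-analyzer APIs:

```rust
use std::sync::mpsc;
use std::thread;

enum Message {
    Notification(String), // e.g. a didChange payload
    Request { id: u64 },  // e.g. a hover request
}

// Returns the notification log (in arrival order) and the request responses.
fn main_loop(rx: mpsc::Receiver<Message>) -> (Vec<String>, Vec<u64>) {
    let mut state: Vec<String> = Vec::new();
    let (resp_tx, resp_rx) = mpsc::channel();
    let mut workers = Vec::new();

    for msg in rx {
        match msg {
            // Handled inline: order of state mutations == arrival order.
            Message::Notification(text) => state.push(text),
            // Handled concurrently on an immutable snapshot of the state.
            Message::Request { id } => {
                let snapshot = state.clone();
                let resp_tx = resp_tx.clone();
                workers.push(thread::spawn(move || {
                    let _ = snapshot.len(); // pretend to compute something
                    resp_tx.send(id).unwrap();
                }));
            }
        }
    }
    for w in workers {
        w.join().unwrap();
    }
    drop(resp_tx);
    (state, resp_rx.iter().collect())
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(Message::Notification("a".into())).unwrap();
    tx.send(Message::Request { id: 1 }).unwrap();
    tx.send(Message::Notification("b".into())).unwrap();
    drop(tx);
    let (state, responses) = main_loop(rx);
    assert_eq!(state, vec!["a".to_string(), "b".to_string()]);
    assert_eq!(responses, vec![1]);
}
```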
Hey! Thanks for working on this project. It's super helpful.
I wanted to add some example use cases to this. One thing we want to implement is a custom request that can be used to interrogate the LSP client. For example, suppose we want to ask the client to fetch all workspace symbols across all languages. Then we end up with the following flow:
[sequence diagram omitted]. Source.
Relevant to this discussion, you can see that within handling the initial request (id=0), a few things happen:
So basically, there's a point in time where there are three pending futures on the server:
id=0: the original request from the client
id=1: the server's custom request to the client
id=2: the new request from the client
And these need to be handled in reverse order.
I have this implemented, and it was blocking until I made the following change, which is probably what you would expect to happen:
```diff
diff --git a/src/transport.rs b/src/transport.rs
index 0b4a647..69c4802 100644
--- a/src/transport.rs
+++ b/src/transport.rs
@@ -95,11 +95,12 @@ where
     let mut framed_stdin = FramedRead::new(self.stdin, LanguageServerCodec::default());
     let framed_stdout = FramedWrite::new(self.stdout, LanguageServerCodec::default());
-    let responses = receiver.buffered(4).filter_map(future::ready);
+    let responses = receiver.buffer_unordered(4).filter_map(future::ready);
     let interleave = self.interleave.fuse();
```
Does this make sense? Or am I using custom requests wrong? Without the above change, I'm struggling to see how sending a server->client request could ever make progress.
I agree with @samscott89 that having concurrency should probably be possible. However it would probably be nice for some language servers to have no concurrency at all, because that might make the whole thing a bit simpler.
Most language servers will not be able to do a lot concurrently anyway. Calculating diagnostics (typically the most expensive operation) usually cannot be done at the same time as calculating completions, because they access the same data structures, although caches can often be used to avoid recalculating everything.
So IMO, unless you have a fancier architecture like @samscott89's, you probably want no concurrency at all, and this should probably be the default. However, I don't see why users should not also be able to work with concurrency.
Thanks for the suggestions and helpful discussion! However, there is unfortunately trouble associated with the use of `buffer_unordered()`, as described in commit https://github.com/ebkalderon/tower-lsp/commit/456856b2d114f624d6bb6488ffbcc54e3efef60d, which makes it undesirable. The issue is that while client-to-server and server-to-client messages should indeed be processed concurrently and in any order (as the transport is a full duplex channel), the individual client-to-server requests and server-to-client requests must return their responses in the order in which they were received, regardless of whether concurrency is used or not, as per specification. The current design interleaves server-to-client and client-to-server communication over a single async channel, which means that neither `buffered()` nor `buffer_unordered()` will truly yield the correct behavior in this instance.

One possible solution that could solve the problem while preserving the specification invariants might be, as described in the issue description, to have two separate channels for server-to-client and client-to-server communication. Each of these channels would use `buffered()` by a certain amount to preserve the output ordering of client requests and server requests, respectively. However, these two channels would then be selected upon and process their respective messages without ordering between the two, ensuring that server-to-client and client-to-server communication can be buffered independently without either one blocking the other.
This still doesn't solve the data integrity issues caused by allowing servers to have concurrent handlers (which is the original subject of this issue), but it may at least fix the blocking problems @samscott89 described, where client-to-server and server-to-client requests accidentally block on each other.
I do like @davidhalter's suggestion of making the buffered concurrency configurable by the user and possibly disabled by default, though. Perhaps that may be the best route forward: come out of the box with concurrent processing of client-to-server and server-to-client messages, but each direction processing handlers sequentially in the order in which they were received.
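The blocking scenario and the two-channel fix can be sketched with ordinary threads standing in for the two directions (names and message contents are made up; the real implementation would use async streams rather than threads):

```rust
use std::sync::mpsc;
use std::thread;

// Client-to-server and server-to-client traffic are drained independently,
// so a server handler can block on a reply from the client without
// stalling the channel that would deliver that reply.
fn roundtrip() -> String {
    // Channel carrying the client's answer to the server's own outgoing
    // request (the `id=1` request in the discussion above).
    let (reply_tx, reply_rx) = mpsc::channel::<String>();

    // Server-to-client direction: the "client" answers the server's request,
    // independently of the client-to-server queue.
    let s2c = thread::spawn(move || {
        reply_tx.send("workspace symbols".to_string()).unwrap();
    });

    // Client-to-server direction: the handler for the original request
    // (`id=0`) blocks until the client's reply arrives. With a single shared
    // ordered queue this would deadlock; with two independent queues it works.
    let c2s = thread::spawn(move || reply_rx.recv().unwrap());

    let answer = c2s.join().unwrap();
    s2c.join().unwrap();
    answer
}

fn main() {
    assert_eq!(roundtrip(), "workspace symbols");
}
```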
> the individual client-to-server requests and server-to-client requests must return their responses in the order in which they were received, regardless of whether concurrency is used or not, as per specification.

This is not how I understand the spec. The spec says:

> However, the server may decide to use a parallel execution strategy and may wish to return responses in a different order than the requests were received. The server may do so as long as this reordering doesn't affect the correctness of the responses.
I understand that this means that rename requests may not be reordered, but I would say that even there, there are exceptions: for example, a local definition lookup could in theory be reordered with a rename in a different part of the code. That is still correct behavior according to the specification.
What do you think?
I agree with @davidhalter's reading of things. But even so,

> The server may do so as long as this reordering doesn't affect the correctness of the responses.

sounds like a brutal invariant to attempt to support generically! So even if technically it should be allowed, I could see a good argument for not supporting it in a higher-level framework like `tower-lsp`. For my specific use case, it would be more than reasonable to not support handling client requests out of order. But it is still a problem that a server cannot request information from the client while handling a client request. In the above diagram, the original `id=0` request will block while waiting on `id=1`, but `id=1` is behind `id=0` in the buffer.
As far as I can tell, your proposed solution would achieve that:

> However, these two channels would then be selected upon and process their respective messages without ordering to ensure that server-to-client and client-to-server communication can be buffered independently without either one blocking the other.

With that in place, you would see:

- The `id=2` request from the client cannot complete while the server is still processing `id=0`. <-- IMO this is a reasonable restriction. Otherwise you need to support infinitely nested client-server interactions.
- The `id=1` request from the server to the client would complete before `id=0` is finished, which would allow `id=0` to continue and compute the response.

Sounds like it should be non-concurrent by default with opt-in to concurrency? I can't imagine there are many extensions that would really benefit from concurrency anyway.
That's certainly a possibility, @Timmmm. Though it would be a shame to disallow scenarios such as the one @samscott89 described in their server. I do feel like this use case should be better supported by `tower-lsp` without too much fiddling, somehow.
I think I've devised a vague idea for how to proceed, though. If you check the work-in-progress support-custom-server-requests branch opened as part of #256, you may notice that the internal JSON-RPC implementation was completely reworked to support defining custom requests on LSP servers.
I can see a possible future where the interface for registering JSON-RPC methods also requires the user to specify a concurrency mode, which essentially tells the `Server` (or `LspService`; haven't decided yet) whether to execute the incoming client-to-server request sequentially or concurrently. With this design:

- Sequential requests would be queued with `buffered()` or `buffer_unordered()`? Tradeoff between delayed/ordered vs. rapid/unordered returning of responses.
- Concurrent responses would be merged with `future::stream::select()` as they become available (along with any server-to-client request messages as well).

Whether `tower-lsp` should ship by default with all standard LSP methods configured to "sequential" or some sensible mixture of "sequential" and "concurrent" based on a conservative reading of the original spec is up for debate. But no matter the approach taken, these settings should probably be customizable by the end user when constructing the `Server` or `LspService` (depending on, again, where this request buffering functionality should ideally live).
In the example that @samscott89 gave above, one would explicitly define the custom `id=0` request or notification on `Server` to be of the "concurrent" type, which would allow that async task to stick around and wait while the server processes the `workspace/symbols` `id=3` request alongside it and the response to `id=1` is received. This is precisely the behavior that we want.
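A minimal sketch of such a per-method concurrency mode, using threads in place of async tasks (the `Mode` registry and method tags are hypothetical, not an actual tower-lsp API):

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Each registered method is tagged Sequential or Concurrent; the dispatcher
// either runs the handler inline (preserving arrival order) or spawns it.
#[derive(Clone, Copy)]
enum Mode {
    Sequential,
    Concurrent,
}

fn dispatch(modes: &HashMap<&str, Mode>, incoming: Vec<&str>) -> Vec<String> {
    let (done_tx, done_rx) = mpsc::channel();
    let mut handles = Vec::new();
    let mut log = Vec::new();

    for method in incoming {
        match modes.get(method).copied().unwrap_or(Mode::Sequential) {
            // Sequential methods execute inline, one by one, in order.
            Mode::Sequential => log.push(format!("seq:{method}")),
            // Concurrent methods are free to overlap with later messages.
            Mode::Concurrent => {
                let done_tx = done_tx.clone();
                let method = method.to_string();
                handles.push(thread::spawn(move || {
                    done_tx.send(format!("conc:{method}")).unwrap();
                }));
            }
        }
    }
    for h in handles {
        h.join().unwrap();
    }
    drop(done_tx);
    log.extend(done_rx.iter());
    log
}

fn main() {
    let modes = HashMap::from([
        ("textDocument/didChange", Mode::Sequential),
        ("textDocument/hover", Mode::Concurrent),
    ]);
    let log = dispatch(&modes, vec!["textDocument/didChange", "textDocument/hover"]);
    assert!(log.contains(&"seq:textDocument/didChange".to_string()));
    assert!(log.contains(&"conc:textDocument/hover".to_string()));
}
```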
How does this approach sound to all of you? I'd appreciate it if you could try to poke holes in this design.
Thanks for the thoughtful response @ebkalderon!
In our case, we ended up scoping back the functionality to avoid needing to do that additional back-and-forth work. I don't know enough about what other LSPs are doing, but I wouldn't be surprised if what I was trying to do has a bit of a "code smell" to it.
If it's not super common, the risk is that supporting these esoteric use cases is going to add a bunch of complexity, so it might not be worth it?
Putting that aside though, what you described sounds like a cool way to make it work :)
If I understand it right, you're saying commands are executed exactly in order, except if they're marked "concurrent", in which case they can be executed concurrently with any commands in the same contiguous group of concurrent commands?
That sounds like a decent plan to me.
Precisely! Indeed, I'm not entirely sure whether the added complexity will be worth it, but I think it's worth prototyping in a branch and mulling over.
For now, in the process of refactoring the JSON-RPC router in the support-custom-server-requests branch of #256, I'm moving forward with the following set of changes from this ticket:
Any proposed solution to this ticket (which I feel might even affect portions of #177) will have to come later in a separate release.
Seems like the approach described a few comments up might be a good way to address the `&mut` methods suggestion mentioned in this Reddit comment by u/matklad.
I've been investigating different approaches to allowing `&mut self` methods in an ergonomic way, as suggested in the Reddit link above. Here is a `Router<S>` prototype which accepts method handlers with either `&self` or `&mut self` receivers (Playground).
However, due to the unfortunate inability to express the following:

```rust
impl<R, O, F, Fut> Method<&R, (), O> for F
where
    R: 'static,
    F: for<'a> Fn(&'a R) -> Fut, // How do I express `Fut` has HRTB 'a here?
    Fut: Future<Output = O>,
```

the Playground example cannot accept `async fn` methods at present (see comment in Playground code).
If we abandon this approach and don't attempt to abstract over mutability, we can work around this limitation and reduce a few unnecessary heap allocations too. But this negatively impacts the API ergonomics: users of `LspServiceBuilder` would be forced to register `&self` method handlers with `.custom_method()` and `&mut self` handlers with `.custom_method_mut()`, respectively.
I have managed to go beyond the approach described in the Playground link and successfully abstracted the blanket `Method` trait impls over mutability, but it's pretty complex and I'm looking to simplify the traits further, if possible.
I've found a satisfactory way to work around the above issues with higher-kinded lifetimes (Playground). The gist of it is as follows:

1. Define an intermediate trait `Closure<Recv, Args, Out>` with an `invoke()` method.
2. Implement it for every `F: Fn(Recv, Args) -> Fut` where `Fut: Future<Output = Out>`. This implementation simply calls itself and forwards any arguments: `self(recv, args)`.
3. Define the trait `Method<Recv, Args, Out>` with a similarly named `call()` method.
4. Implement `Method<&'a Recv, Args, Out>` for all types satisfying `for<'b> Closure<&'b Recv, Args, Out> + 'a`.
5. Implement `Method<&'a mut Recv, Args, Out>` for all types satisfying `for<'b> Closure<&'b mut Recv, Args, Out> + 'a`.
6. Accept `async` methods taking either `&self` or `&mut self` receivers as arguments to functions expecting `F: Method<Recv, Args, Out>`.

This hack relies on the intermediate `Closure` trait to express the higher-kinded lifetime bound previously considered inexpressible.
This took me an incredibly long time to figure out, bumping up against limitations such as https://github.com/rust-lang/rust/issues/70263 frequently along the way. I'm pretty happy with the ergonomics of the end result!
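For illustration, here is a compiling sketch of the same family of workaround, using a single lifetime-parameterized helper trait rather than the exact `Closure`/`Method` pair from the Playground. All trait and type names here are illustrative, and the tiny `block_on` exists only to keep the example dependency-free:

```rust
use std::future::Future;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// The helper trait carries the receiver lifetime as a trait parameter, so
// the future's borrow of the receiver can be expressed where a plain
// `F: for<'a> Fn(&'a R) -> Fut` bound cannot.
trait AsyncClosure<'a, Recv: 'a, Args, Out>: Fn(&'a Recv, Args) -> Self::Fut {
    type Fut: Future<Output = Out> + 'a;
}

impl<'a, Recv: 'a, Args, Out, F, Fut> AsyncClosure<'a, Recv, Args, Out> for F
where
    F: Fn(&'a Recv, Args) -> Fut,
    Fut: Future<Output = Out> + 'a,
{
    type Fut = Fut;
}

// Accepts `async fn` handlers with `&self`-style receivers.
fn call_method<Recv, Args, Out, F>(f: F, recv: &Recv, args: Args) -> Out
where
    F: for<'a> AsyncClosure<'a, Recv, Args, Out>,
{
    block_on(f(recv, args))
}

struct Backend {
    count: i32,
}

// An `async fn` whose future borrows its receiver.
async fn get_count(backend: &Backend, bump: i32) -> i32 {
    backend.count + bump
}

// Minimal polling executor with a no-op waker, so the example needs no
// external async runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    unsafe fn clone(_: *const ()) -> RawWaker {
        raw()
    }
    unsafe fn no_op(_: *const ()) {}
    fn raw() -> RawWaker {
        static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let backend = Backend { count: 40 };
    assert_eq!(call_method(get_count, &backend, 2), 42);
}
```

The blanket impl applies once per receiver lifetime, which is what lets the `for<'a>` bound in `call_method` hold for `async fn` items whose returned futures borrow their receiver.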
> FYI @ratmice, I hope to relax the `Send` bound requirements on `LanguageServer` futures eventually as part of #284 (comment), thereby supporting thread-local futures out of the box. The `Client` type will remain `Send`, of course, in case anyone wants to pass it into a `Send` task.
Responding here, because it probably makes more sense.
I've been doing a bit of refactoring in the code for my project (from which https://github.com/ebkalderon/tower-lsp/pull/340 was a simplified example). Having done so, I now rather doubt that I'm actually going to be able to meet these (or any!) trait bounds, the primary issue being self-references in the thread which I currently have to spawn to respond to events.

I'll definitely try and check out this HKLT approach when there is a branch to look at, but the feeling I get is that even with these trait bounds relaxed, it is likely I don't have a struct that can satisfy them.
With regards to the simplified example in #340, dropping the `Send` bound on all futures indeed allows `!Send` struct fields, and these values may also be borrowed across `.await` points in `LanguageServer` method handlers (Playground link)[^1]. This enables servers to call out to `Client` at any time while holding references to `!Send` data.
Additionally, the current push to mark certain methods that inherently depend on exclusive access to server state as `&mut self` (e.g. `initialize`, `shutdown`, and the `did_.*` notifications) should make the API easier to reason about as well.[^2]
It's still perfectly fine to spawn background tasks with `tokio::spawn()` or `std::thread::spawn()` inside any of these handlers, of course. Though if you do, you will need to make sure the data passed in and out of the task is `Send + 'static`, like always.
[^1]: In reality, `tower_service::Service` requires that all futures be at least `'static`, unfortunately (see this comment on tower-rs/tower). The language server instance could be wrapped in an `Rc`/`Arc` internally to satisfy this requirement, though, which is exactly what `tower-lsp` does today.
[^2]: Note that the use of `tokio::sync::RwLock` in the playground example to support `&mut self` methods is just for demonstration purposes. The real implementation I'm working on uses a fast async-aware semaphore instead of locks.
Okay, I've pushed up a quick and dirty implementation to the `support-mutable-methods` branch. I haven't stress-tested it much yet, but all tests pass. Would appreciate any feedback from folks!
An open question regarding the implementation in the above branch: should the future returned by `Server::serve()` be `Send`? The current work returns a thread-local `!Send + !Sync` future, since we assume most users will not attempt to `spawn()` their server, but will instead be `.await`-ing it on the main thread. This lets us use a lightweight async semaphore + `RefCell` to synchronize access to the inner `LanguageServer` instead of an async `RwLock` (which may or may not be tied to a specific async runtime, something we don't want).
If anyone here depends on `Server::serve()` being `Send`, please speak up now!
I am intending to try this out ASAP; however, I've been a bit under the weather. Hopefully this weekend or over the next few days I'll feel well enough to sit down and work on it. I'm definitely just `.await`-ing in the main thread, though.
Sure thing. Sorry to hear you're under the weather, @ratmice. Hope you feel better soon!
Thanks for chiming in about your own usage of the `Server::serve()` future. Would love to hear other folks' feedback too. I may let this thread simmer for a bit longer and wait for other responses to trickle in.
@ebkalderon It seems like it is going to take me a while to do a full port of my LSP server so it can make full use of the new branch; the internal refactoring required to do so is quite large. Instead, I tried a little proof-of-concept port from scratch, starting out with just the example in the README.md on the branch, but ran into this error with `Send` bounds (for `initialize`, `initialized`, and `shutdown`). Curious if others can reproduce, or if the basic README example from the branch works for others?
```
error[E0053]: method `initialize` has an incompatible type for trait
  --> server/src/main.rs:12:5
   |
12 |     async fn initialize(&mut self, _: InitializeParams) -> Result<InitializeResult> {
   |     ^^^^^ expected trait `Future<Output = Result<tower_lsp::lsp_types::InitializeResult, tower_lsp::jsonrpc::Error>>`, found trait `Future<Output = Result<tower_lsp::lsp_types::InitializeResult, tower_lsp::jsonrpc::Error>> + Send`
   |
   = note: expected fn pointer `fn(&'life0 mut Backend, tower_lsp::lsp_types::InitializeParams) -> Pin<Box<...>>`
              found fn pointer `fn(&'life0 mut Backend, tower_lsp::lsp_types::InitializeParams) -> Pin<Box<...>>`
```
Interesting, I haven't run into this issue while working on the branch. CI hasn't indicated any errors when checking the in-repo examples either. Do you think you could push up your WIP work to a new GitHub repo so I can take a look with you?
Curious; I should look more closely at the in-repo examples to see if I can spot a difference. I went and pushed the example I encounter the error with here: https://github.com/ratmice/nimbleparse_lsp_mutable_methods, with the example code in server/src/main.rs.

Edit: Looks like the README might just be missing the `#[tower_lsp::async_trait(?Send)]` bounds.
Yes, that's absolutely correct. All examples and doc tests have switched to `#[async_trait(?Send)]`, but it seems I forgot to update the README accordingly. :+1: With that in place, the example in the linked repo executes fine for me indeed.
One small change that I'm less than happy with, introduced as a result of this work, is the new method signature of `LspService::inner()`. Where previously it had returned a `&S` where `S: LanguageServer`, it now returns an `std::cell::Ref<'_, S>` guard instead. This minor wart is unfortunately unavoidable because of the interior mutability added to `LspService` in order to support `&mut self` methods.
Perhaps research into #355 will yield a nicer API for testing servers (which is what this method is primarily intended for), which would let us remove the `.inner()` method altogether. With that said, I will happily live with this for now.
Hey @ebkalderon, just wanted to see what the status is with making this change to `tower-lsp`. I just modified our server to target the `support-mutable-methods` branch to see how the new changes feel.

It's so nice to have a clear boundary where the internal server state can be mutated safely and where it should be taken by reference. Not needing to wrap everything in a `RwLock` or similar really is a massive ergonomic win. Would love to see this change land in an official release. How are you feeling about it? Is there something holding you back? Is there anything you need help with? Keen to lend a hand if it helps move this along.
Hi there, @JoshuaBatty! Thanks for your interest in this issue. I think the `support-mutable-methods` branch is largely ready to go in its current state. However, there are a few complications currently blocking its immediate release:

1. Since `tower-lsp` is being used by several large projects at this time, I would prefer not to break API stability too egregiously (though I'm not afraid to do so, if I must). I would love to collect some feedback from at least the top-largest key players before merging this branch in, but I'm not sure how to best go about this yet.
2. Mixing `&self` and `&mut self` method receivers in the current trait-based `LanguageServer` API is unfortunately not practical, so this would require a deeper API redesign which I suspect would, again, disrupt downstream users quite a bit.
3. I may need to rework the `LanguageServer` trait API regardless in order to resolve #386 (making `&self` and `&mut self` concurrency user-configurable) and #393 (allow overriding default method signatures). I have promising ideas on this front, but pursuing this would make the `support-mutable-methods` branch obsolete if I went in this direction.

Hope this sheds some light on the current state of things! Ideally, I would love to be able to release the `support-mutable-methods` branch ASAP, to improve ergonomics for users in the short-term, and then release a more flexible redesigned API later. I worry about how downstream projects would react to this increased level of churn, though.

I'd love to hear thoughts from anyone (the biggest `tower-lsp` users out there especially) on what would be the best path forward!
> I would love to collect some feedback

While not a large user (I've only communicated with one person besides myself that I know has attempted to use my LSP), my feedback from my experiment was that it did work, and it appeared it would allow me to avoid making separate threads and serializing LSP messages via channels to those threads.

Unfortunately, leveraging it required the use of self-referencing structures (which I am somewhat hesitant to do). If it weren't for that, it would be a no-brainer for my project. But at the time it seemed like trading the mess I have (which works ok for now) for some ownership messiness of a new kind. So, I wasn't quite sold yet on the full rewrite that would allow it to leverage the new things which would only be possible with the mutable-methods branch. I haven't updated my current code base to the mutable-methods branch (not really leveraging the new possibilities), but would be happy to do that if it would be helpful.
Good to know that the branch works as expected, @ratmice! Thanks for the input. I presume the need for self-referential structs is a specific requirement of the particular language server being built and not an issue with the branch, as I have never encountered such a need in other downstream projects relying on `tower-lsp` AFAIK, but please correct me if I'm wrong there.
Yeah, it is a specific detail of the language server being built. The current implementation only avoids self-referential types because it is borrowing local variables within that thread's main loop, but once it has to be within something which implements a trait, it needs to become a self-referential type.

Edit: Whereas on the current master I just couldn't implement the trait due to bounds requirements (IIRC).
Thanks for the detailed response and shedding light on the complexities involved. I genuinely appreciate the effort you're putting into maintaining the stability and usability of `tower-lsp`.
From our standpoint, the benefits of the `support-mutable-methods` branch are immediately tangible. The ergonomics it offers are massive for us, and having that in an official release would significantly enhance our development experience.
I understand the concerns regarding the potential API redesign and how it might introduce additional churn. While long-term stability is essential, I'd argue that we should prioritize immediate benefits that have a clear positive impact. We're willing to adapt and navigate through any churn that comes with future redesigns.
@JoshuaBatty Thanks for the helpful reply, especially regarding whether the branch works as expected and how getting it merged would likely impact your project. I'm glad to hear folks seem generally in favor of it getting merged so far.
I think I would like to hear from folks working on the Deno project, particularly from @dsherret and @bartlomieju (sorry to ping you both directly) over on https://github.com/denoland/deno/issues/18079, which had spawned further discussion in #386. Would merging in the `support-mutable-methods` branch break anything over at Deno today?
Not sure if it helps, but I'm trying an alternative design leveraging the interface of `tower::Service` for all methods in my project async-lsp. There, request handlers are `fn(&mut State, Params) -> impl Future + 'static` and notification handlers are `fn(&mut State)`. This approach has the downside of incompatibility with `async fn` sugar, forcing mutations to be done before computations, but it simplifies the concurrency implementation: we can just call the handler, spawn the returned future, and repeat; then everything is correctly ordered and asynchronous computations are fully concurrent.
The current approach of `support-mutable-methods` uses `async fn(&mut State, Params)`, aka `fn<'a>(&'a mut State, Params) -> impl Future + 'a`, which holds the write lock to `State` for the duration of the task. It means the frequent `did_change` handler becomes the join point, and concurrency can only happen between two keystrokes. This may already be the case for some language servers, but when `State` is cheap to snapshot/clone, concurrent reads and writes are possible and can be more efficient.
But well, yes, it's more ergonomic than both the current APIs and my approach.
> I think I would like to hear from folks working on the Deno project, particularly from @dsherret and @bartlomieju (sorry to ping you both directly) over on https://github.com/denoland/deno/issues/18079, which had spawned further discussion in https://github.com/ebkalderon/tower-lsp/issues/386. Would merging in the support-mutable-methods branch break anything over at Deno today?
I don't think it would break anything because we wouldn't use `(?Send)` on the trait, so the interface should remain the same, correct? I think we'll just continue to use the current interface and do some hacks to make it work better with a single-threaded runtime. I'd rather we control the concurrency and mutability for the reasons outlined in the other issue.
Sounds like we might be able to move forward with this then? How are you feeling about it @ebkalderon?
Hey @ebkalderon sorry to be pushy, any update on your thoughts for moving this ahead?
I'm building a language server and recently prototyped a tower-lsp rewrite. So far I really like the API.
However, this issue has come up in a concrete way: it's hard to understand the order in which `did_close`, `did_open`, and `did_change_watched_files` are executed after a file rename in VS Code. In the lsp-server implementation, `did_close` was handled first, whereas in the tower-lsp implementation prototype, `did_open` is handled first (seemingly).

I need to ensure that any references to the old filename get updated before running diagnostics... when I open the "new" file, I need some way of knowing that it's actually just the old file with a new name. Since it's hard to predict which handler will be executed first, it seems that the workaround will be to clean up deleted file references in both `did_open` and `did_change_watched_files`.
Now, in this case it's probably not too big of a deal, but the solution feels hacky, and with more complicated combinations of long-running tasks I'll have to get creative to avoid redundant execution of expensive stuff.
(I still need to try `will_rename`; hopefully that will fix everything in this specific case, since the client literally waits for a response to that before executing the rename...)
In general, it'd be nice to have insight into and control over the handler execution order, especially in cases where clusters of interrelated requests come in quickly, for example to explicitly debounce sequences of interdependent tasks that could be triggered by multiple events concurrently, or to throttle expensive tasks. Maybe I can use streams and stream extensions for this? The `LanguageServer` trait implementation handlers can directly handle certain things if there isn't a potential for interference, but in other cases maybe they could defer to some more complex logic by sending events to channels that get processed as streams. In that case, what would be the advantage of using tower-lsp over lsp-server? The easy-to-use API is definitely one advantage. Maybe there are others?
When it comes to Rust async stuff I'm definitely "Alan", so it's hard for me to grasp whether the `support-mutable-methods` changes can resolve my issues.
Hello,

I would like to give an update on this issue. We are using tower-lsp for our project, and we are wondering about the message ordering.

As I understand it, the project is effectively synchronous as long as you don't yield the task with an `await` or similar call?

This is a real issue for us, as some methods have to call custom routes and wait for the response of the client, making the server unstable. As the spec indicates, `completion` or `signatureHelp` requests can be reordered, but reordering `did_change` notifications is totally forbidden, especially if we use an incremental file cache and not the `TextDocumentSyncKind::FULL` option.

Currently, my plan is to list all the methods that can be reordered and the ones that cannot, and to check before any execution that no other non-reorderable function is executing. Do you think this could temporarily solve the issue with the current version of tower-lsp?

EDIT: It won't, because if some tasks are blocked, I can't guarantee which one will wake up first.
Background
In order for the framework to support both standard streams and TCP transports generically, we need a way to interleave both client-to-server and server-to-client communication over a single stream. This topic has been the subject of a previous issue. The current strategy used by `tower-lsp` (and `lspower` in turn) to accomplish this is somewhat simplistic:

The third step above is particularly significant, however, and has some unintended consequences.
Problem Summary
A closer reading of the "Request, Notification and Response Ordering" section of the Language Server Protocol specification reveals:
This is concerning to me because `tower-lsp` unconditionally executes pending async tasks concurrently without any regard for the correctness of the execution order. The correct ordering of the outgoing messages is preserved, as per the spec, but the execution order of the handlers is not guaranteed to be correct. For example, an innocent call to `self.client.log_message().await` inside one server handler might potentially prompt the executor to not immediately return back to that handler's yield point, but instead start processing the next incoming request concurrently as it becomes available.

As evidenced by downstream GitHub issues like https://github.com/denoland/deno/issues/10437, such behavior can potentially cause the state of the server and client to slowly drift apart as many small and subtle errors accumulate and compound over time. This problem is exacerbated by LSP's frequent use of JSON-RPC notifications rather than requests, which don't require waiting for a response from the server (see relevant discussion in microsoft/language-server-protocol#584 and microsoft/language-server-protocol#666).
It's not really possible to confidently determine whether any particular "state drift" bug was caused by the server implementation or by `tower-lsp` without stepping through with a debugger. However, there are some things we can do to improve the situation for our users and make such bugs less likely.

Possible solutions
For example, we could process client-to-server and server-to-client messages concurrently with each other, but individual messages of each type would execute in the order they are received, one by one, and no concurrency between individual `LanguageServer` handlers would be allowed.

This would greatly decrease request throughput, however. Perhaps it would be beneficial to allow some handlers to run concurrently where it's safe (with user opt-in or opt-out), but otherwise default to fully sequential execution. To quote the "Request, Notification and Response Ordering" section of the Language Server Protocol specification again:

If we choose to go this route, we may have to determine ahead of time which handlers are usually safe to reorder and interleave and which are not. I imagine this would introduce a ton of additional complexity to `tower-lsp`, though, so I'd rather leave such responsibilities to the server author to implement themselves, if possible.

Below are a few key takeaways that we can start adding now to improve the situation:
The matters of optional user-controlled concurrency are still unanswered, however, and will likely require further design work.
Any thoughts on the matter, @silvanshade?