Open dsherret opened 1 year ago
Thanks for opening this ticket, @dsherret! Just to clarify: the prototype introduced in #284 does execute requests concurrently wherever possible, provided their handlers take &self
. However, some methods that necessarily mutate state (especially the handlers for initialize, shutdown, and the on-opened/closed/saved/changed events) have been changed to to take &mut self
, ensuring they are always executed sequentially and in a coherent order. Is this not sufficient?
EDIT: Further detail may be found in this comment in particular: https://github.com/ebkalderon/tower-lsp/issues/284#issuecomment-1409582112
Maybe this is just an issue with how our implementation is designed and we're doing it wrong, but it wouldn't work for us because within requests like did_open
we sometimes call into the client and we never want to call into the client while being in an &mut self
state (in our case, holding a .write().await
with tokio's RwLock) because the client could call back into the server and cause a deadlock.
All this said, having it be Send
is more a nice to have.
Would the client ever call back into the server while the server is held exclusively in a &mut self
method, though? I haven't found any instances in the LSP specification where this is ever the case. Perhaps you could give an example?
Most, if not all, LSP state modification flows as I understand them tend to go along this direction:
textDocument/didOpen
notification to the server, which mutates state.textDocument/didOpen
and concurrent processing of requests may resume.Is this not the case?
FWIW, the necessary bar for marking a LanguageServer
trait method as &mut self
under the proposal described in #284 is very high. Specifically, it would have to be a request or notification which necessarily blocks user input and needs to be fully processed before server-side work can continue further. The did_*
trait methods seemed to fit this bill IMO, though I could be wrong about this. Servers are still permitted to send requests or notifications back to the client in these method handlers, but I don't think we can reasonably expect the server to be able to respond to any concurrent requests until its internal text document state has been fully synchronized with the client.
(Sorry, I said we never want to call back into the client in an &mut self
(rw_lock.write().async
), but actually we don’t want to call into the client even in a &self
(rw_lock.read().async
) in case the client calls into a &mut self
)
Basically, the spec can say whatever it likes to prevent this, but what happens in practice with clients is not always according to the spec. We previously had deadlocks occurring because of this scenario, so we just never do it anymore and no longer have issues related to that for what I know (that said, the code previously used an async Mutex for every request instead of an async RwLock, so that was probably the main reason).
I see what you mean, but none of the methods proposed in #284 to take &mut self
would ever make sense to be called again by the client in direct response to a server-sent request. Such cases cannot happen in practice unless requests are being sent to the server in a wholly nonsensical order from the client. I agree there are potential deadlock risks with other methods, but I don't think that's the case with these specifically.
Let's see how this might play out below...
Methods like initialize
, initialized
, and shutdown
are pretty obvious candidates for being marked &mut self
since they inherently require exclusive access to server state and are only called once during the application lifecycle. As such, it makes sense to process these sequentially and without any other &self
methods running concurrently.
Considering that initialize
is an inherently serial method and the server is allowed to reject all messages sent before or during its execution with error code -32002
(server not initialized) until the server has initialized, and shutdown
is allowed to reject all methods sent before or during its execution with error code -32600
(invalid request), I think all three methods would benefit from taking exclusive access to self
.
Whether the did_.*
methods should be marked &mut self
here is less clear cut, but I think the case in favor of this is still strong. These notifications are emitted by the client in response to a text document or workspace state change that occurred on the client side (there is a separate set of requests for synchronizing state changes from the server side to the client).
The client is the one notifying the server of the state change, not the other way around. This means it should already have all the information it needs on hand already and should be able to respond immediately to any request the server sends, thereby avoiding the trap described in https://github.com/ebkalderon/tower-lsp/issues/386#issuecomment-1470454747. This means it is impossible for the following example flow to ever happen:
did_open(&mut self)
on the server.did_open(&mut self)
is still processing and therefore can't respond to request "foo."Let's imagine that the client sends multiple did.*
notifications in a row, responding to any server-to-client requests in between. This should be fine, as the handlers for these notifications take &mut self
and will be processed serially on the server side.
As a bonus, this serial processing guarantees that any and all &self
methods ahead in the request queue will have resolved and responded to the client by now. Any spawned background tasks feeding data into these &self
methods which may have temporarily acquired exclusive access to shared state should have let go of the Arc<RwLock<T>>
or Rc<RefCell<T>>
at this point. If they haven't, then that's a logic error in the server implementation itself... This would lead to a deadlock regardless of whether you're using building your server on top of tower-lsp
, std::thread
, or something else.
Aside: For the most part, any language server which permits concurrent processing of
did.*
document state change notifications is very likely to encounter deadlocks in some form and/or get out of sync with the client state over time. Generally speaking, language servers want to handle these incomingtextDocument/did*
notifications on the main loop as early as possible before processing other incoming requests to ensure the state changes have been fully applied and in the correct order.
All the above points considered, I'm fairly confident that these methods should be fine to mark &mut self
and this change would not increase the risk of deadlocks for users. In fact, I suspect it might decrease it slightly (though is claim is entirely speculation on my part and not based on anything empirical; I'd like to see actual measurements confirming or denying this either way).
The final method marked &mut self
in the proposal linked in #284 is the workspace/executeCommand
request. This one I'm really not confident about, and I'm definitely open to leaving this one &self
if there are any drawbacks to doing so.
The rationale behind this change stems from the presumption that command execution is expected to be synchronous: the command being executed might update state on the server side, or it might not. If the state is updated, then processing this request serially and exclusively relative to other pending requests is important (just like the did_.*
methods described above), especially since the server will need to call the workspace/applyEdit
command back on the client to synchronize these state changes back before workspace/executeCommand
can respond.
Looking back at this helpful suggestion on Reddit by @matklad, I suspect there may be other LanguageServer
trait methods that would also benefit from taking mutable reference to self
. However, I'm not entirely sure what these methods should be and whether these changes would suit all tower-lsp
users uniformly, so I've avoided taking any stances on that for now.
(Please do correct me if I'm fundamentally misunderstanding some core assumptions about, not just the spec, but more the typical flows one sees in an average LSP interaction!)
Apologies for the massive wall of text! :heart: With that said, I'd be happy to continue discussion of the merits and drawbacks of supporting &mut self
methods over on #284, if you'd like.
In the meantime, I'd be happy to split out the work in #284 into two separate parts if that would help? That is, I could draft a release of tower-lsp
which simply drops the Send
bound on all futures while leaving all LanguageServer
trait methods &self
for now (addressing the direct concern of this ticket) and then we could revisit the issue of &mut self
receivers separately.
@ebkalderon sorry for my delay responding here and thanks for the overview! I think you're right that having &mut self
(especially the initialization methods) is probably ok. That said, I think it is nice for consumers to have control over this if they'd like to and not have the Send
requirement.
In the meantime, I'd be happy to split out the work in https://github.com/ebkalderon/tower-lsp/issues/284 into two separate parts if that would help? That is, I could draft a release of tower-lsp which simply drops the Send bound on all futures while leaving all LanguageServer trait methods &self for now (addressing the direct concern of this ticket) and then we could revisit the issue of &mut self receivers separately.
That would be extremely helpful! It would be great for us to be able to drop the Send
requirement because we will then be able to get rid of a lot of needless mutexes.
In #284 there is some discussion about having
#[tower_lsp::async_trait(?Send)]
which I presume is a way to not have theSend
requirement on futures. That said, the change seems to execute the requests sequentially and have mutable methods. How feasible would it be to have the option to not have theSend
requirement and still execute requests concurrently for single threaded runtimes? Then the client can decide which requests should run sequentially or on a separate thread and also the code consuming this can useRc
andRefCells
instead of needing to useArc
andMutex
.