Concurrent requests without `Send` requirement for single threaded runtimes

dsherret commented 1 year ago

In #284 there is some discussion about having #[tower_lsp::async_trait(?Send)] which I presume is a way to not have the Send requirement on futures. That said, the change seems to execute the requests sequentially and have mutable methods. How feasible would it be to have the option to not have the Send requirement and still execute requests concurrently for single threaded runtimes? Then the client can decide which requests should run sequentially or on a separate thread and also the code consuming this can use Rc and RefCells instead of needing to use Arc and Mutex.

ebkalderon commented 1 year ago

Thanks for opening this ticket, @dsherret! Just to clarify: the prototype introduced in #284 does execute requests concurrently wherever possible, provided their handlers take &self. However, some methods that necessarily mutate state (especially the handlers for initialize, shutdown, and the on-opened/closed/saved/changed events) have been changed to to take &mut self, ensuring they are always executed sequentially and in a coherent order. Is this not sufficient?

EDIT: Further detail may be found in this comment in particular: https://github.com/ebkalderon/tower-lsp/issues/284#issuecomment-1409582112

dsherret commented 1 year ago

Maybe this is just an issue with how our implementation is designed and we're doing it wrong, but it wouldn't work for us because within requests like did_open we sometimes call into the client and we never want to call into the client while being in an &mut self state (in our case, holding a .write().await with tokio's RwLock) because the client could call back into the server and cause a deadlock.

https://github.com/denoland/deno/blob/3a46a89e34aab3bea9f555ef6c80af33a7bc2194/cli/lsp/language_server.rs#L2897-L2900

All this said, having it be Send is more a nice to have.

ebkalderon commented 1 year ago

Would the client ever call back into the server while the server is held exclusively in a &mut self method, though? I haven't found any instances in the LSP specification where this is ever the case. Perhaps you could give an example?

Most, if not all, LSP state modification flows as I understand them tend to go along this direction:

Client sends a textDocument/didOpen notification to the server, which mutates state.
Server sends back request A to the client in between.
Client responds to request A.
- Would the client ever send another request B in between that must be responded to by the server before response A can be sent? Is this an expected flow according to the specification?
Server finishes processing textDocument/didOpen and concurrent processing of requests may resume.

Is this not the case?

ebkalderon commented 1 year ago

FWIW, the necessary bar for marking a LanguageServer trait method as &mut self under the proposal described in #284 is very high. Specifically, it would have to be a request or notification which necessarily blocks user input and needs to be fully processed before server-side work can continue further. The did_* trait methods seemed to fit this bill IMO, though I could be wrong about this. Servers are still permitted to send requests or notifications back to the client in these method handlers, but I don't think we can reasonably expect the server to be able to respond to any concurrent requests until its internal text document state has been fully synchronized with the client.

dsherret commented 1 year ago

(Sorry, I said we never want to call back into the client in an &mut self (rw_lock.write().async), but actually we don’t want to call into the client even in a &self (rw_lock.read().async) in case the client calls into a &mut self)

Basically, the spec can say whatever it likes to prevent this, but what happens in practice with clients is not always according to the spec. We previously had deadlocks occurring because of this scenario, so we just never do it anymore and no longer have issues related to that for what I know (that said, the code previously used an async Mutex for every request instead of an async RwLock, so that was probably the main reason).

ebkalderon commented 1 year ago

I see what you mean, but none of the methods proposed in #284 to take &mut self would ever make sense to be called again by the client in direct response to a server-sent request. Such cases cannot happen in practice unless requests are being sent to the server in a wholly nonsensical order from the client. I agree there are potential deadlock risks with other methods, but I don't think that's the case with these specifically.

Let's see how this might play out below...

Server lifecycle messages

Methods like initialize, initialized, and shutdown are pretty obvious candidates for being marked &mut self since they inherently require exclusive access to server state and are only called once during the application lifecycle. As such, it makes sense to process these sequentially and without any other &self methods running concurrently.

Considering that initialize is an inherently serial method and the server is allowed to reject all messages sent before or during its execution with error code -32002 (server not initialized) until the server has initialized, and shutdown is allowed to reject all methods sent before or during its execution with error code -32600 (invalid request), I think all three methods would benefit from taking exclusive access to self.

Text document & workspace state change notifications

Whether the did_.* methods should be marked &mut self here is less clear cut, but I think the case in favor of this is still strong. These notifications are emitted by the client in response to a text document or workspace state change that occurred on the client side (there is a separate set of requests for synchronizing state changes from the server side to the client).

The client is the one notifying the server of the state change, not the other way around. This means it should already have all the information it needs on hand already and should be able to respond immediately to any request the server sends, thereby avoiding the trap described in https://github.com/ebkalderon/tower-lsp/issues/386#issuecomment-1470454747. This means it is impossible for the following example flow to ever happen:

Client calls did_open(&mut self) on the server.
Server sends request "foo" to the client and expects a response before processing further incoming requests.
Client sends another request to the server to gather some information in order to respond to "foo".
Server deadlocks, because did_open(&mut self) is still processing and therefore can't respond to request "foo."

Let's imagine that the client sends multiple did.* notifications in a row, responding to any server-to-client requests in between. This should be fine, as the handlers for these notifications take &mut self and will be processed serially on the server side.

As a bonus, this serial processing guarantees that any and all &self methods ahead in the request queue will have resolved and responded to the client by now. Any spawned background tasks feeding data into these &self methods which may have temporarily acquired exclusive access to shared state should have let go of the Arc<RwLock<T>> or Rc<RefCell<T>> at this point. If they haven't, then that's a logic error in the server implementation itself... This would lead to a deadlock regardless of whether you're using building your server on top of tower-lsp, std::thread, or something else.

Aside: For the most part, any language server which permits concurrent processing of did.* document state change notifications is very likely to encounter deadlocks in some form and/or get out of sync with the client state over time. Generally speaking, language servers want to handle these incoming textDocument/did* notifications on the main loop as early as possible before processing other incoming requests to ensure the state changes have been fully applied and in the correct order.

All the above points considered, I'm fairly confident that these methods should be fine to mark &mut self and this change would not increase the risk of deadlocks for users. In fact, I suspect it might decrease it slightly (though is claim is entirely speculation on my part and not based on anything empirical; I'd like to see actual measurements confirming or denying this either way).

Execute command request

The final method marked &mut self in the proposal linked in #284 is the workspace/executeCommand request. This one I'm really not confident about, and I'm definitely open to leaving this one &self if there are any drawbacks to doing so.

The rationale behind this change stems from the presumption that command execution is expected to be synchronous: the command being executed might update state on the server side, or it might not. If the state is updated, then processing this request serially and exclusively relative to other pending requests is important (just like the did_.* methods described above), especially since the server will need to call the workspace/applyEdit command back on the client to synchronize these state changes back before workspace/executeCommand can respond.

Other candidates?

Looking back at this helpful suggestion on Reddit by @matklad, I suspect there may be other LanguageServer trait methods that would also benefit from taking mutable reference to self. However, I'm not entirely sure what these methods should be and whether these changes would suit all tower-lsp users uniformly, so I've avoided taking any stances on that for now.

(Please do correct me if I'm fundamentally misunderstanding some core assumptions about, not just the spec, but more the typical flows one sees in an average LSP interaction!)

ebkalderon commented 1 year ago

Apologies for the massive wall of text! :heart: With that said, I'd be happy to continue discussion of the merits and drawbacks of supporting &mut self methods over on #284, if you'd like.

In the meantime, I'd be happy to split out the work in #284 into two separate parts if that would help? That is, I could draft a release of tower-lsp which simply drops the Send bound on all futures while leaving all LanguageServer trait methods &self for now (addressing the direct concern of this ticket) and then we could revisit the issue of &mut self receivers separately.

dsherret commented 1 year ago

@ebkalderon sorry for my delay responding here and thanks for the overview! I think you're right that having &mut self (especially the initialization methods) is probably ok. That said, I think it is nice for consumers to have control over this if they'd like to and not have the Send requirement.

In the meantime, I'd be happy to split out the work in https://github.com/ebkalderon/tower-lsp/issues/284 into two separate parts if that would help? That is, I could draft a release of tower-lsp which simply drops the Send bound on all futures while leaving all LanguageServer trait methods &self for now (addressing the direct concern of this ticket) and then we could revisit the issue of &mut self receivers separately.

That would be extremely helpful! It would be great for us to be able to drop the Send requirement because we will then be able to get rid of a lot of needless mutexes.

ebkalderon / tower-lsp