dense-analysis / ale

Check syntax in Vim/Neovim asynchronously and fix files, with Language Server Protocol (LSP) support
BSD 2-Clause "Simplified" License
13.48k stars 1.43k forks source link

[LSP Completion Performance] Incremental sychronization (aka byte-ranges) and debounce `textDocument/didChange` #4456

Open bluz71 opened 1 year ago

bluz71 commented 1 year ago

Hello ALE team,

This issue was invited by @w0rp in this Reddit post.

Firstly, I come from a position of ignorance about what strategy ALE currently uses with respect to LSP completion latency. If ALE already uses debouncing and byte-ranges, with respect to auto-completion, then please close this issue.

My history in the LSP space was firstly with the excellent LSC plugin and then to Neovim's native LSP client.

Initially Neovim's pure Lua LSP client performed worse than LSC (coded in Vimscript). Nate Bosch (LSC author) had implemented a very efficient client when it came to auto-completion performance; he debounced didChange events and in addition only sent byte-ranges to the Language Server. By contrast Neovim LSP (early on) did no debouncing and sent the full text buffer for each didChange event.

What is debouncing didChange events? Basically, when in insert mode (doing edits) don't send didChange until a pause longer than the debounce interval has occurred. In LSC's case that is 500ms as noted here. The Neovim LSP discussion around debouncing is noted here.

Closely related is the topic of byte-ranges; which are small snippets indicating the delta since the last didChange event. LSC has always used byte-ranges; Neovim LSP did not until I created this issue which then resulted in this PR. It's a complex problem, but one that LSC already solved. Initally the Neovim team just used that LSC algorithm (with appropriate attribution). ALE could do the same thing, if incremental synchronization/byte-ranges are not currently employed.

The combination of byte-ranges (incremental synchronization) with debouncing proved hugely beneficial to auto-completion performance (with certain slower Language Servers). LSC already implements solutions, in Vimscript, to these issues. LSC is permissively licensed; Neovim did leverage the LSC algorithm.

Feel free to contemplate these suggestions, or to close if not applicable.

Best regards.

w0rp commented 1 year ago

Ahh, I understand what you mean now!

An initial change just to debounce sending a change for a buffer is pretty easy to do. We can just stop-start a timer for change events and only send the latest change after a delay. The delay should be configurable. I'll have to play around with the default. I'm not sure if we should have the same 500ms default. I can get that done in hours.

Implementing some change where we only announce the exact range of bytes that have changed in a file I could do in... a few weeks of dedicated work? It's something I'm too lazy/busy to do myself. Others are welcome to submit a pull request for that.

w0rp commented 1 year ago

More specifically, the difficulty in sending only the range of bytes that have changed in a file is that you have to answer two difficult to answer questions.

  1. How do you know exactly what has changed in a Vim buffer? (You cannot be wrong by a single byte.)
  2. How do you know what you are sending to the server is still valid by the time you send it?

If you have exact answers to those two questions, you can implement it. Complications arise from:

  1. Unicode. Just Unicode.
  2. External programs outside of Vim entirely modifying files.
  3. Modifications to buffers done programmatically, and not by users.
  4. Modifications to buffers where :noautocmd is used, so you cannot even rely on :autocmd to check for every change.

Those complications led to me saying "I just won't even try" years ago, and sending the whole Vim buffer on each change. I think rather than trying to know every change that happens to a file, it would be easier to write an efficient algorithm that holds an entire copy of a buffer in memory and efficiently works out the difference in bytes and sends that to a server. That's my best idea for how you'd go about implementing it in Vim.

I will throw in that LSP is shifting away from UTF-16 a little bit: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocuments

Prior to 3.17 the offsets were always based on a UTF-16 string representation. So in a string of the form a𐐀b the character offset of the character a is 0, the character offset of 𐐀 is 1 and the character offset of b is 3 since 𐐀 is represented using two code units in UTF-16. Since 3.17 clients and servers can agree on a different string encoding representation (e.g. UTF-8).

I hope all of this information might help someone implement a more efficient change event.

w0rp commented 1 year ago

I'm likely to do the "just debounce changes" part in the milestone, and move the "only send what has changed" part out of the milestone and into another issue.

bluz71 commented 1 year ago

Hello @w0rp,

Somewhere around 200-500ms is a reasonable auto-complete debounce default; along with an option for user override.

Byte-ranges is a more challenging undertaking. I was externally involved with-respect to the Neovim byte-range implementation. It took a couple iterations to get it right; it was very easy to have off-by-one bugs. In the end the Neovim developers just burrowed the vim-lsc implementation which was battle-tested and more importantly was correct. I was the one who advocated that, and the Neovim developers followed that suggestion (to good effect).

If ALE team is inclined to do byte-ranges/incremental-sync I would strongly encourage copying the vim-lsc implementation/algorithm rather than doing a scratch clean-room implementation. vim-lsc is permissively licensed; and Nate Bosch (vim-lsc author) approved Neovim copying his implementation. Client-side de-differencing is required; I am lead to believe that vim-lsc and vim-lsp ended up using roughly the same diff'ing engine; which is required for byte-ranges.

If someone is tasked with this and has questions then I am pretty sure @natebosch and @prabirshrestha would be willing to answer any such questions.

Cheers.

w0rp commented 1 year ago

The advice above is good. We can just copy the same exact lines of code that do it right and add a copyright attribution. Why implement something from scratch when it's free software?

vimpostor commented 1 year ago

I would strongly encourage copying the vim-lsc implementation/algorithm rather than doing a scratch clean-room implementation. vim-lsc is permissively licensed; and Nate Bosch (vim-lsc author) approved Neovim copying his implementation. Client-side de-differencing is required;

FYI there is a pending native "diff" pull request pending for vim written properly in C and capable of outputting everything that's needed for LSP here: https://github.com/vim/vim/pull/12321.

I would strongly encourage to use that when it is merged instead of the vim-lsc implementation.

w0rp commented 12 months ago

FYI there is a pending native "diff" pull request pending for vim written properly in C and capable of outputting everything that's needed for LSP here: vim/vim#12321.

I would strongly encourage to use that when it is merged instead of the vim-lsc implementation.

Now that looks particular attractive. I think we could also pull in the vim-lsc implementation with a copyright attribution notice in the file for it.

I'd love it if someone else who has had experience implementing this in the past could pick this up. If you throw in the algorithm with reasonable tests for it, I'll merge it.