No unicode completion after a mathcal character

ejgallego / coq-lsp

Visual Studio Code Extension and Language Server Protocol for Coq

GNU Lesser General Public License v2.1

152 stars 31 forks source link

No unicode completion after a mathcal character #531

Closed ineol closed 11 months ago

ineol commented 1 year ago

Describe the bug

The \ character does not trigger a unicode completion if there there is a mathcal character on the same line before it.

To Reproduce Steps to reproduce the behavior:

Open a coq file
type 𝒰 using \calU
type on the same line
type \
The unicode menu does not open

Expected behavior The unicode completion menu opens

Desktop (please complete the following information):

Operating system linux
coq-lsp version 0.1.7 (also 0.1.6)
VS Code version 1.80.1 (but also older versions from the spring)

ejgallego commented 1 year ago

Hi @ineol , thanks for the careful bug report. I can reproduce, will look into it.

ejgallego commented 1 year ago

The problem seems to be in the VS Code side, for 𝒰 it sends the wrong character information in the completion request, see this example:

𝒰\

Pressing Ctrl-Space we get:

[Trace - 3:50:31] Sending request 'textDocument/completion - (73)'.
Params: {
    "textDocument": {
        "uri": "file:///home/egallego/research/coq-lsp/examples/Ltac2_plugin.v"
    },
    "position": {
        "line": 8,
        "character": 3
    },
    "context": {
        "triggerKind": 1
    }
}

character should be 2. Hence the server is confused.

ineol commented 1 year ago

It seems to happen for any character that takes two utf16 code points.

I believe that LSP positions are expressed in UTF16 code point offsets by default, and servers can ask to get UTF8 code points instead. So that seems consistent with position = 3 if this corresponds to the position after the /.

ejgallego commented 1 year ago

Oh you are right I think, thanks!

I hit this issue back in 2017/2018 with LSP, their choice of character / code point encoding is quite questionable to many as most servers use UTF-8 for their document encoding.

Fortunately I think we got all the right infra here to fix this problem, but I had totally forgotten about the UTF-16 problem.

I'll have it a go to fix this.

cc: https://github.com/microsoft/language-server-protocol/issues/376 and https://github.com/microsoft/language-server-protocol/issues/872