cursorless-dev / cursorless

Don't let the cursor slow you down
https://www.cursorless.org/
MIT License

neovim: better support utf8 #2377

Open saidelike opened 1 month ago

saidelike commented 1 month ago

We need better selection ranges when dealing with UTF-8 content.

At the moment, when we read lines using the node-client API (buffer.getLines()), we get the UTF-8 decoded data, which is nice. So whenever a line contains non-ASCII characters, we get fewer characters than the bytes that represent them. Cursorless then works on that decoded data, so when we chuck/change/etc. we operate on decoded characters.
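To make the character/byte mismatch concrete, here is a minimal sketch that counts the UTF-8 bytes behind a decoded JavaScript string. The helper name utf8ByteLength is made up for illustration and is not part of Cursorless or node-client.

```typescript
// Hypothetical helper (not a Cursorless or node-client API):
// number of UTF-8 bytes needed to encode a JavaScript (UTF-16) string.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1; // ASCII
    } else if (unit < 0x800) {
      bytes += 2; // e.g. Latin letters with accents
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // surrogate pair: one code point outside the BMP
        i++; // skip the low surrogate
      } else {
        bytes += 3; // lone surrogate, counted like U+FFFD
      }
    } else {
      bytes += 3; // rest of the BMP, e.g. CJK
    }
  }
  return bytes;
}

console.log("héllo".length, utf8ByteLength("héllo")); // 5 6
console.log("日本語".length, utf8ByteLength("日本語")); // 3 9
```

For pure ASCII lines both numbers agree, which is why the problem only shows up once a line contains multi-byte characters.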

However, when we want to get/set the selection, typically for a take action, we use the Lua API (require("talon.cursorless").buffer_get_selection() or return require("talon.cursorless").select_range()), which has no clue about the UTF-8 decoding. So we just pass the decoded character offsets, which are the only ones we know atm, instead of the encoded UTF-8 byte offsets, and whenever multi-byte characters are involved the selection range ends up shorter than the correct one.
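One possible way to fix the set direction would be to convert the decoded column into a byte column before handing it to the Lua side. The sketch below assumes the Lua functions expect 0-based byte columns. The name charColumnToByteColumn is hypothetical, not an existing Cursorless function.

```typescript
// Hypothetical helper: convert a column counted in UTF-16 code units
// (what Cursorless sees) into a UTF-8 byte column for the same line.
function charColumnToByteColumn(line: string, charColumn: number): number {
  // Clamp so out-of-range columns map to the start or end of the line.
  const end = Math.min(Math.max(charColumn, 0), line.length);
  let bytes = 0;
  for (let i = 0; i < end; i++) {
    const unit = line.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < end) {
      const next = line.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // full surrogate pair before the column
        i++;
      } else {
        bytes += 3;
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Selecting "wörld" in "héllo wörld": characters [6, 11) are bytes [7, 13).
const exampleLine = "héllo wörld";
console.log(charColumnToByteColumn(exampleLine, 6)); // 7
console.log(charColumnToByteColumn(exampleLine, 11)); // 13
```

Passing the character columns 6 and 11 unchanged would select one byte too early at the start and two bytes too early at the end, which matches the too-short selection described above.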

We could possibly special-case this using https://github.com/uga-rosa/utf8.nvim/blob/main/doc/utf8.txt but it seems a bit annoying. We would have to use an external lib because the Neovim APIs don't expose UTF-8 support atm: https://github.com/neovim/neovim/issues/14281

It would be based on detecting that UTF-8 is in use via set fileencoding: https://superuser.com/questions/28779/how-do-i-find-the-encoding-of-the-current-buffer-in-vim

Pokey mentioned: I think we could use https://neovim.io/doc/user/lua.html#vim.str_byteindex(), no? I believe the stuff still unresolved in https://github.com/neovim/neovim/issues/14281 is just grapheme cluster stuff, which we don't need
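For the get direction, the reverse conversion would be needed. If it were done on the TypeScript side rather than in Lua, it could look roughly like this sketch. It assumes the Lua side reports 0-based byte columns, and byteColumnToCharColumn is a hypothetical name, not an existing function.

```typescript
// Hypothetical helper: convert a UTF-8 byte column (as assumed to be
// reported by the Lua side) into a column in UTF-16 code units.
// A byte column that falls inside a multi-byte character is rounded up
// to the end of that character.
function byteColumnToCharColumn(line: string, byteColumn: number): number {
  let bytes = 0;
  let i = 0;
  while (i < line.length && bytes < byteColumn) {
    const unit = line.charCodeAt(i);
    let width = 3; // default: 3-byte BMP character (or lone surrogate)
    let units = 1; // UTF-16 code units consumed
    if (unit < 0x80) {
      width = 1;
    } else if (unit < 0x800) {
      width = 2;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < line.length) {
      const next = line.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        width = 4; // surrogate pair encodes as 4 bytes
        units = 2;
      }
    }
    bytes += width;
    i += units;
  }
  return i;
}

console.log(byteColumnToCharColumn("héllo wörld", 7)); // 6
console.log(byteColumnToCharColumn("héllo wörld", 13)); // 11
```

Like the suggestion above, this works on code points only and ignores grapheme clusters.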

Note to myself: there is no UTF-8 support in Cursorless (its strings are UTF-16 under the hood), so we need to decode the data before giving it to Cursorless.