We need better selection range handling when dealing with UTF-8 content
Atm when we read lines using the node-client API (`buffer.getLines()`), we get the UTF-8 decoded data, which is nice. So for non-ASCII content we get fewer characters than the bytes that represent them. Cursorless then works on that decoded data, so when we chuck/change/etc. we operate on decoded characters.
However, when we want to get/set the selection, typically for a take action, we use the Lua API (`require("talon.cursorless").buffer_get_selection()` or `return require("talon.cursorless").select_range()`), which has no clue about the UTF-8 encoding. So we just pass the decoded character counts, which are the only ones we know atm, instead of the encoded UTF-8 byte counts, and the selection range ends up shorter than the correct one.
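To make the mismatch concrete, here is a minimal sketch of converting a column counted in decoded characters (UTF-16 code units, which is what a JavaScript string index means) into a column counted in UTF-8 bytes. The function name `utf16ColumnToByteColumn` is hypothetical and not part of Cursorless or node-client; it also assumes the Lua side expects byte columns, as described above.

```typescript
// Hypothetical helper for illustration only; not an existing Cursorless API.
// Converts a column expressed in UTF-16 code units into a column expressed
// in UTF-8 bytes for the same line of text.
function utf16ColumnToByteColumn(line: string, utf16Column: number): number {
  // Clamp the requested column to the bounds of the line.
  const end = Math.max(0, Math.min(utf16Column, line.length));
  let bytes = 0;
  let i = 0;
  while (i < end) {
    const codePoint = line.codePointAt(i)!;
    // Number of bytes this code point occupies once encoded as UTF-8.
    if (codePoint < 0x80) {
      bytes += 1;
    } else if (codePoint < 0x800) {
      bytes += 2;
    } else if (codePoint < 0x10000) {
      bytes += 3;
    } else {
      bytes += 4;
    }
    // Code points above U+FFFF take two UTF-16 code units (a surrogate pair).
    i += codePoint > 0xffff ? 2 : 1;
  }
  return bytes;
}

// "h\u00e9llo" is "héllo": the "é" is 1 character but 2 bytes in UTF-8.
console.log(utf16ColumnToByteColumn("h\u00e9llo", 2)); // 3
```

If the requested column falls in the middle of a surrogate pair, this sketch rounds up to the end of that character rather than splitting it.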
We could possibly special-case this using https://github.com/uga-rosa/utf8.nvim/blob/main/doc/utf8.txt but it seems a bit annoying. We would have to use an external lib because Neovim doesn't expose UTF-8 support in its APIs atm: https://github.com/neovim/neovim/issues/14281
It would be based on detecting that UTF-8 is used, by checking `fileencoding` (e.g. `:set fileencoding?`): https://superuser.com/questions/28779/how-do-i-find-the-encoding-of-the-current-buffer-in-vim
Pokey mentioned: I think we could use https://neovim.io/doc/user/lua.html#vim.str_byteindex(), no? I believe the stuff still unresolved in https://github.com/neovim/neovim/issues/14281 is just the grapheme cluster stuff, which we don't need.
Note to myself: Cursorless has no UTF-8 support (its strings are UTF-16 under the hood), so we need to decode the data before giving it to Cursorless.
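For the direction described in this note (byte-based positions coming from Neovim, UTF-16 strings inside Cursorless), a rough sketch of the conversion could look like the following. `byteColumnToUtf16Column` is a hypothetical name, and the sketch assumes the decoded line text is already available on the TypeScript side, e.g. from `buffer.getLines()`.

```typescript
// Hypothetical helper for illustration only; not an existing Cursorless API.
// Converts a column expressed in UTF-8 bytes into a column expressed in
// UTF-16 code units, given the already-decoded text of the line.
function byteColumnToUtf16Column(line: string, byteColumn: number): number {
  let bytes = 0;
  let i = 0;
  // Walk the line one code point at a time until enough bytes are consumed.
  while (i < line.length && bytes < byteColumn) {
    const codePoint = line.codePointAt(i)!;
    // UTF-8 encoded length of this code point.
    bytes += codePoint < 0x80 ? 1 : codePoint < 0x800 ? 2 : codePoint < 0x10000 ? 3 : 4;
    // Code points above U+FFFF occupy two UTF-16 code units.
    i += codePoint > 0xffff ? 2 : 1;
  }
  return i;
}

// In "h\u00e9llo" ("héllo"), byte column 3 is just after the 2-byte "é",
// which is index 2 in the decoded string.
console.log(byteColumnToUtf16Column("h\u00e9llo", 3)); // 2
```

A byte column that lands inside a multi-byte character is rounded up to the end of that character, and a byte column past the end of the line is clamped to the line length.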