atom-archive / xray

An experimental next-generation Electron-based text editor
MIT License

Inserting a two byte UTF-8 character moves the cursor two characters #113

Closed MoritzKn closed 6 years ago

MoritzKn commented 6 years ago

Inserting a two-byte UTF-8 character like ä moves the cursor two characters instead of one.

Given a buffer like this with the pipe being the cursor:

|1234

Inserting ä will result in the cursor being placed like this:

ä1|234

I've tried to fix it myself but couldn't find the right place in the code. With a few hints I may get further. How is the cursor represented in code? Where are insertions handled?

nathansobo commented 6 years ago

Interesting. A decent place to start might be drawCursors in text_plane.js. See if we're relaying the correct column but measuring it wrong or if the column is off. We currently represent text as UTF-16 internally (something we should eventually fix) so it's a bit surprising that a 2-byte character is causing issues.

MoritzKn commented 6 years ago

> See if we're relaying the correct column but measuring it wrong or if the column is off.

The next character is inserted at the position where the cursor is displayed, so I assume the problem is in the internal logic rather than the rendering.

> We currently represent text as UTF-16 internally

Maybe the problem is somewhere else then. Perhaps when converting the input event into a change to the buffer.

And that's where I think I located the issue. The edit() function in buffer_view.rs still takes a Rust string slice (&str), which is AFAIK UTF-8. In the function, text.len() is used to get the length of the string. However, len() on a string returns the length in bytes. I'm going to try and fix this.
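
To illustrate the mismatch, here is a minimal sketch in Rust. It is not xray's actual code: `utf16_len` and `insert_at_cursor` are hypothetical helpers, and the sketch assumes the buffer and the cursor column are both measured in UTF-16 code units, as mentioned above. For ä, `str::len()` reports 2 (UTF-8 bytes), while the UTF-16 length is 1, which would explain a cursor that ends up one position too far.

```rust
/// Hypothetical helper: length of `text` in UTF-16 code units.
/// Unlike `text.len()`, which counts UTF-8 bytes, this matches a buffer
/// that stores text as UTF-16.
fn utf16_len(text: &str) -> usize {
    text.encode_utf16().count()
}

/// Hypothetical stand-in for a buffer edit; NOT xray's real `edit()`.
/// Inserts `text` into a UTF-16 buffer at `cursor` (an index in code units)
/// and returns the new cursor position.
/// Panics if `cursor` is greater than `buffer.len()`.
fn insert_at_cursor(buffer: &mut Vec<u16>, cursor: usize, text: &str) -> usize {
    let mut advance = 0;
    for unit in text.encode_utf16() {
        buffer.insert(cursor + advance, unit);
        advance += 1;
    }
    // Advance by code units inserted. Using `text.len()` here instead would
    // move the cursor by the UTF-8 byte count, which is 2 for "ä".
    cursor + advance
}
```

With a buffer holding `1234` and the cursor at 0, `insert_at_cursor(&mut buf, 0, "\u{e4}")` returns 1, so the cursor lands right after ä. Note that a character outside the Basic Multilingual Plane, such as U+1F600, still occupies two UTF-16 code units, so counting code units is not the same as counting user-visible characters.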