Open vitvakatu opened 3 months ago
I updated the description (better to have tickets self-contained).
The old key emoji 🗝️ (\uD83D\uDDDD) posted here has a variation selector \uFE0F telling how to render the value. Java treats the string \uD83D\uDDDD\uFE0F as two code points: \uD83D\uDDDD and \uFE0F. If you try to make a text edit and add foo after the key (at Position(0,1)), the result will be \uD83D\uDDDDfoo\uFE0F.
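To make the behaviour above concrete, here is a minimal sketch using only the standard java.lang.String API. The class name KeyEmojiDemo and the helper insertAfterCodePoints are hypothetical and not taken from the engine code; the sketch assumes Position(0,1) is read as "after one code point".

```java
public class KeyEmojiDemo {
    // Old key emoji U+1F5DD (a surrogate pair in UTF-16) followed by variation selector U+FE0F.
    static final String KEY = "\uD83D\uDDDD\uFE0F";

    // Hypothetical helper: inserts `text` after `count` code points of `source`.
    static String insertAfterCodePoints(String source, int count, String text) {
        int codeUnitOffset = source.offsetByCodePoints(0, count);
        return new StringBuilder(source).insert(codeUnitOffset, text).toString();
    }

    public static void main(String[] args) {
        System.out.println(KEY.length());                        // 3 (UTF-16 code units)
        System.out.println(KEY.codePointCount(0, KEY.length())); // 2 (code points)

        // Inserting after the first code point separates the emoji from its selector.
        String edited = insertAfterCodePoints(KEY, 1, "foo");
        System.out.println(edited.equals("\uD83D\uDDDDfoo\uFE0F")); // true
    }
}
```

The variation selector ends up after foo, so it no longer applies to the key emoji.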
The fix would be to strip variation selectors from the user input. Although the result won't be the exact byte sequence, it will be visually the same.
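A rough sketch of what such stripping could look like, assuming only the basic variation selector block U+FE00..U+FE0F needs to be removed (the supplementary selectors U+E0100..U+E01EF are not handled here). The class and method names are hypothetical and this is not the engine's implementation.

```java
public class VariationSelectors {
    // Returns a copy of `input` without code points in the range U+FE00..U+FE0F.
    static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
            .filter(cp -> cp < 0xFE00 || cp > 0xFE0F)
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String stripped = strip("\uD83D\uDDDD\uFE0Ffoo");
        System.out.println(stripped.equals("\uD83D\uDDDDfoo")); // true
    }
}
```

Iterating by code point rather than by char keeps surrogate pairs such as the key emoji intact.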
I'll see if I can use the ICU UTF16 class to detect the emoji position properly.
(Note to self) Check how JS treats the \uD83D\uDDDD\uFE0F string.
Dmitry Bushev reports a new STANDUP for yesterday (2024-08-01):
Progress: [10678] Started working on the issue. Managed to find the case when an emoji with a modifier is treated as two symbols in a Java string. Created the test case reproducing the issue. It should be finished by 2024-08-07.
Next Day: Next day I will be working on the #10678 task. Continue working on the task
Dmitry Bushev reports a new STANDUP for today (2024-08-02):
Progress: [10678] Playing with the ICU library, trying to see if it can detect the position of an emoji with a modifier correctly. [10735] Updated the SBT build to accommodate the changed npm configuration. Fixed the ydoc-server-polyglot esbuild. It should be finished by 2024-08-07.
Next Day: Next day I will be working on the #10678 task. Continue working on the task
Dmitry Bushev reports a new STANDUP for yesterday (2024-08-05):
Progress: [10678] Playing with ICU iteration capabilities to detect emojis correctly. Implemented a draft version of an iterator capable of iterating emojis. Started testing. It should be finished by 2024-08-07.
Next Day: Next day I will be working on the #10678 task. Continue working on the task
Dmitry Bushev reports a new STANDUP for today (2024-08-06):
Progress: [10678] Looking into the string implementation in JS. Negotiated with the GUI team to work with code units and not with code points. Started updating the text editing logic to support code units. It should be finished by 2024-08-07.
Next Day: Next day I will be working on the #10678 task. Continue working on the task
Dmitry Bushev reports a new STANDUP for yesterday (2024-08-07):
Progress: [10678] Implemented text editor support with ranges measured in the Unicode code units. Updated tests. Created the PR. It should be finished by 2024-08-07.
Next Day: Next day I will be working on the #10678 task. Continue working on the task
The issue was initially found when observing a documentation panel bug, but it also happens in the regular code editor. It seems to be caused by incorrect handling of Unicode inside the engine. Example of usage in our tests:
My idea is to add an emoji https://unicodeplus.com/U+1F5DD to the code. I want to insert some text directly after this emoji. It has a size of 2 UTF-16 code units (2x2 bytes). I changed the offset as we're interpreting this in the GUI: as the emoji takes 2 code units, and I replaced a single space character, I shifted the edit by one code unit (13 instead of 12). However, it seems the engine code counts this emoji as a single unit, so the text gets inserted one character later, at an incorrect index. So it seems to me the engine code works on Unicode code points, not UTF-16 code units.
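The mismatch described above can be reproduced in isolation. This is a hedged sketch, not the engine code: the input string, the class EditOffsets, and both helper methods are hypothetical, and the sketch simply assumes the same numeric offset is read as code units by the GUI and as code points by the engine.

```java
public class EditOffsets {
    // Inserts text at an offset measured in UTF-16 code units (the GUI's convention).
    static String insertAtCodeUnit(String source, int codeUnitOffset, String text) {
        return new StringBuilder(source).insert(codeUnitOffset, text).toString();
    }

    // Inserts text at an offset measured in code points (what the engine appears to do).
    static String insertAtCodePoint(String source, int codePointOffset, String text) {
        int codeUnitOffset = source.offsetByCodePoints(0, codePointOffset);
        return insertAtCodeUnit(source, codeUnitOffset, text);
    }

    public static void main(String[] args) {
        String source = "\uD83D\uDDDDabc"; // hypothetical input: key emoji followed by "abc"
        int offset = 2;                    // directly after the emoji, counted in code units

        String intended = insertAtCodeUnit(source, offset, "X");
        String actual = insertAtCodePoint(source, offset, "X");

        System.out.println(intended.equals("\uD83D\uDDDDXabc")); // true
        System.out.println(actual.equals("\uD83D\uDDDDaXbc"));   // true: one character too late
    }
}
```

Each emoji outside the Basic Multilingual Plane that precedes the edit shifts the code-point interpretation one more position to the right.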
Internal discussion available at https://discord.com/channels/401396655599124480/1266028137175584768/1266028139037982720