ajaxorg / ace

Ace (Ajax.org Cloud9 Editor)
https://ace.c9.io

Support for ALL Emoji #4142

Open JonLev opened 4 years ago

JonLev commented 4 years ago

We face the same issue as https://github.com/ajaxorg/ace/issues/3404. When editing emoji in Ace, the cursor appears in the wrong location (too far right), which makes editing very hard (the issue happens per line).

Here is an example with the latest version: https://jsfiddle.net/9va6be3d/2/

If you move the cursor past the emoji, the gutter does not work correctly.

farshiana commented 4 years ago

Hi, I'm very fond of Ace editor but this is really blocking for us :/ Is there some kind of workaround?

nightwing commented 4 years ago

Hi, currently Ace supports only fixed-width characters. I'll try to fix this in the next release.

farshiana commented 4 years ago

Thanks. I am looking forward to it!

JeSuisUnCaillou commented 4 years ago

I work in a company where we use emojis a lot, in an online collaborative YAML editor I built on top of Ace. The cursor being shown in the wrong location after some emojis is causing us a lot of daily headaches :dizzy_face:

I love ace, it's the perfect solution for us, and this bug is the only thing holding us back. Looking forward to this issue being solved :fist:

JeSuisUnCaillou commented 4 years ago

@nightwing any news about this issue? 🙏

Could you point me to the part of the code where I could work on a fix myself?

JeSuisUnCaillou commented 4 years ago

It looks like the problem is related to this issue:

UTF-16 surrogate pairs largely unsupported https://github.com/ajaxorg/ace/issues/1153

JeSuisUnCaillou commented 4 years ago

OK, I have narrowed it down to these two cases:

Now, pull request #2244 was merged this January to handle emojis.

In the code added in this PR, lib/ace/selection.js has a condition that offsets the cursor when it encounters a surrogate pair. That condition is not triggered by emojis made of a single UTF-16 char, because they contain no surrogate.

Since emojis are wider than a regular character even when they are made of only one UTF-16 char, the cursor does not appear in the right spot.
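
To illustrate the distinction described above, here is a minimal sketch in plain JavaScript. It is not code from Ace or from PR #2244; the `describe` helper is hypothetical and only shows why a check based on surrogates fires for some emojis and not for others.

```javascript
// Hypothetical helper, not part of Ace: reports how many UTF-16 code units
// a string occupies and whether it starts with a high surrogate.
function describe(str) {
    const first = str.charCodeAt(0);
    return {
        codeUnits: str.length,
        startsWithHighSurrogate: first >= 0xD800 && first <= 0xDBFF
    };
}

const watch = "\u231A";        // U+231A WATCH, fits in one UTF-16 code unit
const grin = "\uD83D\uDE00";   // U+1F600 GRINNING FACE, stored as a surrogate pair

console.log(describe(watch));  // { codeUnits: 1, startsWithHighSurrogate: false }
console.log(describe(grin));   // { codeUnits: 2, startsWithHighSurrogate: true }
```

Both characters are typically rendered wider than one monospace cell, but only the second one would be caught by a surrogate-based condition.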

EDIT:

JeSuisUnCaillou commented 4 years ago

I have worked out a (dirty) workaround for myself, if anyone is interested: I reduce the size of single-char emojis to match the width of one regular character https://github.com/JeSuisUnCaillou/ace/pull/2/files

And I've built it here: https://github.com/JeSuisUnCaillou/ace-builds/tree/fix/reduce_monochar_emoji_size

JonLev commented 4 years ago

@nightwing would @JeSuisUnCaillou's fix be usable?

JeSuisUnCaillou commented 4 years ago

@JonLev The real solution should be to consider all emojis as a single character of width 2, whether they are one char or two chars with a surrogate. I also encountered some emojis composed of 2 emojis (1 or 2 char each) with a zero-width joiner in between, like this one, which should also be considered as one character.

My fix just tries to avoid the cursor offset (which makes the editor very hard to use), but I don't think it's a reasonable solution to the emoji problem.

I didn't dive deep enough to understand all the code needed to implement the complete solution.

JeSuisUnCaillou commented 4 years ago

I'm discovering more corner cases regularly.

Today, I learned that an emoji can be followed by a character called VARIATION SELECTOR-16, which is only there to say that the previous character must be displayed as an emoji (for emojis that also have a "normal" display, like this one: 🕵)

One day, I will make an exhaustive list of all the weird cases of emojis. And maybe with time and iterations, I'll try to implement it properly, who knows?
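
For anyone who wants to inspect these corner cases, here is a small sketch that lists the code points inside an emoji sequence. The `codePoints` helper and the two sample strings are assumptions chosen for illustration; they are not taken from Ace.

```javascript
// Hypothetical helper: lists the Unicode code points of a string as "U+XXXX".
// Array.from iterates by code point, so surrogate pairs stay together.
function codePoints(str) {
    return Array.from(str, function (ch) {
        return "U+" + ch.codePointAt(0).toString(16).toUpperCase();
    });
}

// Three emojis joined by ZERO WIDTH JOINER (U+200D), shown as one family glyph.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
// An emoji followed by VARIATION SELECTOR-16 (U+FE0F).
const detective = "\u{1F575}\uFE0F";

console.log(codePoints(family));    // [ 'U+1F468', 'U+200D', 'U+1F469', 'U+200D', 'U+1F467' ]
console.log(family.length);         // 8 UTF-16 code units
console.log(codePoints(detective)); // [ 'U+1F575', 'U+FE0F' ]
console.log(detective.length);      // 3 UTF-16 code units
```

Each of these sequences is displayed as a single glyph, yet it spans several code points and even more UTF-16 code units, so counting surrogate pairs alone cannot describe them.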

kkucharc commented 4 years ago

Hi! Any news about this issue?

tgross35 commented 2 years ago

Adding in - this isn't just emojis, but also special characters such as .

The issue is likely relevant to all characters that take more than 1 byte in UTF-8 (a character can take 1-4 bytes).

Context for anyone looking (@JeSuisUnCaillou): you want to look into splitting the strings by Unicode grapheme clusters rather than by individual code units (8-bit in UTF-8, 16-bit in JavaScript strings). This gives characters as we know and see them, rather than a "char" as a computer sees it.

The rules for this follow Unicode text segmentation, found here: https://unicode.org/reports/tr29/. There should be libraries to do this in JS.

I'm not a frontend dev, but for example, see the Rust library for it: https://docs.rs/unicode-segmentation/latest/unicode_segmentation/. Splitting by graphemes gives you what you expect: ["a̐", "é", "ö̲", "\r\n"], but running the same thing with a normal char split gives ['a', '\u{310}', 'e', '\u{301}', 'o', '\u{308}', '\u{332}', '\r', '\n']
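
On the JS side, one option is the built-in `Intl.Segmenter` API, which is available in recent browsers and Node.js versions (an assumption about the runtime; older environments would need a library instead). The sketch below only demonstrates grapheme splitting on the sample string from the Rust documentation; the `graphemes` helper is hypothetical and is not how Ace handles text.

```javascript
// Hypothetical helper: splits a string into grapheme clusters per UAX #29,
// assuming the runtime provides Intl.Segmenter.
function graphemes(str) {
    const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
    return Array.from(segmenter.segment(str), function (part) {
        return part.segment;
    });
}

// Same sample as above: a + U+0310, e + U+0301, o + U+0308 + U+0332, CR LF.
const sample = "a\u0310e\u0301o\u0308\u0332\r\n";

console.log(graphemes(sample).length);  // 4 grapheme clusters
console.log(Array.from(sample).length); // 9 code points
console.log(sample.length);             // 9 UTF-16 code units
```

Grapheme splitting answers "how many characters does the user see", but it does not say how wide each cluster is rendered, so an editor would still need to measure display width separately.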

andrewnester commented 2 years ago

We have a tracking issue for this problem here: https://github.com/ajaxorg/ace/issues/460

RomanShemelin commented 1 year ago

Hi! Any news about this issue?

petersolopov commented 1 year ago

We're facing a similar problem and are looking forward to a resolution. Thanks a lot!