Problems with selection and cursor position in complex non-latin scripts

clauseggers commented 10 years ago

Atom has trouble with Devanagari. When a line contains Devanagari text, the displayed cursor position does not always match the actual position in the line. Selecting within lines that contain Devanagari is affected in the same way.

I can also see that Atom renders Devanagari as fully formed Devanagari with half-forms, conjuncts (ligatures), and so on. For an editor I would much prefer the Sublime Text approach, where Devanagari text is treated as a string of characters, with no attempt to render half-forms or conjuncts or to apply the other advanced OpenType layout features. Instead, the text is rendered the way it is typed, which is much easier to edit.

Here is some sample text that borks Atom:

<span class="Glyphs">NO LANG ‘रृ’ “रॄ” ल- लृ–!!!???,.</span></span></br>

(Try here to select the ??? and delete them…)

<span class="ModelNorthern">रृ रॄ ल लृ लॄ ऌ ॡ </span></b>

(Try here to select the Devanagari characters and delete them…)

Also, try pasting those strings into Sublime Text. ST renders the Devanagari as full forms + halants (viramas) + matras in sequential order, which is what I would prefer.
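To see what "sequential order" means at the character level, it helps to list the code points behind a shaped string. The following short Python sketch uses only the standard `unicodedata` module. The conjunct क्ष is an extra example that does not appear in the samples above, and nothing here reflects how Sublime Text or Atom work internally.

```python
import unicodedata

def code_points(text: str) -> list[str]:
    """List each code point of text in logical (typed) order, with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

# रृ from the samples above, then the conjunct क्ष (KA + VIRAMA + SSA),
# which fonts that support it usually draw as a single glyph.
for sample in ("\u0930\u0943", "\u0915\u094d\u0937"):
    print(code_points(sample))
```

Shaping hides the virama in क्ष. A sequential, unshaped display keeps all three code points visible.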

Because the monospaced fonts we use often lack Devanagari, the Devanagari is rendered in a proportional fallback font. If you did as Sublime Text does, you could force the advance width of every Devanagari glyph to match the default monospaced font. That way you could preserve monospaced layout and sidestep the whole (very thorny) issue of mixing complex non-Latin scripts into a monospaced environment.
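This sketch shows why the proportional fallback font matters. Suppose the editor converts a mouse x-coordinate into a column by assuming every character is one monospaced cell wide, while the Devanagari is actually drawn wider. The computed column then drifts. The Python sketch below is an assumption about the mechanism, not Atom's actual code. `column_at_x`, the 8 px cell and the 13 px Devanagari advance are hypothetical values chosen for illustration.

```python
from itertools import accumulate

def column_at_x(x: float, advances: list[float]) -> int:
    """Return the caret column whose left edge is nearest to pixel offset x."""
    edges = [0.0, *accumulate(advances)]  # caret x-position before each column
    return min(range(len(edges)), key=lambda col: abs(edges[col] - x))

CELL = 8.0                 # hypothetical monospaced cell width, in pixels
DEVANAGARI_ADVANCE = 13.0  # hypothetical advance width in the proportional fallback font

line = "\u0915 \u0916 <br>"  # "क ख <br>"
# Widths as actually drawn: Devanagari letters come from the wider fallback font.
actual = [DEVANAGARI_ADVANCE if "\u0900" <= ch <= "\u097f" else CELL for ch in line]
# Widths the hit-testing code assumes: one cell per character.
assumed = [CELL] * len(line)

click_x = sum(actual[:4])  # the user clicks just before "<"
print(column_at_x(click_x, actual))   # 4 (correct)
print(column_at_x(click_x, assumed))  # 5 (one column too far)
```

If every Devanagari advance were forced to `CELL`, as proposed above, `actual` would equal `assumed` and both mappings would agree. The error grows with each additional Devanagari character in the line.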

Just an idea; it's up to you to find out whether that would actually be the way to go.

I note that you have partially fixed the issue in 0.140, but it’s still not perfect. Take this example line:

<span style="word-wrap: break-word">क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न प फ ब भ म य र ऱ ल ळ व श ष स ह क़ ख़ ग़ ज़ ड़ ढ़ फ़ य़ ॹ ॻ ॼ ॾ ॿ</span><br><br>

If you select it with the mouse from the end of the line towards the beginning, the last br> is not selected.

If you select the Devanagari text and press delete, then you end up with this:

<span style="word-wrap: break-word">pan><br><br>

(notice the missing </s in the closing span tag).

Atom Version: 0.140.0
OS Version: Mac OS X 10.10
Misc Versions
- apm 0.108.0
- npm 1.4.4
- node 0.10.32
- python 2.7.6
- git 1.9.3

kevinsawicki commented 10 years ago

@clauseggers I see that Sublime renders them differently, but it looks like Chromium renders them as one character instead of two. Do you use Chrome, and do you know of any setting that controls this?

kevinsawicki commented 10 years ago

And just to confirm, you'd prefer it be displayed like this:

[Screenshot: 2014-10-31, 1:08:41 PM]

versus this:

[Screenshot: 2014-10-31, 1:09:44 PM]

clauseggers commented 10 years ago

Hi Kevin. Sublime Text renders every character individually, instead of running the text through the OpenType ‘dev2’ (or older ‘deva’) shaping engine. To get the ST behaviour you would have to disable this shaping engine. I have absolutely no idea how one would do that in Chrome. I don't think inserting e.g. a Zero Width Joiner after each character would be a good idea, because the result would depend too much on the particular font used. You would have to go deeper.

The normal OpenType shaping of Devanagari is pretty complex. Here's an overview (https://www.microsoft.com/typography/OpenTypeDev/devanagari/intro.htm), and you could also read chapter 12 of the Unicode 7 standard.

Regarding your question on how Devanagari should be rendered in a monospaced environment, please refer to what I wrote in the bug report. I'm proposing that Devanagari might have to be rendered as sequential characters with no shaping, to fit the monospaced fonts used in these kinds of editors. However, that approach renders the text un-readable (the same goes for Arabic and other complex scripts)! I would suggest a bit of research into what native speakers prefer, vis-à-vis established conventions.

Good luck!

kevinsawicki commented 10 years ago

However, that approach renders the text un-readable

So when you are editing Devanagari text in Sublime, it is unreadable?

clauseggers commented 10 years ago

Essentially yes. A reader can spell their way through the text, but it requires some mental gymnastics. I can live with that because I can see the individual characters sequentially, which happens to suit me fine. My thesis is that it might suit other users too, since the documents and files one works on in a code editor have to be exactly right down to the character level. Remember that if you render Devanagari correctly, you will not necessarily be able to see every character in a text run.

kevinsawicki commented 10 years ago

This looks similar to #1849 in terms of the cursor position issues.

clauseggers commented 10 years ago

Yes, exactly the same, though Arabic and Hebrew add the complexity of right-to-left text mixed with left-to-right text.

jtauber commented 9 years ago

As I mention in #1849, this problem is easy to reproduce with polytonic Greek too. Interestingly, multi-byte characters later in the line can change the placement of characters (and mess up cursor placement) earlier in the line.
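Polytonic Greek shows why counting "characters" is ambiguous. The same visible letter can be stored as one precomposed code point or as a base letter plus combining marks, and the two forms have different code-point and byte counts. This may be one contributing factor, not a confirmed diagnosis: any position arithmetic that mixes these units will drift. The Python sketch below uses the standard `unicodedata` module. The letter ἄ is an illustrative choice that does not come from #1849.

```python
import unicodedata

precomposed = "\u1f04"  # ἄ as one code point: GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA
# NFD gives alpha + U+0313 COMBINING COMMA ABOVE (psili) + U+0301 COMBINING ACUTE ACCENT
decomposed = unicodedata.normalize("NFD", precomposed)

print(len(precomposed), len(decomposed))                                  # 1 3  (code points)
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 3 6  (UTF-8 bytes)
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+03B1', 'U+0313', 'U+0301']
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True: same letter, two encodings
```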

dwhieb commented 9 years ago

Sounds like this is the same problem I'm having. I'm a linguist doing computational work, so I have to type a lot of special characters. I have special keyboard software installed (Keyman: http://keyman.com/desktop/), along with a keyboard for typing characters in the International Phonetic Alphabet (SIL IPA Unicode: http://www.tavultesoft.com/ipa/?search=ipa), which maps certain characters or diacritics to certain key combinations. However, the same problem happens with any keyboard I try. The issue seems to lie in what happens when characters are combined in Atom.

When I use this in Atom, I get the same cursor-location problem described above, and characters also simply disappear. Normally the keyboard works like this:

Typing e@3 outputs é in two steps (typing e then @ produces e̊, and then following up with 3 produces é).

What happens in Atom is this:

Typing e@3 outputs ́, a lone accent, and deletes the character that came before it. Typing e then @ produces e̊ as normal, but typing 3 then deletes the e and leaves me with just the accent.

Typing ee@3 produces é, the correct output, but requires me to type an extra letter for every diacritic I want to type.
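One hypothesis, which nobody in this thread confirms, fits these symptoms. The keyboard software sends a single backspace and expects it to delete only the ring (U+030A), but the editor's backspace deletes the whole base-plus-mark cluster. The Python sketch below models both backspace policies. `clusters`, `backspace_code_point`, `backspace_cluster` and `keyman_rewrite` are hypothetical names. The "one backspace, then insert the acute" sequence is an assumption about how Keyman performs the rewrite.

```python
import unicodedata
from collections.abc import Callable

RING, ACUTE = "\u030a", "\u0301"  # COMBINING RING ABOVE, COMBINING ACUTE ACCENT

def clusters(text: str) -> list[str]:
    """Simplified clusters: a base character followed by any combining marks (category M*)."""
    out: list[str] = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out

def backspace_code_point(text: str) -> str:
    """Delete exactly one code point, which is presumably what the keyboard software expects."""
    return text[:-1]

def backspace_cluster(text: str) -> str:
    """Delete the whole last cluster (base letter plus its marks)."""
    return "".join(clusters(text)[:-1])

def keyman_rewrite(text: str, backspace: Callable[[str], str]) -> str:
    """Hypothetical model of pressing 3: one backspace to drop the ring, then insert the acute."""
    return backspace(text) + ACUTE

for before in ("e" + RING, "ee" + RING):
    print(ascii(keyman_rewrite(before, backspace_code_point)),
          ascii(keyman_rewrite(before, backspace_cluster)))
```

Under the cluster policy, "e" + ring becomes a lone U+0301, and "ee" + ring becomes e + U+0301 (é). These results mirror the two observations above. Under the code-point policy, "e" + ring correctly becomes é.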

Hope that's helpful. LOVE v1.0!

winstliu commented 9 years ago

@as-cii Can this be closed?

as-cii commented 9 years ago

Yup, thanks!

jcuenod commented 9 years ago

Hi, I'm wondering whether this is still the same issue:

Try this Hebrew (RTL):

כָּתֵף

Starting from the extreme left, I would expect pressing the right arrow to either move the cursor to the next visual position (that is, between the ת and the ף) or jump to the next logical position (between the כ and the ת). Instead, Atom doesn't visibly move the cursor at all, although functionally it has moved to between the כ and the dagesh (middle dot). Similarly, if I hit the right arrow again, the cursor stays at position 0, but logically it is now between the qamets (the vowel point beneath the כ) and the dagesh.

This gets interesting when you hit the right arrow a third time. The cursor now moves to between the ת and the ף, but logically it is actually between the כ and the ת. If you hit backspace, it removes the qamets.

Is this the same issue, or should I open a new one? Atom v1.1.0, Fedora 22
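For comparison, this is the behaviour the report expects, expressed in logical order. The caret should stop only at cluster boundaries, so it can never rest between the כ and its dagesh or qamets. This minimal Python sketch uses a stated simplification: any code point whose general category is a mark (Mn, Mc, Me) attaches to the preceding character. Real editors should follow the extended grapheme cluster rules of Unicode UAX #29, which also cover ZWJ sequences, Hangul and emoji. `cursor_stops` and `move_forward` are hypothetical helpers. The order of the two marks on the כ is an assumption, and swapping them does not change the result.

```python
import unicodedata

def cursor_stops(text: str) -> list[int]:
    """Code-point offsets where a caret may rest: before each base character, plus the end.

    Simplification: a code point in a mark category (Mn, Mc, Me) never starts a new stop.
    """
    stops = [i for i, ch in enumerate(text)
             if i == 0 or not unicodedata.category(ch).startswith("M")]
    return stops + [len(text)]

def move_forward(text: str, offset: int) -> int:
    """Advance one logical step, skipping over combining marks."""
    return next((s for s in cursor_stops(text) if s > offset), len(text))

# כָּתֵף: kaf + dagesh + qamats, tav + tsere, final pe (6 code points, 3 clusters)
word = "\u05db\u05bc\u05b8\u05ea\u05b5\u05e3"
print(cursor_stops(word))  # [0, 3, 5, 6]
```

Here `move_forward(word, 0)` returns 3, skipping both marks, which is the "next logical position" between the כ and the ת that the report expected.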

as-cii commented 9 years ago

@jcuenod: thanks for reporting this problem :sparkles:, I am experiencing the exact same behavior. Could you confirm whether this issue also existed in earlier releases of Atom? I think we should open another issue for this, because after a quick look it seems we're failing to recognize some paired characters (i.e. it doesn't seem to be rendering-related).

jcuenod commented 9 years ago

In earlier versions of Atom, combining characters didn't display the way they do now, so the problem would have manifested quite differently. To be honest, I gave up using complex text in Atom and only retried it because I saw that a number of these issues had been closed (and it is a lot better).

as-cii commented 9 years ago

and it is a lot better

Thanks, @jcuenod! :heart: Could you please create an issue anyway, so that we can keep track of this new problem? I feel like we'll end up fully supporting complex text sooner or later, and a good bug report will help speed up the resolution a lot. If you can't, thank you anyway for reporting this problem! :pray:

lock[bot] commented 5 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. If you can still reproduce this issue in Safe Mode then please open a new issue and fill out the entire issue template to ensure that we have enough information to address your issue. Thanks!

atom/atom #4007