santhoshtr opened this issue 10 years ago
This is a case of CodeMirror's simplistic grapheme cluster algorithm not handling the language. Unfortunately, JavaScript does not provide the primitives needed to do sane cluster-boundary detection (finding character properties, etc).
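To make "simplistic" concrete, here is a minimal sketch of a splitter that glues every combining mark (Unicode general category M) onto the preceding code point. It is not CodeMirror's actual code, and the name splitClusters is hypothetical. On സന്തോഷ് it yields four clusters, സ | ന് | തോ | ഷ്, while the browser treats ന്തോ as a single unit for cursor movement. That mismatch is the kind of problem this issue is about.

```javascript
// Simplistic cluster splitter: every code point that is a combining mark
// (general category M: Mn, Mc, Me) is appended to the preceding cluster.
// This is an approximation, not the full UAX #29 algorithm.
function splitClusters(text) {
  const clusters = [];
  // for...of iterates whole code points, so surrogate pairs stay intact.
  for (const ch of text) {
    if (clusters.length > 0 && /\p{M}/u.test(ch)) {
      clusters[clusters.length - 1] += ch; // extend the previous cluster
    } else {
      clusters.push(ch); // start a new cluster
    }
  }
  return clusters;
}

// "സന്തോഷ്" spelled with escapes so the example survives copy/paste.
const word = "\u0D38\u0D28\u0D4D\u0D24\u0D4B\u0D37\u0D4D";
console.log(splitClusters(word).length); // 4
```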
Happens with all non-Latin complex scripts.
Not all. Some, like Arabic, should work.
This is a case of CodeMirror's simplistic grapheme cluster algorithm not handling the language. Unfortunately, JavaScript does not provide the primitives needed to do sane cluster-boundary detection (finding character properties, etc).
I would like to understand it a bit more. What exact algorithm do you need to place the cursor at a logically correct position? If we want to support a lot of languages, we should leave this kind of primitive functionality to browsers. Trying to imitate such behavior will reach no where.
Also, how does Chrome give different output than Firefox in this case?
To know how to move the cursor through a text, and which ranges of codepoints to use when measuring character positions, CodeMirror needs to know where clusters start and end.
The browser knows this, but doesn't expose this information to JavaScript. Telling me that what I'm doing "will reach no where" without actually understanding the problem isn't really the right tone to take here.
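Once cluster boundaries are known, cursor movement and measurement reduce to lookups in a sorted list of offsets. The following minimal sketch assumes the boundaries have already been computed (the hard part discussed in this issue). The function names are hypothetical, and the sample offsets for സന്തോഷ് come from a four-cluster split, സ | ന് | തോ | ഷ്, which is not necessarily what the browser uses.

```javascript
// Cluster boundaries are UTF-16 offsets where the cursor may rest. They are
// assumed to be sorted ascending and to include 0 and text.length.

// Move right: the first boundary strictly after pos (or stay at the end).
function moveRight(boundaries, pos) {
  for (const b of boundaries) {
    if (b > pos) return b;
  }
  return boundaries[boundaries.length - 1];
}

// Move left: the last boundary strictly before pos (or stay at the start).
function moveLeft(boundaries, pos) {
  for (let i = boundaries.length - 1; i >= 0; i--) {
    if (boundaries[i] < pos) return boundaries[i];
  }
  return boundaries[0];
}

// The code-unit range to measure for the cluster containing pos, or null
// if pos is at (or past) the end of the text.
function clusterRangeAt(boundaries, pos) {
  for (let i = 0; i + 1 < boundaries.length; i++) {
    if (boundaries[i] <= pos && pos < boundaries[i + 1]) {
      return [boundaries[i], boundaries[i + 1]];
    }
  }
  return null;
}

// Offsets for "സന്തോഷ്" (7 UTF-16 code units) split as സ | ന് | തോ | ഷ്.
const sampleBoundaries = [0, 1, 3, 5, 7];
console.log(moveRight(sampleBoundaries, 1)); // 3, skipping over ന്
console.log(moveLeft(sampleBoundaries, 5)); // 3
console.log(clusterRangeAt(sampleBoundaries, 2)); // [ 1, 3 ]
```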
I have faced the cursor movement and logical cluster issues while developing the Visual Editor for Wikimedia. I wanted to understand the problem in detail so that I might be able to help. I will check later; I don't have time to look into the details now. Thanks.
The attached patch fixes some known problems with handling of extending code points, and appears to help with #2125 (Hindi), but does not fix your example.
I will need some input from someone who is familiar with this language's Unicode encoding, because the behavior of this string baffles me. Characters "ന്തോ" act as a single unit, as far as cursor movement is concerned, but only the second code point in that string is an extending character. If I read the document at http://www.unicode.org/reports/tr29/ correctly, this should count as three grapheme clusters, not one. What is going on?
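One way to look at the string in question is to dump its code points with their general categories. The helper below is hypothetical and uses Unicode property escapes (\p{Mn}, \p{Mc}), which require the u regex flag. It shows that ന്തോ is ന, the virama U+0D4D (a non-spacing mark), ത, and the vowel sign U+0D4B (a spacing mark), so the character categories alone give no hint that the whole sequence ends up as one unit on screen.

```javascript
// List each code point with its hex value and whether it is a non-spacing
// mark (Mn), a spacing mark (Mc), or something else.
function describeCodePoints(text) {
  // Array.from iterates by code point, not by UTF-16 code unit.
  return Array.from(text, (ch) => {
    const hex = "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    const kind = /\p{Mn}/u.test(ch) ? "Mn" : /\p{Mc}/u.test(ch) ? "Mc" : "other";
    return `${hex} ${kind}`;
  });
}

console.log(describeCodePoints("\u0D28\u0D4D\u0D24\u0D4B").join(", "));
// U+0D28 other, U+0D4D Mn, U+0D24 other, U+0D4B Mc
```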
CC'ing @pauldhunt and @miguelsousa, who have worked on some of Adobe's open-source typography efforts, just in case they have any quick insights to share :-)
I have removed my previous comment.
This language is Malayalam. The fix for #2125 does not fix positioning for this language.
Characters "ന്തോ" act as a single unit, as far as cursor movement is concerned, but only the second code point in that string is an extending character. If I read the document at http://www.unicode.org/reports/tr29/ correctly, this should count as three grapheme clusters, not one. What is going on?
You cannot rely on TR29 for getting grapheme clusters for the purpose of counting or cursor movement; TR29 itself explains this. You have to use tailored logic for your purpose. Even that is not enough, since in Indic scripts, depending on the font, multiple consonants joined by a character like VIRAMA can form a single ligature, and sometimes characters stack. Chrome and Firefox do not agree on the implementation of cursor movement in Indic scripts. Chrome moves the cursor along logical boundaries. Firefox follows the same rule, but it allows placing the cursor elsewhere if you do it programmatically. You have to ask the browser whether you can place a cursor at a given position. Iterating that question over a range of text will give you reliable cursor positions. This can also be used to build a stack of edits for undo/redo, etc.
By 'ask the browser' you mean create a textarea and try to set the cursor in the textarea there? Or is there a more efficient/convenient way to do it on (non-editable) DOM nodes?
Is there an easy/cheap way to determine whether a string might have stacking?
By 'ask the browser' you mean create a textarea and try to set the cursor in the textarea there? Or is there a more efficient/convenient way to do it on (non-editable) DOM nodes?
Yes, create an editable node and keep trying to place the cursor. Of course, it is inefficient and hacky.
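A minimal sketch of that probing loop, assuming an off-screen textarea and the standard setSelectionRange()/selectionStart API; probeCursorPositions is a hypothetical name. It relies on the browser snapping an invalid caret offset to a valid one, which, as noted later in this thread, Firefox does not do for programmatic selections, so results may differ per browser.

```javascript
// Ask an editable element which offsets it accepts as caret positions.
// `el` is anything with a `value` string plus setSelectionRange() and
// selectionStart, e.g. an off-screen <textarea>.
function probeCursorPositions(el, text) {
  el.value = text;
  const accepted = [];
  for (let i = 0; i <= text.length; i++) {
    el.setSelectionRange(i, i);
    // If the browser moved the caret elsewhere, i is not a valid position.
    if (el.selectionStart === i) accepted.push(i);
  }
  return accepted;
}

// Browser usage (not run here):
//   const ta = document.createElement("textarea");
//   document.body.appendChild(ta);
//   console.log(probeCursorPositions(ta, "\u0D38\u0D28\u0D4D\u0D24\u0D4B\u0D37\u0D4D"));
```

Using a textarea rather than getSelection() keeps the document's real selection untouched while probing.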
Is there an easy/cheap way to determine whether a string might have stacking?
No, that is not possible. It depends not only on the data but also on the font used.
Well, I meant a way to weed out strings that obviously don't need the expensive treatment and can simply have a cursor position between every code point. /[^\x00-\x7f]/ would work to spot pure-ASCII strings (the ones it doesn't match), but maybe we can do better and enumerate the ranges of the scripts in which this occurs (using broad ranges to keep the string size under control; false positives aren't bad).
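A sketch of such a pre-filter, under the stated assumption that false positives are acceptable. The block list (Devanagari through Sinhala, Thai and Lao, Myanmar, Khmer) is an illustrative choice rather than a vetted list, and needsClusterProbing is a hypothetical name. It also flags any combining mark, so that something like "e" + U+0301 is not treated as two cursor stops.

```javascript
// Broad Unicode blocks where cluster boundaries are likely to differ from
// code point boundaries (illustrative assumption, deliberately generous):
//   U+0900-U+0DFF  Devanagari ... Sinhala (Indic scripts, incl. Malayalam)
//   U+0E00-U+0EFF  Thai, Lao
//   U+1000-U+109F  Myanmar
//   U+1780-U+17FF  Khmer
const COMPLEX_SCRIPT_CHARS = /[\u0900-\u0DFF\u0E00-\u0EFF\u1000-\u109F\u1780-\u17FF]/;
const COMBINING_MARK = /\p{M}/u;

// True if the string might need the expensive cluster handling. False means
// it is treated as safe to put a cursor between every code point (again an
// assumption: e.g. emoji sequences are not covered by this check).
function needsClusterProbing(text) {
  return COMPLEX_SCRIPT_CHARS.test(text) || COMBINING_MARK.test(text);
}

console.log(needsClusterProbing("hello")); // false
console.log(needsClusterProbing("\u0D38\u0D28\u0D4D\u0D24\u0D4B\u0D37\u0D4D")); // true
```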
@santhoshtr
Yes, create an editable node and keep trying to place the cursor. Of course, it is inefficient and hacky.
On Firefox, it seems that selectionEnd can be set to any value, even one that's not a valid cursor position. Do you have any example of this technique actually being applied?
(That is, I'm using a textarea now, because there I can play with selectionEnd without actually breaking the existing selection in the document. Using getSelection().addRange() is just too disruptive: it causes tons of side effects on mobile, and also spurious deselects/reselects on desktop.)
@marijnh Arabic doesn't work correctly either, same as Thai.
@marijnh https://github.com/marijnh/CodeMirror/issues/2115#issuecomment-31731752
The browser knows this, but doesn't expose this information to JavaScript.
Have you considered filing a bug for this with Mozilla (https://bugzilla.mozilla.org/) or Chromium (https://code.google.com/p/chromium/)?
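Since this thread started, JavaScript engines have gained such a primitive: Intl.Segmenter with granularity "grapheme" exposes the engine's own grapheme cluster boundaries. A minimal sketch follows; graphemeBoundaries is a hypothetical name. It implements the Unicode extended grapheme cluster rules (UAX #29) from the engine's data, so, as pointed out earlier in this thread for Indic scripts, its boundaries may still not match what a particular font renders as a single unit.

```javascript
// Grapheme cluster boundaries (UTF-16 offsets) as reported by the engine.
function graphemeBoundaries(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const boundaries = [];
  // Each segment record carries the UTF-16 index where the segment starts.
  for (const { index } of segmenter.segment(text)) {
    boundaries.push(index);
  }
  boundaries.push(text.length); // the end of the text is always a boundary
  return boundaries;
}

console.log(graphemeBoundaries("e\u0301x")); // [ 0, 2, 3 ]
```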
Wondering if there is any update on, or workaround for, this bug yet?
Nope, I still haven't found a hack that works halfway acceptably.
I still have the same issue: if you set a custom font, like Inconsolata, the line height or cursor positioning is way off until you start interacting (typing, clicking) with the textarea rendered into the .CodeMirror element.
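If the cause is that CodeMirror measured the text before the custom font finished loading, one possible workaround (an assumption, not something confirmed in this thread) is to call CodeMirror 5's cm.refresh() once the document's fonts are ready. refreshWhenFontsReady is a hypothetical helper; document.fonts.ready is the standard FontFaceSet promise.

```javascript
// Re-measure the editor once web fonts have loaded, so that cached line
// heights and character widths reflect the final font.
function refreshWhenFontsReady(cm, fontFaceSet = document.fonts) {
  // FontFaceSet.ready is a Promise that resolves when font loading is done.
  return fontFaceSet.ready.then(() => {
    cm.refresh();
  });
}

// Browser usage (not run here; "code" is a hypothetical element id):
//   const cm = CodeMirror.fromTextArea(document.getElementById("code"));
//   refreshWhenFontsReady(cm);
```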
Is it not possible to make RTL work for Arabic?
This is an issue that's difficult if not impossible to solve with the fundamental approach currently taken by CodeMirror.
We are working on a rewrite (CodeMirror 6) that might address this issue, and we are currently raising money for this work: See the announcement for more information about the rewrite and a demo.
Note that CodeMirror 6 is by no means stable or usable in production yet. It's highly unlikely that we will pick up this issue for CodeMirror 5, though.
Same issue here: the cursor seems to be completely mispositioned... I have used codeMirror.getDoc().setValue(), though.
Windows 10 1909 Chrome 86.0.4240.111
Paste the following text into Brackets and see where the cursor is placed:
സന്തോഷ്
The cursor is supposed to be placed at the end of the word, but in Brackets it is placed after 4 or 5 characters' width.
Happens with all non-Latin complex scripts.
Works fine in Firefox, but the issue exists in Chrome.
(duplicated from https://github.com/adobe/brackets/issues/6301)