bobbylight / RSyntaxTextArea

A syntax highlighting, code folding text editor for Java Swing applications.
BSD 3-Clause "New" or "Revised" License

Problematic edit of diacritics that are in form of decomposed characters #506

Open pskowronek opened 1 year ago

pskowronek commented 1 year ago

Description

Editing diacritics that are stored as decomposed characters is problematic (see here). A diacritic can be represented in Unicode either as a single precomposed character or in decomposed form: a base letter followed by a combining accent. The first form works fine, whereas the second breaks during editing: if you place the cursor right after such a character, or slightly into the middle of it, and hit Tab or Space, the character is split in two, a letter and its accent.

These decomposed characters are used by the newest macOS versions for filenames on the file system (and by older macOS versions on certain volumes, such as those mounted from sparse bundles). I guess they can also appear when people with standard keyboard layouts want to type diacritics; there is probably a way to type a letter and then add an accent to it.
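For readers unfamiliar with the two representations, here is a minimal sketch using the JDK's `java.text.Normalizer`. The class name `NormalizationSketch` and the sample letter "é" are made up for illustration and are not taken from RSyntaxTextArea or from the attached sample file.

```java
import java.text.Normalizer;

// Hypothetical demo class, not part of RSyntaxTextArea.
final class NormalizationSketch {
    // "é" as one precomposed code point (U+00E9).
    static final String COMPOSED = "\u00E9";
    // "é" as the base letter 'e' followed by COMBINING ACUTE ACCENT (U+0301).
    static final String DECOMPOSED = "e\u0301";

    // Compose base letter + combining marks where a precomposed form exists.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Split precomposed characters into base letter + combining marks.
    static String toNfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        System.out.println(COMPOSED.length());                   // 1
        System.out.println(DECOMPOSED.length());                 // 2
        System.out.println(COMPOSED.equals(DECOMPOSED));         // false
        System.out.println(toNfc(DECOMPOSED).equals(COMPOSED));  // true
        System.out.println(toNfd(COMPOSED).equals(DECOMPOSED));  // true
    }
}
```

Both strings render as the same glyph, but the decomposed one occupies two `char` positions, so a caret offset can land between the letter and its accent. Normalizing the text to NFC on load would sidestep the issue for many Latin letters, but it changes the file's bytes on save, so it is probably not acceptable as a general fix.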

Steps to Reproduce

Specific steps to reproduce the behavior:

  1. Replace the sample file RSyntaxTextAreaDemo/src/main/resources/org/fife/ui/rsyntaxtextarea/demo/JavaExample.txt with one that contains decomposed characters, like this one (unzip it first): javaexampletxt.zip
  2. Run the demo app: ./gradlew run
  3. Diacritics are shown properly
  4. Try to place the cursor after a diacritic or in the middle of it, then hit Tab or Space

Expected behavior

The word is split properly and the diacritic character stays intact

Actual behavior

The diacritic character is split into two characters: a letter and its accent

Screenshots

Initially displayed correctly: rsta-show

After hitting Tab near diacritics: rsta-edit

Java version

Tested with Java 14, since this project uses an older Gradle. I also tried muCommander, which uses RSyntaxTextArea, on Java 20, and the problem is still there.

macOS version 10.15.7

Additional context

Btw, I can still see the same behavior in the newest IntelliJ IDEA v2023.1.2 (which also uses Java/Swing). More details can be found here: https://github.com/mucommander/mucommander/issues/941

bobbylight commented 1 year ago

I might need some help on this one, as I don't know much about this topic, but I'm happy to take a look!

pskowronek commented 1 year ago

@bobbylight I can try to assist, but I have no idea how it should be solved technically. Please see the referenced bug in muCommander, where I gathered some links (especially https://github.com/mucommander/mucommander/issues/941#issuecomment-1555274496). However, a similar bug is still present in IntelliJ IDEA.

One idea was to check whether the cursor is in the middle of a composite character, but how can we tell accurately that this is the case? Probably by checking whether the next character is an accent and, if so, taking the character before the cursor and checking via Java whether it is composite (I think Java 20 can tell that, but I don't have the API at hand). The next question is how many characters before the cursor should be checked.
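A rough sketch of that idea, assuming "accent" means a Unicode combining mark as reported by `Character.getType`. The class `CaretSnapSketch` and its methods are hypothetical helpers, not existing RSyntaxTextArea or Swing API. Under this assumption no look-behind is needed: an offset is in the middle of a composite exactly when the code point that follows it is a combining mark.

```java
// Hypothetical helper, not part of RSyntaxTextArea or Swing.
final class CaretSnapSketch {

    // True for code points that attach to a preceding base character.
    static boolean isCombiningMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // True if 'offset' would split a base letter from its marks,
    // or a surrogate pair into its two halves.
    static boolean isInsideCluster(CharSequence text, int offset) {
        if (offset <= 0 || offset >= text.length()) {
            return false; // start and end of text are always safe positions
        }
        if (Character.isHighSurrogate(text.charAt(offset - 1))
                && Character.isLowSurrogate(text.charAt(offset))) {
            return true;
        }
        return isCombiningMark(Character.codePointAt(text, offset));
    }

    // Nearest safe offset at or after 'offset'.
    static int snapForward(CharSequence text, int offset) {
        int i = offset;
        while (isInsideCluster(text, i)) {
            i++;
        }
        return i;
    }

    // Nearest safe offset at or before 'offset'.
    static int snapBackward(CharSequence text, int offset) {
        int i = offset;
        while (isInsideCluster(text, i)) {
            i--;
        }
        return i;
    }

    public static void main(String[] args) {
        String text = "ze\u0301ro"; // z, e, COMBINING ACUTE ACCENT, r, o
        System.out.println(isInsideCluster(text, 2)); // true
        System.out.println(snapForward(text, 2));     // 3
        System.out.println(snapBackward(text, 2));    // 1
    }
}
```

An editor could apply something like `snapForward` to the caret offset before inserting a Tab or Space. This check only covers combining marks; it ignores other grapheme cluster rules such as emoji joined with ZWJ or Hangul jamo sequences. The JDK's `java.text.BreakIterator.getCharacterInstance()` is a more general way to find user-perceived character boundaries and may be a better basis for a real fix.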

A good start is here