0x7c13 / Notepads

A modern, lightweight text editor with a minimalist design.
https://www.NotepadsApp.com
MIT License

[Bug] Pasting Arabic text may make it left-to-right #397

Closed be5invis closed 4 years ago

be5invis commented 4 years ago

Describe the bug
Right-to-left scripts (Arabic, Hebrew, etc.) are not treated as RTL.

To Reproduce
Type this string into Notepads:

العربية

This is the word "Arabic" in Arabic, which contains 7 letters.
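As a quick check of the letter count, the word can be broken into its code points in logical (typing) order with Python's standard `unicodedata` module. This is only an illustrative sketch: the escapes below are my transcription of the word above, and the variable names are hypothetical.

```python
import unicodedata

# The word from the report, written as escapes in logical (typing) order.
# Assumption: this transcription matches the string pasted above.
word = "\u0627\u0644\u0639\u0631\u0628\u064A\u0629"

# Official Unicode character names, one per letter.
names = [unicodedata.name(ch) for ch in word]

# Bidi class of each letter; Arabic letters carry the strong class "AL".
classes = [unicodedata.bidirectional(ch) for ch in word]

for ch, name, cls in zip(word, names, classes):
    print(f"U+{ord(ch):04X}  {cls}  {name}")
```

Every letter reports the strong right-to-left class `AL`, which is why a correct renderer lays the word out from right to left.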

Expected behavior
The letters are arranged from right to left, and the string is shown like this: image

Screenshots
Currently it is shown like this, which is wrong: image
The word is evidently treated as left-to-right and only then shaped together.

Additional context
Please note that BiDi support covers not only text that flows from right to left, but also mixtures of LTR and RTL scripts and the various BiDi control characters. For more detail, see: https://en.wikipedia.org/wiki/Bidirectional_text
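To make the two cases above concrete (mixed LTR/RTL text and BiDi control characters), here is a minimal sketch using Python's standard `unicodedata` module. The helper names are hypothetical, and this only inspects character classes; it does not implement the Unicode Bidirectional Algorithm.

```python
import unicodedata

# Bidi classes of the explicit directional formatting characters:
# embeddings, overrides, isolates, and their terminators.
EXPLICIT_FORMATTING = {"LRE", "RLE", "LRO", "RLO", "PDF",
                       "LRI", "RLI", "FSI", "PDI"}


def has_mixed_direction(text):
    """True if the text contains both strong LTR and strong RTL characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return "L" in classes and bool(classes & {"R", "AL"})


def explicit_controls(text):
    """List the code points of explicit bidi formatting characters in text."""
    return [f"U+{ord(ch):04X}" for ch in text
            if unicodedata.bidirectional(ch) in EXPLICIT_FORMATTING]


print(has_mixed_direction("Unicode \u0627\u0644\u0639\u0631\u0628\u064A\u0629"))
print(explicit_controls("abc \u202Adef\u202C ghi"))
```

Note that the implicit marks (LRM U+200E, RLM U+200F) are not reported by `explicit_controls`, since they carry the ordinary strong classes `L` and `R`.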

be5invis commented 4 years ago

Note: Pressing CTRL+R makes the editor look like this: image

0x7c13 commented 4 years ago

I made some updates to the flow direction change shortcut yesterday, and they should be available in v1.1.8.0.

Basically, you can use Ctrl+L or Ctrl+R to quickly switch the text flow direction (this applies to the full document). You need to retype after changing the text flow direction; it might not work for existing words.

Btw, why is your system language Chinese?

image

Also, I am working on a PR to enable a flow direction change option in the right-click context menu, but that option should only be enabled when the user's locale is in RTL mode.

be5invis commented 4 years ago

The problem is that the direction of the letters in the Arabic word is wrong.

No matter what the paragraph direction is, Arabic letters should always be handled right-to-left. Paragraph direction influences the order of many things, like the order of words or numbers, but not the order of letters within an Arabic word.

The following string is the sample from the Unicode Bidi Algorithm demo:

mark 3.1% مارْك 2.0.

When shown in a LTR paragraph:

image

When shown in a RTL paragraph: image
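The distinction above can be made visible by grouping the sample string into runs of equal bidi class. The sketch below uses Python's standard `unicodedata` module; the escapes are my transcription of the sample, the function names are hypothetical, and the first-strong check is a simplification of how a base direction can be auto-detected (it ignores isolates). It does not reorder anything.

```python
import unicodedata
from itertools import groupby

# The sample string, with the Arabic word written as escapes.
# Assumption: these code points match the string quoted above.
SAMPLE = "mark 3.1% \u0645\u0627\u0631\u0652\u0643 2.0."


def bidi_runs(text):
    """Group consecutive characters that share the same bidi class."""
    return [(cls, "".join(chars))
            for cls, chars in groupby(text, key=unicodedata.bidirectional)]


def first_strong_direction(text):
    """Direction of the first strong character, or None if there is none."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "LTR"
        if cls in ("R", "AL"):
            return "RTL"
    return None


for cls, run in bidi_runs(SAMPLE):
    print(f"{cls:>3}  {[f'U+{ord(c):04X}' for c in run]}")
```

The Arabic letters are strong (`AL`) and keep their right-to-left order in either paragraph direction, while the digits (`EN`), separators (`CS`, `ET`), and spaces (`WS`) are weak or neutral, so their placement depends on the surrounding context and the paragraph direction.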

0x7c13 commented 4 years ago

https://github.com/JasonStein/Notepads/commit/e0e470427ceaea081d67a212a219e61166c93861

Can you try the latest master build?

image

image

be5invis commented 4 years ago

How does this mark 3.1% مارْك 2.0. look in a LTR paragraph?

0x7c13 commented 4 years ago

image image image image

be5invis commented 4 years ago

Seems fine, let me grab a private build and test more strings 🤔.

be5invis commented 4 years ago

Strange, in Master it looks like this: image

be5invis commented 4 years ago

It looks like the BiDi result is incorrect if I paste the string with the CHS IME enabled. With an English keyboard, it is correct. image

be5invis commented 4 years ago

The pasting behavior looks very unstable, but enabling IME will always ruin RTL runs...

0x7c13 commented 4 years ago

It is definitely doing something under the hood, and it is biased by locale + IME as well. Another thing I found is that manually changing the direction when there is no text does not work. However, the default RichEditBox shortcuts (Ctrl+L/R) work without any problem.

0x7c13 commented 4 years ago

This PR should properly handle all cases: https://github.com/JasonStein/Notepads/pull/395/files

image image

be5invis commented 4 years ago

Slightly modifying PastePlainTextFromWindowsClipboard could fix the IME-paste issue:

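// Group the whole paste into a single undo step.
Document.BeginUndoGroup();
// Replace the current selection with the clipboard text.
Document.Selection.SetText(TextSetOptions.None, text);
// Set the script of the pasted text to Ansi.
Document.Selection.CharacterFormat.TextScript = TextScript.Ansi;
// Collapse the selection so the caret sits after the pasted text.
Document.Selection.StartPosition = Document.Selection.EndPosition;
Document.EndUndoGroup();

be5invis commented 4 years ago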

Confirmed fixed. Closed.

be5invis commented 4 years ago

image

Test String:

Adlam:  𞤑𞤵𞥅𞤤𞤢𞤤 𞤺𞤢𞤣𞤢𞤲𞤢𞤤 𞤋𞤲𞥆𞤢𞤥𞤢 𞤢𞥄𞤣𞤫𞥅𞤶𞤭 𞤬𞤮𞤬 𞤨𞤮𞤼𞤭⹁ 𞤲'𞤣𞤭𞤥𞤯𞤭𞤣𞤭 𞤫 𞤶𞤭𞤦𞤭𞤲𞤢𞤲𞥆𞤣𞤫 𞤼𞤮 𞤦𞤢𞤲𞥆𞤺𞤫 𞤸𞤢𞤳𞥆𞤫𞥅𞤶𞤭.
Arabic: عندما يريد العالم أن ‪يتكلّم ‬ ، فهو يتحدّث بلغة يونيكود. تسجّل الآن لحضور المؤتمر الدولي العاشر ليونيكود (Unicode Conference)، الذي سيعقد في 10-12 آذار 1997 بمدينة مَايِنْتْس، ألمانيا.
Hebrew: סעיף א. כל בני אדם נולדו בני חורין ושווים בערכם ובזכויותיהם. כולם חוננו בתבונה ובמצפון, לפיכך חובה עליהם לנהוג איש ברעהו ברוח של אחוה.
N'ko:       ߞߏ ߡߍ߲ ߞߵߊ߬ ߞߍ߫ ߊ߲ ߛߋ߫ ߘߊ߫ ߞߊ߬ ߕߟߋ߬ߓߊ߰ߓߟߐߟߐ ߘߊߦߟߍ߬ ߒߞߏ ߦߋ߫ ߸ 
Syriac: ܫܠܡ ܘܠܐܠܗܐ ܏ܫܘܒ܆ ܘܠܕܘܝܐ ܕܣܡ ܗܠܝܢ ܫܘܒܩܢܐ܀
Thaana: ވަނަ މާއްދާ ހުރިހާ އިންސާނުންވެސް ދުނިޔެއަށް އުފަންވަނީ، މިނިވަންކަމުގައި، ހަމަހަމަ ޙައްޤުތަކަކާއެކު، ހަމަހަމަ ދަރަޖައެއްގައި ކަމޭހިތެވިގެންވާ ބައެއްގެ ގޮތުގައެވެ.
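When assembling test lines like these, it can help to confirm that each one really contains strong right-to-left characters and which class they belong to. Here is a small sketch with Python's standard library; the function name is hypothetical and the samples are shortened stand-ins for the lines above, written as escapes.

```python
import unicodedata
from collections import Counter


def strong_class_counts(line):
    """Count strong bidi classes (L, R, AL) in the text after a 'Label:' prefix."""
    _, sep, body = line.partition(":")
    if not sep:          # no label present, so inspect the whole line
        body = line
    classes = (unicodedata.bidirectional(ch) for ch in body)
    return Counter(c for c in classes if c in ("L", "R", "AL"))


# Shortened stand-ins for the test lines (assumed transcriptions).
samples = {
    "Hebrew": "Hebrew: \u05E1\u05E2\u05D9\u05E3",
    "Syriac": "Syriac: \u072B\u0720\u0721",
    "Adlam": "Adlam: \U0001E911\U0001E935",
}

for label, line in samples.items():
    print(label, dict(strong_class_counts(line)))
```

Hebrew and Adlam letters report class `R`, while Syriac letters (like Arabic ones) report `AL`, so a test file with all of these exercises both strong right-to-left classes.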