eclipse-platform / eclipse.platform.swt

Eclipse SWT
https://www.eclipse.org/swt/
Eclipse Public License 2.0
114 stars 132 forks source link

indic standard digits not visible in views (editor, console) #950

Open goyalyashpal opened 9 months ago

goyalyashpal commented 9 months ago

Let's make sure issue is not already fixed in latest builds first.

Steps to reproduce

Main:

  1. Open any file containing digits in eclipse ide
  2. Change-Standard-Digits (mentioned below)
  3. Force rendering of fonts in eclipse by:
    • activating the eclipse window
    • move the text cursor up/down in editor on line containing digits
    • hover over the console line containing digits
  4. repeat step 2-3

Change-Standard-Digits (only step 3 has to be repeated):

  1. Navigate to: Control Panel\All Control Panel Items\Region i.e. Copy above to clipboard > Win-e (open explorer) > A-d (go to address bar) > C-v (paste) > RET (press enter)
  2. A-d (additional settings) > A-u (use native digits): set to National > A-a (apply)
  3. A-s (standard digits) > arrow keys to change > A-a (apply)
  4. Do not close this panel just yet

I expected: normal file is shown

But got: digits in editor text don't appear when "standard-digits" belong to indic languages

Screen Captures

play at 2x speed:

https://github.com/eclipse-platform/.github/assets/19423063/70ac4e15-43f6-458b-800e-913957dc84dc

Tested under this environment:

Community

P.S.

duplicates searched at: digits visible | digits disappear

goyalyashpal commented 9 months ago

oh i forgot to add that i ried changing fonts in eclipse settings etc, but it seems to not affect much. update: done ~; will share screetshot later~

mihnita commented 8 months ago

That setting in Windows is in fact a "hack", in my opinion.

It affects the rendering of the (ASCII) digits 0-9. And since that is rendering only, it means that to see the change the text needs a redraw. So "Force rendering of fonts in eclipse"... part makes sense.

The digits in the clipboard are still ASCII.

On the other side, if the document really contains Indic digits (non-ASCII), the "hack" does not kick in. So if my document contains real Burmese digits (၀၁၂၃၄၅၆၇၈၉), the settings in Control Panel have no effect.

The "Use native digits" option is a switch that tells the system how "Standard digits" option above works. "Never" means that option is ignored (should be grayed out, if you ask me, but it is not) "Context" looks at what kind of text is before / after the digits and honor "Standard digits" depending on that. "National" means always use the digits in "Standard digits". So if "Standard digits" is set to "0123456789" then "Use native digits"="National" does not actually mean "national", means use the ASCII digits.

I am not sure what the expectation reported in this bug is.

But the behavior is the same as for most other Windows applications. Confusing, but "native platform behavior"

The only way to bypass that (that I know of) is to use the low-level Uniscribe APIs. Which means using custom widgets (buttons, labels, etc) instead of the system ones.

TLDR: I think this works as (Microsoft) intended.

goyalyashpal commented 8 months ago

I am not sure what the expectation reported in this bug is.

expectation is for the digits to APPEAR - be it in indo-arabic or in the selected digits (here, indic).

@mihnita did u even read the description or watch the attached video?

I expected: normal file is shown

But got: digits in editor text don't appear when "standard-digits" belong to indic languages

goyalyashpal commented 8 months ago

But the behavior is the same as for most other Windows applications. Confusing, but "native platform behavior"

no its not. every other application shows digits however it can - be in indic font, or in indo-arabic.

no application just makes them invisible like eclipse does.


notepad3, windows explorer, double commander

SyntevoAlex commented 8 months ago

Just a couple weeks ago, I also noticed that StyledText fails to render some characters on Windows, in specific fonts. Probably some bug in how StyledText searches for fallback fonts?

Specifically, in this "text" : ༼꧅ ꧁ ꧂ ꦇ က only the first character is rendered when using Lucida Console font.

mickaelistria commented 8 months ago

@mihnita I disagree it's a hack, It s a system feature which SWT needs to obey (that's the expectation that SWT does proper native). I think that @goyalyashpal is right here and it's fine to expect SWT to render the digits just like the OS would do here. (I'm not saying it's trivial and high priority though, I just want to show support in the fact that this is a valid request).

@SyntevoAlex The StyledText does delegate drawing to TextLayout object, which are the ones that go to the OS. On Linux for instance, it directly calls Cairo to draw text; I'm not familiar with Windows but I suspect this TextLayout is where such magic feature of the OS might be short-circuited.

goyalyashpal commented 8 months ago

hi! thanks for ur input. just some clarifications:

it's fine to expect SWT to render the digits just like the OS would do here I'm not saying it's trivial and high priority though

key (glyph <- name):

mihnita commented 8 months ago

@mihnita I disagree it's a hack, It s a system feature which SWT needs to obey

I think my explanation was a bit unclear.

It is a Windows hack, not a SWT one. As such, it is a system feature, and should be obeyed.


The trouble is that this is very low level. Not controllable at regular GDI API level.

My suspicion here is that the problem is the font. It the current font in the widget does not contain glyphs for those digits, it will show as tofu. (and most monospaced fonts have relatively reduced i18n support)

Worse, it is not even possible to fallback to a different font when such characters are encountered, because there are no special characters in the text. They are normal ASCII digits (we can see that if we copy them in the clipboard then hex-inspect the content). So we can't inspect the text, detect Hindi digits, and use a font that supports Hindi.

I am not familiar with platform APIs used by SWT, only with the Windows i18n in general.

mickaelistria commented 8 months ago

Do I get it right some minimal fixes could be, in the Windows implementation of TextLayout:

  1. Force usage of indo-arabic (if Windows allows that), or
  2. if Windows is configured to use different digit settings (I guess it's something we can access via the system registry, can't we?), and -optionally- if text contains digits and if we know the current font cannot render them well (eg width == 0_ , then force usage of a more standard font that is known to have support for those. ?
mihnita commented 8 months ago

I tried to confirm if this is a font issue or not. And it looks like it is.

Selecting a font that contains digits for most Indic scripts (for example Segoe UI) the digits show fine. Since these fonts are not monospaced, this is of course not a solution.

ASCII-Consolas: Digits ASCII-Consolas

Kannada-Consolas: Digits Kannada-Consolas

Kannada-SegoeUI: Digits Kannada-SegoeUI

Gujarati-SegoeUI: Digits Gujarati-SegoeUI

Tamil-SegoeUI: Digits Tamil-SegoeUI

Thai-SegoeUI: Digits Thai-SegoeUI

mihnita commented 8 months ago

I'll take a look at TextLayout.

mihnita commented 8 months ago

Some preliminary analysis

TextLayout renders the text here:

if (gdipFont != 0 && !run.analysis.fNoGlyphIndex) {
    pRect = drawRunTextGDIP(gdipGraphics, run, rect, gdipFont, baselineInPixels, gdipFg, gdipSelForeground, selectionStart, selectionEnd, alpha);
} else {
    int fg = style != null && style.underline && style.underlineStyle == SWT.UNDERLINE_LINK ? linkColor : foreground;
    pRect = drawRunTextGDIPRaster(gdipGraphics, run, rect, baselineInPixels, fg, selForeground, selectionStart, selectionEnd);
}

drawRunTextGDIP ends up calling Gdip.Graphics_DrawDriverString here:

Gdip.Graphics_DrawDriverString(graphics, run.glyphs, run.glyphCount, gdipFont, selectionColor, points, 0, 0);

That is a GDI+ method: https://learn.microsoft.com/en-us/windows/win32/api/gdiplusgraphics/nf-gdiplusgraphics-graphics-drawdriverstring

And drawRunTextGDIPRaster ends up calling drawRunText, which calls OS.ScriptTextOut here:

OS.ScriptTextOut(hdc, run.psc, x, y, 0, null, run.analysis , 0, 0, run.glyphs, run.glyphCount, run.advances, run.justify, run.goffsets);

That is a Uniscribe method: https://learn.microsoft.com/en-us/windows/win32/api/usp10/nf-usp10-scripttextout


The interesting one is the first case, Gdip.Graphics_DrawDriverString That calls a GDI+ API, Graphics::DrawDriverString: https://learn.microsoft.com/en-us/windows/win32/api/gdiplusgraphics/nf-gdiplusgraphics-graphics-drawdriverstring

The documentation there says:

The Graphics::DrawDriverString method draws characters at the specified positions. The method gives the client complete control over the appearance of text. The method assumes that the client has already set up the format and layout to be applied.

And at the end:

This method does not support the handling of complex scripts and assumes that the client has set up all text layout in some other way. This method is useful for creating owner-drawn menu items. The client should use the DrawString Methods method for general purposes.

In other words, the method does no font fallback / linking / substitution magic. Assumes everything is already resolved, gives us all the power (and responsibility).


How do we end up in that branch, where we thing everything is resolved? From the initial condition:

if (gdipFont != 0 && !run.analysis.fNoGlyphIndex) { }

We have a font, so gdipFont != 0, makes sense.

And !run.analysis.fNoGlyphIndex tells us that the text analysis (which splits the text into runs), determined that the font contains the proper glyphs.

That is true. The text contains the ASCII digits 0-9 (U+0030 - U+0039). Most fonts (including Consolas, or whatever font we use in the editor) contain glyphs for the ASCII digits.

That's part of what makes this a (OS) hack: it is lying to us. And it is a bad hack, because it replaces the glyphs at a very low level, when the text is rendered. It means it has no context, has no way to know if those digits should really be national or ASCII.

The proper approach would be for that APIs that format numbers / dates / etc. to return the text with the proper digits. One should call those formatting APIs anyway, to get the proper decimal / thousand separators, grouping, etc.


TLDR:

// pseudocode
if we_have_a_font && the_font_has_glyphs_for_ascii_0_9:
        Use a GDI+ API that is that gives us all the control, the system does nothing for us.
        We have all the power, and all the responsibility.
        And we use the font we have. Which in fact has no Indic glyphs (or other international glyphs)
else:
        Use a "friendlier" Uniscribe API which takes away some of the burden (and some of the control)
        Does not even take a font as a parameter, so we don't control that anymore.
mihnita commented 8 months ago

Note that the screen used to configure these digits in the old one, from Control Panel (Control Panel -- Region -- Additional Settings -- "Numbers" tab)

But the Control Panel is slowly going away, being replaced by "Settings"

The equivalent digits configuration in Settings (Win 11) can be found in Settings -- Time and language -- Language & region -- Region / Region format -- Change formats -- Standard digits (last option)

And that option does not allow me to change the digits, all I have is ASCII. My assumption is that if my OS would be (for example) Thai then I would be able to choose between Thai and ASCII digits.


That setting was weird anyway... It makes sense that they remove it.

What would be a reason to use random digits (let's say Burmese) on a system in a completely different language. (Russian UI with Burmese digits? Why? What's the use case?)

And implemented as a display hack.

Even less useful in an IDE. I would not want see some digits that are in fact not there.

Imagine that you see assertEquals("०१२३४५६७८९", actual) failing. And you spend time debugging, and actual looks like "०१२३४५६७८९". You debug, you print it, looks good.

When in fact the actual is "0123456789", but the OS and the IDE are "lying" to you and render it with Devanagari digits. You can't tell until you inspect it as hex.

mihnita commented 8 months ago

TLDR: I think that a good fix for this would be to actually show the ASCII digits. That is what's really there.

Having the digits disappear (the way it is now) is indeed bad.

Trying to respect that OS setting would be bad behavior (the IDE "lying" to you and showing stuff that is not there) And it is a setting that is going away (MS is slowly moving things from Control Panel to Settings)

mihnita commented 8 months ago

Related material:

Over time Windows implemented all kinds of tricks with the fonts, trying to make make sure one never sees tofu. Overall successfully. Unless one uses low level APIs like Uniscribe things "just work" And that is why most applications respect that settings: they use standard widgets, or high level GDI APIs.

But now there is a mixture of font fallback, linking and substitution that at times steps on each-others toes. See https://learn.microsoft.com/bs-latn-ba/globalization/fonts-layout/fonts

mihnita commented 8 months ago

Sorry, I need some help...

I have some kind of fix, it works, but I don't think it is 100% proper. And had (have?) a hard time working with the native Windows files. I'll split it into several comments.

mihnita commented 8 months ago

Here is the diff between my repo (mihnita/eclipse.platform.swt), branch (mihai_fix_win_digitsubst) and the official repo: https://github.com/eclipse-platform/eclipse.platform.swt/compare/master...mihnita:eclipse.platform.swt:mihai_fix_win_digitsubst

I have problems building the binaries (the .dll files in binaries\org.eclipse.swt.win32.win32.x86_64) To make them work and to test I had to add some hacks in bundles\org.eclipse.swt\buildSWT.xml (documented in Readme.Win32.md, same folder). It works, but it is not the right way to do it. But I was unable to find any documentation, Readme.Win32.md seems really outdated.

In the end I managed to build the .dll files and test, but then I was unable to push them to GitHub (LFS problems :-( )


TLDR: the code fixes are in bundles/org.eclipse.swt/Eclipse SWT/win32/org/eclipse/swt/graphics/TextLayout.java, itemize() method (line 2741)

Before:

OS.ScriptApplyDigitSubstitution(0, scriptControl, scriptState);

After:

SCRIPT_DIGITSUBSTITUTE scriptDigitSubstitute = new SCRIPT_DIGITSUBSTITUTE();
...
scriptDigitSubstitute.NationalDigitLanguage = OS.LOCALE_USER_DEFAULT;
scriptDigitSubstitute.TraditionalDigitLanguage = OS.LOCALE_USER_DEFAULT;
scriptDigitSubstitute.DigitSubstitute = OS.SCRIPT_DIGITSUBSTITUTE_NONE;
OS.ScriptApplyDigitSubstitution(scriptDigitSubstitute, scriptControl, scriptState);

The rest are is the glue code adding SCRIPT_DIGITSUBSTITUTE.


Screenshots Before: DigitSubstPost Afer: DigitSubstPre

mihnita commented 8 months ago

Concerns, where I need help

I need help to generate the binaries (the .dll files) properly. I spent hours trying (and getting to a hacky way). And help pushing the .dll files to GitHub.


Do you think that is the right fix?

The current fix is "kind of" like the previous functionality, but with \ scriptDigitSubstitute.DigitSubstitute = OS.SCRIPT_DIGITSUBSTITUTE_NONE;

The other settings only reproduce the case we had before, when we passed null:

scriptDigitSubstitute.NationalDigitLanguage = OS.LOCALE_USER_DEFAULT;
scriptDigitSubstitute.TraditionalDigitLanguage = OS.LOCALE_USER_DEFAULT;
scriptDigitSubstitute.DigitSubstitute = OS.SCRIPT_DIGITSUBSTITUTE_NONE;

See ScriptApplyDigitSubstitution

[in] psds

Pointer to a SCRIPT_DIGITSUBSTITUTE structure. The application sets this parameter to NULL if the function is to call ScriptRecordDigitSubstitution with LOCALE_USER_DEFAULT.

Since now I pass something other than null, I set the 2 locale fields to LOCALE_USER_DEFAULT, to be as close as before as possible.


Where I have doubts

This will fix the reported issue, no problem.

But people use SWT to built their own applications, it is not only used by Eclipse. So maybe they want to have digit substitution in their own applications.

Option 1:

To accommodate for that I would have to add a setter for digit substitution, with the 4 values documented in SCRIPT_DIGITSUBSTITUTE: SCRIPT_DIGITSUBSTITUTE_CONTEXT, SCRIPT_DIGITSUBSTITUTE_NATIONAL, SCRIPT_DIGITSUBSTITUTE_NONE, SCRIPT_DIGITSUBSTITUTE_TRADITIONAL

Keep the default as SCRIPT_DIGITSUBSTITUTE_NATIONAL, and set it to SCRIPT_DIGITSUBSTITUTE_NONE in the Eclipse code editors.

The trouble is that TextLayout is Windows only, low level. So to make that setter useful we would have to propagate it and expose it in some higher level classes.

Also, that setting is Windows specific, it does nothing on Linux / Mac.

Option 1.a: we can expose the setter and do nothing on Mac / Linux

Option 1.b: we only add the digit substitution setter to Windows => not really portable.

Option 2:

Keep the patch "as is", and disable digit substitutions for all apps.

That would be a change in behavior. But as I explained above, that kind of substitution seems to me like a (Windows) hack. And one that seems to go away. (will probably work for a long time, as Windows is big in backward compatibility, but it will not be accessible in Settings)


A bit more: it looks like the digits continue to be Indic (substitution in effect) in combo-boxes, lists, other places (see the font size in the screenshots above).

I suspect that they don't use TextLayout

Thank you! M

SyntevoAlex commented 8 months ago

I need help to generate the binaries (the .dll files) properly. I spent hours trying (and getting to a hacky way).

What kind of errors did you have? What did you change to get them built?

And help pushing the .dll files to GitHub.

You don't. They are built by build bot after merging source code.

mihnita commented 8 months ago

Thank you for the help!

What kind of errors did you have? What did you change to get them built?

Documented the errors and what I did in my update to bundles/org.eclipse.swt/Readme.Win32.md

I have no intention to submit as is, of course.

The main changes I had to do were all in bundles/org.eclipse.swt/buildSWT.xml

Marked them with the comment "TODO: fix, temporary hack"

Summary (all in buildSWT.xml):

  1. Define swt.ws and swt.os, because they were undefined when running from Eclipse
  2. Added the files needed for the GraalVM JavaScript engine, as JS support is not available in JDK 17 anymore. There might be a way to add that to ant itself, but it is not documented
  3. Had to add x86_64 and all as parameter when invoking build.bat

It might all be because I couldn't find the right documentation.

You don't. They are built by build bot after merging source code.

Great! Thank you! It probably means I don't need to touch buildSWT.xml, assuming it works "as is" (somewhere, in some cloud build system :-)

But I still needed to be able to generate them locally, for testing. And the "Building and Testing locally" section in Readme.Win32.md seems outdated.