Tehreer / SheenBidi

A sophisticated implementation of Unicode Bidirectional Algorithm
Apache License 2.0
127 stars 21 forks source link

CJK / Hangul has return SBBidiTypeON which reversed when mixing with arabic #8

Closed Benau closed 4 years ago

Benau commented 4 years ago

For example text 阿拉伯語العربية when given to sheenbidi the SBLineGetRunCount is only 1 which 阿拉伯語 will be reversed like the arabic because it returns a neutral bidi type for chinese

From UnicodeData.txt it seems the cjk and hangul has 4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;; "L" for LTR, maybe it return something wrongly elsewhere?

I use this as a workaround:

        if ((codepoint >= 0x4E00 && codepoint <= 0x9FFF) ||
            (codepoint >= 0x3400 && codepoint <= 0x4DBF) ||
            (codepoint >= 0xF900 && codepoint <= 0xFAFF) || // CJK
            (codepoint >= 0xAC00 && codepoint <= 0xD7A3)) // Hangul
            types[firstIndex] = SBBidiTypeL;
        else
            types[firstIndex] = LookupBidiType(codepoint);
mta452 commented 4 years ago

The workaround is correct. But it turns out that UnicodeData.txt can define ranges. However, its parser is not so sophisticated. Therefore, I have used DerivedBidiClass.txt to generate bidi type lookup. Can you check against the latest commit in dev branch and let me know if the issue is fixed?

Benau commented 4 years ago

Yes it fixed!

btw (even exists in current master) i see some valgrind warning:

==3670== Conditional jump or move depends on uninitialised value(s)
==3670==    at 0xC6CB0D: BidiChainIsSingle (BidiChain.c:55)
==3670==    by 0xC6CB0D: ResolveWeakTypes (IsolatingRun.c:148)
==3670==    by 0xC6CB0D: IsolatingRunResolve (IsolatingRun.c:495)
==3670==    by 0xC6CB0D: ProcessRun (SBParagraph.c:510)
==3670==    by 0xC6CB0D: DetermineLevels (SBParagraph.c:479)
==3670==    by 0xC6CB0D: SBParagraphCreate (SBParagraph.c:577)
==3670==    by 0xC6CB0D: SBAlgorithmCreateParagraph (SBAlgorithm.c:157)

workaround is memset this:

diff --git a/lib/sheenbidi/Source/SBParagraph.c b/lib/sheenbidi/Source/SBParagraph.c
index 36d562cf8..f37d0aed1 100644
--- a/lib/sheenbidi/Source/SBParagraph.c
+++ b/lib/sheenbidi/Source/SBParagraph.c
@@ -17,6 +17,7 @@
 #include <SBConfig.h>
 #include <stddef.h>
 #include <stdlib.h>
+#include <string.h>

 #include "BidiChain.h"
 #include "BidiTypeLookup.h"
@@ -54,6 +55,7 @@ static ParagraphContextRef CreateParagraphContext(const SBBidiType *types, SBLev
     const SBUInteger offsetTypes   = offsetLinks + sizeLinks;

     SBUInt8 *memory = (SBUInt8 *)malloc(sizeMemory);
+    memset(memory, 0, sizeMemory);
     ParagraphContextRef context = (ParagraphContextRef)(memory + offsetContext);
     BidiLink *fixedLinks = (BidiLink *)(memory + offsetLinks);
     SBBidiType *fixedTypes = (SBBidiType *)(memory + offsetTypes);

Your idea?

Btw another (feature request or already possible?), i will use your library for text rendering in supertuxkart: Screenshot from 2020-08-04 11-29-32 But for chat message seems that :english in the rightmost (like in firefox) is easier to read, is it possible to implement sheenbidi for this purpose?

(btw your library is smarter than fribidi, at least the country flag is not reversed like in fribidi!)

mta452 commented 4 years ago

Yes, coveralls is broken, that's why I have not merged it in the master branch yet.

Please ignore the warning. The data is initialized afterwards, but it's up to you to keep the memset.

I think the alignment issue can be fixed by passing 1 in the baseLevel parameter of SBAlgorithmCreateParagraph()

mta452 commented 4 years ago

Closing the issue as it's fixed in v2.3

Benau commented 4 years ago

Thank you and i can use it in next stk release!