Closed UserIsAmitHasan closed 2 years ago
Thank you for investigating this case! I am not an expert in this domain. I have tried to analyse your example as much as I could, and I don't see any problems there. Could you double-check and elaborate on the current output and what is expected?
The text shaping capability is performed mostly by the HarfBuzz library so I have high confidence that the result should be correct. 😁
The text needs to be itemized into runs with the same direction, script, and language before it is passed to HarfBuzz. It seems that the whole text is passed to HarfBuzz as one item and the first script in the text is used (perhaps via hb_buffer_guess_segment_properties()).
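To illustrate what "itemizing" means here, the following is a minimal Python sketch that splits a string into runs of a single script, so that each run could be shaped on its own. It is an assumption-laden illustration, not how QuestPDF or HarfBuzz does it: the helper names `script_of` and `itemize` are hypothetical, and the script is guessed from the first word of the Unicode character name, whereas a real implementation would use the Unicode Script property and also split on direction and language.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Rough script guess from the Unicode character name (illustrative only)."""
    is_mark = unicodedata.category(ch).startswith("M")
    if not (ch.isalpha() or is_mark):
        # Spaces, digits and punctuation are treated as script-neutral.
        return "Common"
    name = unicodedata.name(ch, "")
    # e.g. "BENGALI LETTER BA" -> "Bengali", "LATIN SMALL LETTER A" -> "Latin"
    return name.split(" ")[0].capitalize() if name else "Common"


def itemize(text: str) -> list[tuple[str, str]]:
    """Split text into (script, substring) runs; neutrals join the current run."""
    runs: list[list[str]] = []
    for ch in text:
        script = script_of(ch)
        if runs and (script == "Common" or script == runs[-1][0]):
            runs[-1][1] += ch
        elif runs and runs[-1][0] == "Common":
            # Leading neutral characters adopt the first real script seen.
            runs[-1][0] = script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, chunk) for script, chunk in runs]


if __name__ == "__main__":
    for script, chunk in itemize("English text. বিপদ আরও"):
        print(script, repr(chunk))
```

With runs like these, each one would get its own buffer with an explicit script, instead of letting the first character of the paragraph decide the script for everything that follows.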
As far as I know, HarfBuzz is capable of detecting the text direction automatically. E.g. Arabic is going to be classified as right-to-left but numbers in-between will be left-to-right. Please share more information on why you think the current output is not right 😁
How are you using HarfBuzz: are you calling the C API directly or through some wrapper? HarfBuzz does not handle BiDi the way you describe.
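As a small aside on the BiDi point: direction is a per-character Unicode property, and resolving mixed-direction text is normally done by the Unicode Bidirectional Algorithm (UAX #9) in a separate step before shaping, with each resulting run then shaped in a single direction. The Python sketch below only prints the raw bidirectional classes for a mixed string. It does not implement the algorithm, and the helper name `bidi_classes` is hypothetical.

```python
import unicodedata


def bidi_classes(text: str) -> list[str]:
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]


if __name__ == "__main__":
    # Arabic alef + beh, a space, two ASCII digits, a space, two Latin letters.
    sample = "\u0627\u0628 12 ab"
    for ch, cls in zip(sample, bidi_classes(sample)):
        print(f"U+{ord(ch):04X} {cls}")
```

The Arabic letters report class AL, the digits EN and the Latin letters L, so a layout engine has to split the text into directional runs itself before handing each run to the shaper.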
Before we rush into implementation details and how the API works, I would prefer to see an example showing: 1) what is currently returned by QuestPDF and 2) what is expected 😁
Could you please share more examples (code + screenshots) so I can see the actual difference? I don't see anything wrong in the screenshot from the original message. But again, I am not an expert here, so I may be missing something. It would be best to also include a screenshot from a well-known editor, e.g. Microsoft Word.
You can compare the individual words after the Latin text in the screenshot: it is the same text, yet the words are rendered differently.
@MarcinZiabek To reproduce, please take a look at this repository: QuestPdf.Issue.260.
It seems that putting ASCII text first makes HarfBuzz interpret the whole text as ASCII. Putting the English and Bengali text in two different spans solves the problem, but it is not an ideal solution.
Looking at the original screenshot you posted, am I right in thinking these two would be examples of incorrect formatting?
@girlpunk Yes, exactly. The Bengali text in the top rectangle is mostly broken, but the bottom rectangle is perfect.
This issue is fixed in the latest release. Thank you, @MarcinZiabek.
Describe the bug Unicode text shaping breaks when a paragraph starts with English text, but not when the English text is in the middle of the paragraph.
To Reproduce .Text("English text at the start of a paragraph breaks the Unicode text shaping capability. বিপদ আরও বাড়াচ্ছে টানা বৃষ্টি। গতকাল শনিবারও অবিরাম বৃষ্টি আর পাহাড়ি ঢল অব্যাহত ছিল। প্রায় সব ধরনের যোগাযোগ বিচ্ছিন্ন হয়ে পড়া সিলেট ও সুনামগঞ্জের বন্যা পরিস্থিতি সামাল দিতে হিমশিম খাচ্ছে স্থানীয় প্রশাসন।");
.Text("বিপদ আরও বাড়াচ্ছে টানা বৃষ্টি। English text in the middle does not break the Unicode text shaping capability. গতকাল শনিবারও অবিরাম বৃষ্টি আর পাহাড়ি ঢল অব্যাহত ছিল। প্রায় সব ধরনের যোগাযোগ বিচ্ছিন্ন হয়ে পড়া সিলেট ও সুনামগঞ্জের বন্যা পরিস্থিতি সামাল দিতে হিমশিম খাচ্ছে স্থানীয় প্রশাসন।");
Expected behavior Screenshot shows correct behaviour.
Screenshots
Additional context Language I am testing with is Bengali. Font Used: Kalpurush