[Bug] Unicode text shaping capability breaks when paragraph starts with English text

UserIsAmitHasan commented 2 years ago

Describe the bug

Unicode text shaping breaks when a paragraph starts with English text, but not when the English text appears in the middle of the paragraph.

To Reproduce

.Text("English text at the start of a paragraph breaks the Unicode text shaping capability. বিপদ আরও বাড়াচ্ছে টানা বৃষ্টি। গতকাল শনিবারও অবিরাম বৃষ্টি আর পাহাড়ি ঢল অব্যাহত ছিল। প্রায় সব ধরনের যোগাযোগ বিচ্ছিন্ন হয়ে পড়া সিলেট ও সুনামগঞ্জের বন্যা পরিস্থিতি সামাল দিতে হিমশিম খাচ্ছে স্থানীয় প্রশাসন।");

.Text("বিপদ আরও বাড়াচ্ছে টানা বৃষ্টি। English text in the middle does not break the Unicode text shaping capability. গতকাল শনিবারও অবিরাম বৃষ্টি আর পাহাড়ি ঢল অব্যাহত ছিল। প্রায় সব ধরনের যোগাযোগ বিচ্ছিন্ন হয়ে পড়া সিলেট ও সুনামগঞ্জের বন্যা পরিস্থিতি সামাল দিতে হিমশিম খাচ্ছে স্থানীয় প্রশাসন।");

Expected behavior

The screenshot shows the correct behaviour.

Screenshots

Additional context

The language I am testing with is Bengali. Font used: Kalpurush.

MarcinZiabek commented 2 years ago

Thank you for investigating this case! I am not an expert in this domain. I have tried to analyse your example as much as I could, and I don't see any problems there. Could you double-check and elaborate on the current output and what you expect?

Text shaping is performed mostly by the HarfBuzz library, so I have high confidence that the result should be correct. 😁

khaledhosny commented 2 years ago

The text needs to be itemized into runs with the same direction, script, and language before it is passed to HarfBuzz. It seems that the whole text is passed to HarfBuzz as one item and the first script in the text is used (perhaps by using hb_buffer_guess_segment_properties()).
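The itemization step described above can be sketched roughly as follows. This is a minimal illustration only, not QuestPDF's or HarfBuzz's actual code: the names `Script`, `ScriptItemizer`, `Classify`, and `Itemize` are hypothetical, only Latin letters and the Bengali block (U+0980 to U+09FF) are recognised, and direction and language are ignored. A real itemizer would rely on the full Unicode Script property and the bidirectional algorithm. The sketch assumes .NET Core 3.0 or later for `EnumerateRunes`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, heavily simplified script set for this issue only.
public enum Script { Common, Latin, Bengali }

public static class ScriptItemizer
{
    // Assumption: classify by code point range instead of the real Unicode Script property.
    static Script Classify(int codePoint)
    {
        if (codePoint >= 0x0980 && codePoint <= 0x09FF) return Script.Bengali;
        if ((codePoint >= 'A' && codePoint <= 'Z') || (codePoint >= 'a' && codePoint <= 'z')) return Script.Latin;
        return Script.Common; // spaces, digits, punctuation, everything else
    }

    // Splits text into runs so that each run contains a single script.
    // Common characters (spaces, punctuation) stay in whatever run is currently open.
    public static List<(Script Kind, string Text)> Itemize(string text)
    {
        var runs = new List<(Script Kind, string Text)>();
        var current = new StringBuilder();
        var runScript = Script.Common;

        foreach (Rune rune in text.EnumerateRunes())
        {
            Script script = Classify(rune.Value);
            if (script != Script.Common && script != runScript)
            {
                // A new script starts: close the open run unless it held only common characters.
                if (runScript != Script.Common)
                {
                    runs.Add((runScript, current.ToString()));
                    current.Clear();
                }
                runScript = script;
            }
            current.Append(rune.ToString());
        }

        if (current.Length > 0) runs.Add((runScript, current.ToString()));
        return runs;
    }
}

public static class Program
{
    public static void Main()
    {
        // Each run would then be shaped separately, with its own script set on the buffer.
        foreach (var run in ScriptItemizer.Itemize("English first. বিপদ আরও"))
            Console.WriteLine($"{run.Kind}: {run.Text}");
    }
}
```

For the sample string this produces one Latin run followed by one Bengali run, so the Bengali words would be shaped with the Bengali script even though the paragraph starts with English.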

MarcinZiabek commented 2 years ago

As far as I know, HarfBuzz is capable of detecting the text direction automatically. E.g. Arabic is going to be classified as right-to-left, but numbers in between will be left-to-right. Please share more information on why you think the current output is not right 😁

khaledhosny commented 2 years ago

How are you using HarfBuzz: are you calling the C API directly or through some wrapper? HarfBuzz does not handle BiDi the way you describe.

MarcinZiabek commented 2 years ago

Before we rush into implementation details and how the API works, I would prefer to see an example showing: 1) what is currently returned by QuestPDF and 2) what is expected 😁

Could you please share more examples (code + screenshots) so I can see the actual difference? I don't see anything wrong in the screenshot from the original message. But again, I am not an expert here, so I may be missing something. It would be best to also include a screenshot from a well-known editor, e.g. Microsoft Word.

khaledhosny commented 2 years ago

You can compare the individual words after the Latin text in the screenshot: it is the same text, but the words are rendered differently.

UserIsAmitHasan commented 2 years ago

@MarcinZiabek To reproduce, please take a look at this repository: QuestPdf.Issue.260.

Output PDF

It seems that putting ASCII text first makes HarfBuzz interpret the whole text as ASCII. Putting the English and Bengali text in two different spans solves the problem, but it is not an ideal solution.
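For reference, the two-span workaround might look roughly like the sketch below. It uses QuestPDF's fluent API and needs the QuestPDF NuGet package. The page setup, the shortened sample strings, and the output file name are assumptions for illustration, the Kalpurush font is assumed to be installed or registered, and newer QuestPDF versions may also require a license setting before generating a document.

```csharp
using QuestPDF.Fluent;
using QuestPDF.Helpers;
using QuestPDF.Infrastructure;

Document.Create(container =>
{
    container.Page(page =>
    {
        page.Size(PageSizes.A4);
        page.Margin(2, Unit.Centimetre);

        page.Content().Text(text =>
        {
            // Each span is handled as a separate run, so the Bengali text
            // no longer shares a run with the leading English text.
            text.Span("English text at the start of a paragraph. ").FontFamily("Kalpurush");
            text.Span("বিপদ আরও বাড়াচ্ছে টানা বৃষ্টি।").FontFamily("Kalpurush");
        });
    });
}).GeneratePdf("output.pdf");
```

The drawback is that the caller has to split mixed-script strings by hand, which is why it is only a stopgap.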

girlpunk commented 2 years ago

Looking at the original screenshot you posted, am I right in thinking these two would be examples of incorrect formatting?

UserIsAmitHasan commented 2 years ago

@girlpunk Yes, exactly. The Bengali text in the top rectangle is mostly broken, but the bottom rectangle is perfect.

UserIsAmitHasan commented 2 years ago

This issue is fixed in the latest release. Thank you, @MarcinZiabek.
