ahausladen / PdfiumLib

PDF VCL Control using PDFium
Mozilla Public License 2.0

PdfiumLib cannot retrieve text from non-ANSI PDF files #34

Closed: Wakaran-Dev closed this issue 1 year ago

Wakaran-Dev commented 1 year ago

PdfiumLib returns gibberish for non-ANSI PDF files when using text methods such as GetTextInRect and CopyToClipboard.

To Reproduce:

  1. Add a button to the PdfiumLib example application with the following code:

    Memo1.Text := FCtrl.GetTextInRect(Rect(0, 0, 1000, 1000))

  2. Run the application and open the attached file:

[PDF_Test.zip](https://github.com/ahausladen/PdfiumLib/files/12773903/PDF_Test.zip)

  3. Navigate to the 5th page ("Lesson 1")

  4. Click the button

RESULT IS:

 www.nhk.or.jp/lesson/english

*UDPPDU7LSV LESSON 1 எ ॎञख म॔থॼदघ WATASHI WA ANNA DESU ⋇  1RXQ$:$1RXQ%'(68  $LV% etc.

https://github.com/ahausladen/PdfiumLib/assets/146505455/3dc20fe1-64b7-4e28-8eb7-fe4b2f410867


Wakaran-Dev commented 1 year ago

Also, the environment is: Delphi 11 Update 3, 32-bit app, PDFiumLib as of 2023-09-28, chromium/5744.

ahausladen commented 1 year ago

The same problem happens if you copy and paste the text from Google Chrome, Firefox or Edge, so this is not specific to PdfiumLib. It must come from PDFium itself. However, copying and pasting from Acrobat Reader also produces gibberish instead of the Japanese characters.
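
A quick check on the reported output points the same way: the garbled Latin runs look like real text with every character code lowered by 29. The sketch below is a hypothetical diagnostic, not part of PdfiumLib. The helper name `ShiftText` and the offset of 29 are assumptions derived only from the sample output in this issue, and another file may need a different offset or none at all. One possible (unconfirmed) explanation is that the embedded font has no usable ToUnicode mapping, so every extractor returns glyph indices instead of characters.

```pascal
program DecodeShift;
{$APPTYPE CONSOLE}

// Hypothetical helper: adds Offset to the code of every non-space character.
// The offset is an assumption taken from this issue's sample output only.
function ShiftText(const S: string; Offset: Integer): string;
var
  I: Integer;
begin
  Result := S;
  for I := 1 to Length(Result) do
    if Result[I] <> ' ' then
      Result[I] := Char(Ord(Result[I]) + Offset);
end;

begin
  // Garbled fragments copied from the RESULT section of this issue
  Writeln(ShiftText('*UDPPDU7LSV', 29));        // prints GrammarTips
  Writeln(ShiftText('1RXQ$:$1RXQ%''(68', 29));  // prints NounAWANounBDESU
  Writeln(ShiftText('$LV%', 29));               // prints AisB
end.
```

The decoded fragments read as plausible English and romaji for a lesson page, which suggests the text is still recoverable for the Latin runs in this particular file. The Japanese runs are not covered by this shift and would need a proper glyph-to-Unicode table.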