christian-vigh-phpclasses / PdfToText

Extracts text from PDF files

Causes garbled characters #43

Open enya1010 opened 3 years ago

enya1010 commented 3 years ago

I'm Japanese, so my English is not good; please excuse me. Text from a PDF containing Japanese comes out garbled. Isn't it UTF-8? I would appreciate it if you could tell me how to solve this. Thank you for your cooperation.

peon501 commented 3 years ago

I have no idea how to solve it. No, it is not UTF-8. The PDF format predates UTF-8. I don't know a lot about this, but you can read https://www.prepressure.com/pdf/basics/fonts or just google it.

peon501 commented 3 years ago

Also, look into this: https://github.com/christian-vigh-phpclasses/PdfToText/issues/14
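
On the "is it UTF-8?" question, one quick way to narrow things down is to test which encodings the garbled bytes decode under. The following is only a minimal sketch in Python and does not use PdfToText. The helper name `decodable_as` and the list of candidate encodings are assumptions made for illustration, and it presumes the garbling comes from legacy Japanese-encoded bytes being read as UTF-8, which may not be the cause in this issue (PDF fonts can also use their own character maps).

```python
# Hypothetical helper: report which candidate encodings can decode a byte string.
# The candidate list is an assumption (common Japanese encodings), not something
# confirmed by the thread or by the PdfToText library.
CANDIDATES = ("utf-8", "shift_jis", "euc_jp", "iso2022_jp")


def decodable_as(raw, candidates=CANDIDATES):
    """Return the names of the candidate encodings that decode `raw` without error."""
    ok = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        ok.append(name)
    return ok


if __name__ == "__main__":
    # Stand-in for extracted bytes: Japanese text encoded as Shift_JIS.
    sample = "日本語".encode("shift_jis")
    print(decodable_as(sample))
```

If `utf-8` is missing from the result while a legacy encoding is present, converting from that encoding to UTF-8 may fix the output. If nothing decodes cleanly, the problem more likely lies in how the PDF's fonts map character codes, as the links above discuss.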