christian-vigh-phpclasses / PdfToText

Extracts text from PDF files

Problem with Euro (€) char #34

Closed by kobi-wan 4 years ago

kobi-wan commented 4 years ago

Hello,

I'm trying to convert a table from an exported Excel sheet (O365). One column contains a value with a "€" (Euro) char, but this column is not extracted at all.

Input is:

707677 Apfel-Butterstreusel-Blechkuchen 2.900 g TK STK 13,78 €
707792 Apfel-Butterstreusel-Blechkuchen 2.900 g TK STK 13,78 €

Output is:

707677 Apfel- Butterstreusel- Blechkuchen 2.900 g TK STK 707792 Apfel- Butterstreusel- Blechkuchen 2.900 g TK STK ...

I also recreated a similar table with the Euro char replaced by a "$" (dollar), and then it works. It also works with German umlauts.
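To narrow down which parts of a row go missing, a quick comparison of the expected row against the extracted text can help. The following is only a rough sketch in Python, not part of PdfToText: the helper name `missing_tokens` is hypothetical, and it assumes that a space inserted after a hyphen in the output should be ignored when comparing.

```python
def missing_tokens(expected: str, extracted: str) -> list[str]:
    """Return the whitespace-separated tokens of `expected` that never appear in `extracted`."""
    # Assumption: rejoin words the extractor split after a hyphen
    # ("Apfel- Butterstreusel" -> "Apfel-Butterstreusel").
    normalized = extracted.replace("- ", "-")
    seen = set(normalized.split())
    missing = []
    for token in expected.split():
        # Report each missing token once, in order of first appearance.
        if token not in seen and token not in missing:
            missing.append(token)
    return missing


# Row as it appears in the PDF and as it came out of the extraction (taken from this report).
expected = "707677 Apfel-Butterstreusel-Blechkuchen 2.900 g TK STK 13,78 €"
extracted = "707677 Apfel- Butterstreusel- Blechkuchen 2.900 g TK STK"

print(missing_tokens(expected, extracted))  # ['13,78', '€']
```

For the row above, this reports both the price and the Euro sign as missing, which matches the observation that the whole column is dropped rather than only the "€" char.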

Greetings,
kobi

Attachment: test2.pdf

kobi-wan commented 4 years ago

I figured it out: it is the same issue with the experimental CID font implementation described here: https://github.com/christian-vigh-phpclasses/PdfToText/issues/20#issuecomment-338310062