BobLd / tabula-sharp

Extract tables from PDF files (port of tabula-java)
MIT License

garbled text #24

Open haya4 opened 2 years ago

haya4 commented 2 years ago

When I extract tables from Japanese-language forms in Stream mode, the characters can come out garbled.

This may be caused by PdfPig: when I extracted only the text with PdfPig, it was garbled as well.
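When checking whether a plain PdfPig text dump is garbled the same way as the Stream-mode output, it can help to scan the extracted string for characters that often appear when a font's glyph-to-Unicode mapping fails. These include Private Use Area code points, U+FFFD, and stray control codes.

This is a minimal, hypothetical heuristic and is not part of PdfPig or tabula-sharp. The function name `garbling_report` and the choice of counted categories are assumptions. It also cannot detect text that decodes to valid but wrong characters.

```python
import unicodedata


def garbling_report(text: str) -> dict:
    """Hypothetical heuristic: count characters that often signal a failed
    glyph-to-Unicode mapping in extracted PDF text.

    It cannot detect garbling that produces valid but wrong characters.
    """
    replacement = 0  # U+FFFD REPLACEMENT CHARACTER (category "So")
    private_use = 0  # Unicode category "Co", e.g. U+E000..U+F8FF
    control = 0      # Unicode category "Cc", excluding tab/newline/CR
    for ch in text:
        if ch == "\ufffd":
            replacement += 1
        category = unicodedata.category(ch)
        if category == "Co":
            private_use += 1
        elif category == "Cc" and ch not in "\t\n\r":
            control += 1

    # Ratio of suspicious characters to all non-whitespace characters.
    visible = sum(1 for ch in text if not ch.isspace())
    suspicious = replacement + private_use + control
    ratio = suspicious / visible if visible else 0.0
    return {
        "replacement": replacement,
        "private_use": private_use,
        "control": control,
        "suspicious_ratio": ratio,
    }


if __name__ == "__main__":
    print(garbling_report("請求書 合計 1,000円"))
    print(garbling_report("\ue001\ue002\ufffd 1,000"))
```

The idea is to paste the text returned by PdfPig and by another extractor into the function and compare the results. A non-zero ratio on the PdfPig output alone suggests the problem lies upstream of tabula-sharp.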

Are you planning to switch from PdfPig to another library (such as DocNET)? With DocNET, the text was not garbled.

BobLd commented 2 years ago

Hi @haya4, it must be related to PdfPig indeed.

I have no plans to switch to another library, but PdfPig is actively developed, so would you mind opening an issue in the PdfPig repo? If you could also share an example PDF there, it would be very useful.

haya4 commented 2 years ago

Thanks for your comment, @BobLd.

You're right, it's better to open an issue with PdfPig. Unfortunately, I can't share the PDF that causes the garbled text because it is highly confidential. Sorry about that.