christian-vigh-phpclasses / PdfToText

Extracts text from PDF files

Issue in Parsing PDF 1.4 with Barcode #5

Open nkbaba opened 8 years ago

nkbaba commented 8 years ago

I tried parsing 1.4 PDF with the barcode but output had weird characters.

²cš>K‚waH{üT:ÏóåÉqœ¹Ì׏öß ’g.óõ£ý7ˆä™Ë|ýhÿ
,Ñ°U5Š”dEJ»
Ê[%"¶”X²"@8ÚÑíD×5Uµ¼‡œîJ8>kÂQ£¸õ€èå"Ùˆ`rä†êKÉ:†ïç¢$üÍ¥ÔD‰„E¢?Äý£Hw­=†/    J1 nÕÂáDwFYO¬k 
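Output like the sample above usually means the extractor emitted raw bytes rather than decoded text. For anyone who wants to flag this symptom automatically, here is a minimal Python sketch. It is not part of PdfToText; the function names and the 0.3 threshold are assumptions chosen for illustration, and the heuristic will misfire on text that is legitimately rich in non-ASCII characters.

```python
def non_ascii_ratio(text: str) -> float:
    """Return the fraction of characters that are neither printable ASCII
    nor common whitespace (tab, newline, carriage return)."""
    if not text:
        return 0.0
    suspicious = 0
    for ch in text:
        if ch in "\t\n\r":
            continue  # ordinary whitespace is never counted as suspicious
        if not (32 <= ord(ch) <= 126):
            suspicious += 1
    return suspicious / len(text)


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Heuristic only: the threshold is an assumed value, not a standard."""
    return non_ascii_ratio(text) > threshold
```

A check like this is only useful for documents expected to contain mostly plain ASCII text, such as English-language forms.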
christian-vigh-phpclasses commented 8 years ago

Wow! I never imagined parsing barcode characters! Could you please send me your sample PDF file at the following address:

        christian.vigh@wuthering-bytes.com

and also tell me whether you are using any tools that are able to parse such information?

Thank you.


From: Apurva [mailto:notifications@github.com] Sent: Wednesday, 6 July 2016 22:40 To: christian-vigh-phpclasses/PdfToText Subject: [christian-vigh-phpclasses/PdfToText] Issue in Parsing PDF 1.4 with Barcode (#5)


nkbaba commented 8 years ago

Hi Christian,

I won't be able to send you the sample file: I am an intern and am not allowed to share confidential data. I will ask for permission and let you know.

I also want to add that I got around this issue with https://github.com/smalot/pdfparser

I had to convert my PDF from version 1.7 down to 1.4 using https://packagist.org/packages/xthiago/pdf-version-converter

Finally, I gave up and created my own pdftotext utility using shell.
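For anyone retracing the version-conversion step above, it can help to confirm which version a file actually declares before and after conversion. The following is a minimal Python sketch, not part of PdfToText or of the tools linked above; the function name `declared_pdf_version` is made up for illustration. It relies only on the fact that a PDF file begins with a `%PDF-x.y` header, and it assumes that header appears within the first 1024 bytes.

```python
import re
from typing import Optional

# Matches the version marker of a PDF header, e.g. b"%PDF-1.4".
_HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def declared_pdf_version(data: bytes) -> Optional[str]:
    """Return the version declared in the PDF header (e.g. "1.4"),
    or None if no header is found in the first 1024 bytes."""
    match = _HEADER_RE.search(data[:1024])
    if match is None:
        return None
    major = match.group(1).decode("ascii")
    minor = match.group(2).decode("ascii")
    return f"{major}.{minor}"


def declared_pdf_version_of_file(path: str) -> Optional[str]:
    """Convenience wrapper: read only the start of the file at `path`."""
    with open(path, "rb") as handle:
        return declared_pdf_version(handle.read(1024))
```

Note that from PDF 1.4 onward the document catalog may carry a `/Version` entry that overrides the header, so this check is only a first approximation.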

christian-vigh-phpclasses commented 8 years ago

Hi Apurva,

First, thanks for your feedback.

I perfectly understand that you cannot share confidential data. Perhaps you could generate a similar PDF document with anonymous (dummy) data inside?

Also, I assume you tried the latest version (1.2.47)?

Is your shell utility working?

With kind regards,

Christian.


From: Apurva [mailto:notifications@github.com] Sent: Tuesday, 27 September 2016 20:11 To: christian-vigh-phpclasses/PdfToText Cc: christian-vigh-phpclasses; Comment Subject: Re: [christian-vigh-phpclasses/PdfToText] Issue in Parsing PDF 1.4 with Barcode (#5)
