euske / pdfminer

Python PDF Parser (Not actively maintained). Check out pdfminer.six.
https://github.com/pdfminer/pdfminer.six
MIT License

Wrong Conversion pdf2text for PDF generated by Google Docs #42

Closed by hugo53 10 years ago

hugo53 commented 10 years ago

Please check the following files to see the bug.
PDF: https://www.dropbox.com/s/arpkkzvi9e7evfc/Untitleddocument2.pdf
Text: https://www.dropbox.com/s/g4jq9t7taahdgce/googledocs2.txt
I used the command "pdf2txt.py Untitleddocument2.pdf > googledocs2.txt" to convert a PDF document (generated by Google Docs), and the resulting text file shows bad content.

euske commented 10 years ago

Sorry for the late response. Could you upload the PDF file again? It seems to have been deleted from Dropbox. Thanks.

hugo53 commented 10 years ago

@euske Sorry, maybe I renamed its folder. Please check these links:
https://www.dropbox.com/s/yvl5kcvkw4ypoi7/Untitleddocument2.pdf
https://www.dropbox.com/s/mum9arsj1jq509i/googledocs2.txt

euske commented 10 years ago

Should be fixed in 340387bfc692f134579d844b164febc7028be501. Thanks for helping!

hugo53 commented 10 years ago

@euske Great!