Open SharmileeS opened 10 years ago
Do you get this error on every pdf? Can I have the pdf that causes this problem?
How do I attach a PDF here?
I don't think you can. Upload somewhere else and post a link to it.
It does not happen on every PDF, just on some. Here's a link to one of the PDFs that shows the problem: http://webapp.psc.state.md.us/Intranet/Casenum/NewIndex3_VOpenFile.cfm?filepath=C:\Casenum\9200-9299\9208\Item_171\\Ex.D-smartmeterinstallationsfires.pdf
I am getting this problem too. Has anyone figured out how to fix it?
The same issue.
Hi, did you have a chance to look into this? Do you need more pdfs to reproduce the issues or any other help with testing?
I am also experiencing this problem.
Sorry for the late reply. Commit b589da51b7bd0ea97597fc8f40cf1e68115e5b94 has fixed this, so the latest revision shouldn't have this problem.
Thanks for the hint. I'm running the latest git version now. The files no longer throw errors, but they produce one character per line for the whole file when running pdf2txt -M 500 -L 13. Is there a workaround, or is it not possible to get proper output from these files? (Your commit comment mentioned 'malformed PDFs'.)
That's because the characters are part of an embedded object, and pdf2txt skips layout analysis for those. To force layout analysis on every object, try adding the -A option.
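For anyone scripting this, here is a minimal sketch of combining -A with the margins used above. The helper names build_pdf2txt_args and run_pdf2txt are hypothetical, and the script name pdf2txt.py is an assumption about how the tool is installed on your PATH; only the -A, -M and -L flags come from this thread.

```python
import subprocess


def build_pdf2txt_args(pdf_path, char_margin=500, line_margin=13, all_texts=True):
    """Build the argument list for the pdf2txt command line (hypothetical helper)."""
    args = ["pdf2txt.py"]  # assumed script name; adjust to your installation
    if all_texts:
        # -A asks pdf2txt to run layout analysis on every object,
        # including text inside embedded objects.
        args.append("-A")
    # -M and -L are the same margin flags used earlier in this thread.
    args += ["-M", str(char_margin), "-L", str(line_margin), pdf_path]
    return args


def run_pdf2txt(pdf_path):
    """Run pdf2txt and return the extracted text (requires Python 3.7+)."""
    result = subprocess.run(
        build_pdf2txt_args(pdf_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling run_pdf2txt("some.pdf") should then be equivalent to running pdf2txt with -A -M 500 -L 13 on that file by hand.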
I get this error when I use pdfplumber. python 3.6.5/pdfminer.six==20170720/pdfplumber==0.5.10
  File "/Users/xuyangchun/.pyenv/versions/evaluation365/lib/python3.6/site-packages/pdfminer/pdfdocument.py", line 661, in _getobj_parse
    objid1 = x[-2]
IndexError: list index out of range
I get this error both from the pdf2txt command-line tool and from code:
  File "C:\Python27\lib\site-packages\pdfminer\pdfpage.py", line 123, in get_pages
    doc = PDFDocument(parser, caching=caching)
  File "C:\Python27\lib\site-packages\pdfminer\pdfdocument.py", line 309, in __init__
    xref.load(parser)
  File "C:\Python27\lib\site-packages\pdfminer\pdfdocument.py", line 194, in load
    objid1 = objs[index*2]
IndexError: list index out of range
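One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: the loop trusts a count declared in the PDF, while the parsed list objs holds fewer (object id, offset) pairs than declared, so objs[index*2] runs past the end. The sketch below is not the code from the fix commit. The function name index_object_stream and its arguments are hypothetical and only illustrate clamping the declared count to what was actually parsed.

```python
def index_object_stream(objs, declared_n):
    """Map object id -> position in the stream, tolerating a bad declared count.

    objs is assumed to be a flat list [objid0, offset0, objid1, offset1, ...],
    and declared_n is the pair count the (possibly malformed) PDF claims.
    """
    # Only len(objs) // 2 complete pairs exist, whatever the file declares.
    n = min(declared_n, len(objs) // 2)
    # objs[i * 2] is the object id of the i-th pair, as in the traceback.
    return {objs[i * 2]: i for i in range(n)}


# The file declares 3 objects but only 2 pairs were parsed:
# an unguarded objs[index * 2] would raise IndexError at index == 2.
print(index_object_stream([10, 0, 11, 25], declared_n=3))
```

With the clamp in place, a malformed count yields a shorter index instead of an IndexError. Whether silently skipping the missing entries is acceptable depends on how the rest of the parser uses the index.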