Status: Open. Opened by umrashrf 10 years ago.
Could you upload or send me the PDF in question?
Sure, where should I send it? I would rather not upload it here, though.
Hi, thanks for the PDF. I looked into it and found that the missing text is not actually part of the page content; it is implemented as part of an Acrobat form. So the problem was not a malformed PDF. Right now, pdfminer doesn't support extracting text from forms. It shouldn't be that hard, though, so I will try to add that feature in the future.
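For anyone hitting the same thing in the meantime, here is a minimal sketch of reading form field values directly from the document catalog. It assumes a pdfminer version that provides `pdfminer.pdfparser.PDFParser`, `pdfminer.pdfdocument.PDFDocument` and `pdfminer.pdftypes.resolve1`; the helper names `collect_fields` and `read_form_fields` are made up for illustration and are not part of pdfminer.

```python
def collect_fields(fields, resolve=lambda obj: obj, prefix=""):
    """Walk an AcroForm field tree and return {full_field_name: raw_value}.

    `resolve` turns an indirect reference into the object it points to.
    With pdfminer this would be pdfminer.pdftypes.resolve1 (assumption:
    the default identity function is only useful for already-resolved dicts).
    """
    result = {}
    for ref in fields:
        field = resolve(ref)
        part = resolve(field.get("T"))  # partial field name, may be missing
        if isinstance(part, bytes):
            part = part.decode("latin-1", "replace")
        # Fully qualified names join the partial names with dots.
        full = ".".join(p for p in (prefix, part) if p)
        if "V" in field:
            result[full] = resolve(field["V"])  # value is left undecoded
        kids = resolve(field.get("Kids"))
        if kids:
            result.update(collect_fields(kids, resolve, full))
    return result


def read_form_fields(path):
    """Hypothetical wrapper: open a PDF and return its form field values."""
    # Imported lazily so collect_fields works without pdfminer installed.
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdftypes import resolve1

    with open(path, "rb") as fp:
        doc = PDFDocument(PDFParser(fp))
        acroform = resolve1(doc.catalog.get("AcroForm"))
        if not acroform:
            return {}  # the document has no interactive form
        return collect_fields(resolve1(acroform.get("Fields", [])), resolve1)
```

Values come back in whatever form the parser produces (typically byte strings), so decoding them is left to the caller. This only reads stored field values; it does not reproduce how the form is rendered on the page.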
No problem. I look forward to that feature. Until then, I am fine using poppler.
I have a PDF file from which pdftotext (from poppler) extracts all of the text, but pdf2txt (from PDFMiner) fails to extract all of it.
pdftotext does report an error, but it still extracts all of the text:
Error: PDF file is damaged - attempting to reconstruct xref table...
It looks like xpdf has some ability to reconstruct the xref table and PDFMiner doesn't.
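To illustrate what "reconstructing the xref table" can mean, here is a rough sketch of the general idea: scan the raw bytes for `N G obj` headers and record where each object starts. This is an assumption about the technique in general, not a description of xpdf's or poppler's actual code, and `rebuild_xref` is a hypothetical name.

```python
import re

# An object header looks like "12 0 obj": object number, generation, keyword.
OBJ_HEADER = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")


def rebuild_xref(data: bytes) -> dict:
    """Map (object number, generation) -> byte offset of its header."""
    offsets = {}
    for match in OBJ_HEADER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        # A later definition overwrites an earlier one, since incremental
        # updates append newer versions of an object at the end of the file.
        offsets[key] = match.start()
    return offsets
```

A real implementation has to be more careful: binary stream data can contain byte sequences that look like object headers, and the trailer dictionary still has to be located separately.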