camelot-dev / excalibur

A web interface to extract tabular data from PDFs
https://excalibur-py.readthedocs.io
MIT License

ERROR:root:file has not been decrypted #22

Open dunnap opened 5 years ago

dunnap commented 5 years ago

While trying to upload a PDF file, the following error is thrown.

ERROR:root:file has not been decrypted
Traceback (most recent call last):
  File "c:\py3\lib\site-packages\excalibur\tasks.py", line 57, in split
    save_page(file.filepath, file.page_number)
  File "c:\py3\lib\site-packages\excalibur\utils\task.py", line 10, in save_page
    page = infile.getPage(page_number - 1)
  File "c:\py3\lib\site-packages\PyPDF2\pdf.py", line 1176, in getPage
    self._flatten()
  File "c:\py3\lib\site-packages\PyPDF2\pdf.py", line 1505, in _flatten
    catalog = self.trailer["/Root"].getObject()
  File "c:\py3\lib\site-packages\PyPDF2\generic.py", line 516, in __getitem__
    return dict.__getitem__(self, key).getObject()
  File "c:\py3\lib\site-packages\PyPDF2\generic.py", line 178, in getObject
    return self.pdf.getObject(self).getObject()
  File "c:\py3\lib\site-packages\PyPDF2\pdf.py", line 1617, in getObject
    raise utils.PdfReadError("file has not been decrypted")
PyPDF2.utils.PdfReadError: file has not been decrypted

vinayak-mehta commented 5 years ago

You'll need to decrypt your file using qpdf. An option to specify a password was added to the underlying Python library after this comment (https://github.com/socialcopsdev/camelot/issues/162#issuecomment-432040067), so I think it probably makes sense to add the same option when uploading a PDF to Excalibur.
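For anyone scripting the qpdf step, here is a minimal sketch. It assumes qpdf is installed and on PATH; the helper names `build_qpdf_decrypt_command` and `decrypt_pdf` are hypothetical and not part of Excalibur or Camelot. It only wraps qpdf's `--decrypt` and `--password=` options.

```python
import subprocess
from typing import List, Optional


def build_qpdf_decrypt_command(
    src: str, dst: str, password: Optional[str] = None
) -> List[str]:
    """Build a qpdf command line that writes a decrypted copy of src to dst."""
    cmd = ["qpdf", "--decrypt"]
    if password:
        # Only needed when the file has a non-empty password.
        cmd.append(f"--password={password}")
    cmd.extend([src, dst])
    return cmd


def decrypt_pdf(src: str, dst: str, password: Optional[str] = None) -> None:
    """Run qpdf; raises CalledProcessError on any non-zero exit status."""
    subprocess.run(build_qpdf_decrypt_command(src, dst, password), check=True)
```

Calling `decrypt_pdf("input.pdf", "decrypted.pdf")` and then uploading `decrypted.pdf` should avoid the PyPDF2 error, assuming the file opens without a password.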

dunnap commented 5 years ago

But the PDF I was trying to parse doesn't have any password protection. File attached for reference: Unclaimed Unpaid Dividend-2013-2014-Interim.pdf
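One possible explanation, not confirmed for this file: a PDF can be encrypted with an empty user password (for example, when only an owner password restricts printing or copying), so viewers open it without prompting while PyPDF2 still treats it as encrypted. The sketch below is a rough, stdlib-only heuristic for spotting such files; `looks_encrypted` and `file_looks_encrypted` are hypothetical names, and a plain byte search can give false positives if the string happens to appear elsewhere in the file.

```python
def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: encrypted PDFs carry an /Encrypt entry in their trailer."""
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path: str) -> bool:
    """Read the whole file and apply the same heuristic."""
    with open(path, "rb") as handle:
        return looks_encrypted(handle.read())
```

If `file_looks_encrypted` returns `True` for a file that opens without a password prompt, decrypting it with qpdf and no password is worth trying.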