MicheleCotrufo / pdf2doi

A python library/command-line tool to extract the DOI or other identifiers of a scientific paper from a pdf file.

Program returns error on encrypted files, would prefer if it skipped them. #6

Closed Pathos315 closed 3 years ago

Pathos315 commented 3 years ago

One file in the list was encrypted and had not been decrypted, so the program raised the error below. Ideally, it would log a warning that the file is encrypted and then skip it.

`[pdf2doi]: Trying to retrieve a DOI/identifier for the file: ...
[pdf2doi]: Method #1: Looking for a valid identifier in the document infos...
Traceback (most recent call last):
  File "/usr/local/bin/pdf2doi", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/pdf2doi/main.py", line 410, in main
    results = pdf2doi(target=target,
  File "/usr/local/lib/python3.9/site-packages/pdf2doi/main.py", line 112, in pdf2doi
    result = pdf2doi( target=file, verbose=verbose, websearch=websearch, webvalidation=webvalidation,
  File "/usr/local/lib/python3.9/site-packages/pdf2doi/main.py", line 147, in pdf2doi
    result = pdf2doi_singlefile(filename)
  File "/usr/local/lib/python3.9/site-packages/pdf2doi/main.py", line 190, in pdf2doi_singlefile
    result = finders.find_identifier(filename,method="document_infos",keysToCheckFirst=['/doi','/identfier'])
  File "/usr/local/lib/python3.9/site-packages/pdf2doi/finders.py", line 487, in find_identifier
    identifier, desc, info = finder_methodsmethod
  File "/usr/local/lib/python3.9/site-packages/pdf2doi/finders.py", line 587, in find_identifier_in_pdf_info
    pdfinfo = get_pdf_info(path)
  File "/usr/local/lib/python3.9/site-packages/pdf2doi/finders.py", line 275, in get_pdf_info
    info = pdf.getDocumentInfo()
  File "/usr/local/lib/python3.9/site-packages/PyPDF2/pdf.py", line 1101, in getDocumentInfo
    obj = self.trailer['/Info']
  File "/usr/local/lib/python3.9/site-packages/PyPDF2/generic.py", line 516, in __getitem__
    return dict.__getitem__(self, key).getObject()
  File "/usr/local/lib/python3.9/site-packages/PyPDF2/generic.py", line 178, in getObject
    return self.pdf.getObject(self).getObject()
  File "/usr/local/lib/python3.9/site-packages/PyPDF2/pdf.py", line 1617, in getObject
    raise utils.PdfReadError("file has not been decrypted")
PyPDF2.utils.PdfReadError: file has not been decrypted`
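To illustrate the behavior being requested (log a warning and move on to the next file), here is a minimal sketch. It is not pdf2doi's actual code: `process_files`, `fake_extract`, and `EncryptedFileError` are hypothetical names, and `EncryptedFileError` only stands in for the `PyPDF2.utils.PdfReadError` seen in the traceback above.

```python
import logging

logger = logging.getLogger("pdf2doi_sketch")  # hypothetical logger name


class EncryptedFileError(Exception):
    """Stand-in for the reader's error (PdfReadError in the traceback)."""


def process_files(paths, extract, skip_on=(EncryptedFileError,)):
    """Call extract(path) for each path; warn about and skip files that fail."""
    results = {}
    skipped = []
    for path in paths:
        try:
            results[path] = extract(path)
        except skip_on as exc:
            # Log a warning instead of letting one bad file abort the batch.
            logger.warning("Skipping %s: %s", path, exc)
            skipped.append(path)
    return results, skipped


def fake_extract(path):
    # Hypothetical extractor for the demo: pretends one file is encrypted.
    if path.endswith("locked.pdf"):
        raise EncryptedFileError("file has not been decrypted")
    return "10.1000/example"  # placeholder identifier, not a real DOI


results, skipped = process_files(["a.pdf", "locked.pdf", "b.pdf"], fake_extract)
```

In a real setup, `skip_on` would presumably be set to whatever exception the PDF reader raises for encrypted files, so that other errors still surface.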

MicheleCotrufo commented 3 years ago

Thanks for pointing this out, it will be fixed in the new version.

MicheleCotrufo commented 3 years ago

Can you try installing the new release-candidate version (pip install pdf2doi==0.7rc2) and check that the bug was successfully fixed?

Pathos315 commented 3 years ago

Apologies for the delay, will check now.

Pathos315 commented 3 years ago

It works! :)