Umlauts in filename problem and PyPDF2 hiccups

cycomanic / Menextract2pdf

Extract Mendeley annotations to PDF files

GNU General Public License v3.0

35 stars, 15 forks

After I decrypted my database I used menextract2pdf to get my annotations into the pdfs. I encountered a couple of errors:

Could not find pdffile /Users/armin/Desktop/ProjekteOnHold/ceat/mendeley_archive/Mach - 1886 - BeitrÃ¤ge zur Analyse der Empfindungen.pdf

This is an umlaut encoding issue: the UTF-8 bytes of the "ä" in the filename are being interpreted as Latin-1, which produces "Ã¤". Adding .decode("utf8") on line 28 solved this problem for me.
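For anyone who wants to see the effect in isolation, here is a minimal Python 3 sketch of what I believe is happening. The variable names are made up for illustration and this is not the code from menextract2pdf itself (where .decode("utf8") is called on a Python 2 byte string); it only shows how the same bytes give different text depending on the codec.

```python
# Illustrative only: the same UTF-8 bytes decoded with two different codecs.
name = "Beiträge"
raw = name.encode("utf-8")        # bytes as they would be stored: b'Beitr\xc3\xa4ge'

garbled = raw.decode("latin-1")   # wrong codec: 'BeitrÃ¤ge'
fixed = raw.decode("utf-8")       # right codec: 'Beiträge'

# Text that was already garbled this way can be repaired by reversing the step.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)
print(fixed)
print(repaired)
```

The garbled form matches the path in the error message above, which is why decoding as UTF-8 makes the file lookup succeed.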

zlib.error: Error -3 while decompressing data: incorrect header check

and

ValueError: invalid literal for int() with base 10: 'dobj'

Both errors were tied to specific (apparently corrupted) PDFs. I added print(fn) to processpdf(fn, fn_out, annotations) so I could identify the culprits and remove them manually.
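An alternative to printing and removing files by hand might be to catch these two exceptions per file and keep going. The sketch below is only a suggestion under assumptions: process_all and the jobs list are hypothetical names I made up, and the process argument stands in for processpdf(fn, fn_out, annotations), whose internals I have not changed or checked.

```python
import zlib


def process_all(jobs, process):
    """Call process(fn, fn_out, annotations) for every job.

    Files that raise zlib.error or ValueError are reported and skipped
    instead of aborting the whole run. Returns the list of skipped files.
    """
    failed = []
    for fn, fn_out, annotations in jobs:
        try:
            process(fn, fn_out, annotations)
        except (zlib.error, ValueError) as exc:
            # Name the offending file so it can be inspected later.
            print("Skipping {}: {}".format(fn, exc))
            failed.append(fn)
    return failed
```

Calling process_all(jobs, processpdf) would then return the names of the PDFs that could not be processed, while all other files still get their annotations.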

Thank you for writing Menextract2pdf!

Umlauts in filename problem and PyPDF2 hiccups #18