cycomanic / Menextract2pdf

Extract Mendeley annotations to PDF files
GNU General Public License v3.0

Umlauts in filename problem and PyPDF2 hiccups #18

Open arminbw opened 5 years ago

arminbw commented 5 years ago

After I decrypted my database I used menextract2pdf to get my annotations into the pdfs. I encountered a couple of errors:

Could not find pdffile /Users/armin/Desktop/ProjekteOnHold/ceat/mendeley_archive/Mach - 1886 - Beiträge zur Analyse der Empfindungen.pdf

This is an Umlaut encoding issue. Adding .decode("utf8") on line 28 solved this problem for me.
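A minimal sketch of the same idea, assuming the path arrives as UTF-8 encoded bytes (as it can when read from the database under Python 2). The helper name `to_text` is hypothetical and not part of Menextract2pdf; it only shows the decode step in a form that tolerates input that is already text.

```python
def to_text(path, encoding="utf-8"):
    """Return path as a text string, decoding it first if it is bytes."""
    if isinstance(path, bytes):
        return path.decode(encoding)
    return path


# "\u00e4" is the umlaut "ä" from the filename in the error message above.
expected = "Mach - 1886 - Beitr\u00e4ge zur Analyse der Empfindungen.pdf"
raw = expected.encode("utf-8")  # what a byte-string path would look like

name = to_text(raw)
```

Passing an already decoded string through `to_text` returns it unchanged, so the helper can be applied regardless of where the path came from.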

zlib.error: Error -3 while decompressing data: incorrect header check

ValueError: invalid literal for int() with base 10: 'dobj'

These errors came from specific, apparently corrupted PDFs. I added print(fn) to processpdf(fn, fn_out, annotations) so I could identify the culprits and remove them manually.
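Instead of printing every filename and stopping at the first crash, the problem files could be collected in one pass. The following is only a sketch: `process_all` and `fake_process` are hypothetical names, and the real processpdf(fn, fn_out, annotations) would be passed in place of the stand-in.

```python
def process_all(jobs, process):
    """Call process(fn, fn_out, annotations) for each job and collect failures.

    jobs is an iterable of (fn, fn_out, annotations) tuples.
    Returns a list of (fn, error) pairs for files that raised an exception.
    """
    failed = []
    for fn, fn_out, annotations in jobs:
        try:
            process(fn, fn_out, annotations)
        except Exception as err:  # e.g. zlib.error or ValueError from a damaged PDF
            failed.append((fn, err))
    return failed


def fake_process(fn, fn_out, annotations):
    # Stand-in for the real function; pretends that one file is corrupted.
    if fn == "broken.pdf":
        raise ValueError("invalid literal for int() with base 10: 'dobj'")


jobs = [
    ("ok.pdf", "out/ok.pdf", []),
    ("broken.pdf", "out/broken.pdf", []),
]
failed = process_all(jobs, fake_process)
for fn, err in failed:
    # ascii() keeps the output printable even for non-ASCII filenames.
    print(ascii(fn), type(err).__name__, err)
```

Catching `Exception` broadly is a deliberate choice here, because the goal is a list of culprits rather than precise error handling.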

Thank you for writing Menextract2pdf!

folofjc commented 4 years ago

I had to use print(fn.encode("utf-8")), because even the print command was failing on these filenames.
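A variant of this workaround, sketched under the assumption that the goal is only to get a readable filename onto the console: escape every non-ASCII character so that printing cannot fail regardless of the terminal encoding. The helper name `printable` is hypothetical.

```python
def printable(fn):
    """Return an ASCII-only rendering of fn that is always safe to print."""
    if isinstance(fn, bytes):
        # Undecodable bytes become the replacement character instead of raising.
        fn = fn.decode("utf-8", errors="replace")
    # Non-ASCII characters are written as backslash escapes such as \xe4.
    return fn.encode("ascii", errors="backslashreplace").decode("ascii")


print(printable("Beitr\u00e4ge.pdf"))  # prints Beitr\xe4ge.pdf
```

The escaped form is less pretty than the real name, but it still identifies the file unambiguously.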

Then I had to change the article's title in Mendeley, close Mendeley, and delete the file from the Downloaded folder. After I started Mendeley again and synced, it downloaded the file again and renamed it without the offending characters.