eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
https://docspell.org
GNU Affero General Public License v3.0

opening password protected PDFs does not work #2636

Closed mglatz closed 2 months ago

mglatz commented 4 months ago

First of all, thank you for this great piece of software!

Running docspell latest in Docker, I can't process (unlock) any password-protected PDFs. I have at least two PDFs from different organizations, each with a different password. I can open and unlock both of them in a web browser's PDF viewer, but docspell fails to unlock them. Both passwords are stored in the config file under decrypt-pdf, and decrypt-pdf is enabled. The log shows:

Sun, May 12th, 2024, 12:09: Trying to read the PDF using 2 passwords
Sun, May 12th, 2024, 12:09: Try opening PDF with password: '8***
Sun, May 12th, 2024, 12:09: Try opening PDF with password: '1***
Sun, May 12th, 2024, 12:09: None of the passwords helped to read the given PDF!
...
Sun, May 12th, 2024, 12:09: ocrmypdf stderr: EncryptedPdfError: Input PDF is encrypted. The encryption must be removed to perform OCR. For information about this PDF's security use qpdf --show-encryption infilename You can remove the encryption using qpdf --decrypt [--password=[password]] infilename
Sun, May 12th, 2024, 12:09: PDF conversion failed: Command result=8. No output file found.. Go without PDF file

I am sure that one of the passwords is valid for the file being processed. I also tried adding the passwords under the collective settings in the UI, but that does not help either. A note: the files were uploaded before the passwords were entered. I then added the passwords, restarted the docspell stack, and tried to reprocess the files. The log above is the output from the reprocessing.

eikek commented 4 months ago

It is trying 2 passwords. Does it look like the one you want it to use is among these? Can you share such a PDF for me to play with? It is possible that the PDF library is not able to decrypt it for other reasons.
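
One way to check, outside of docspell, which of the configured passwords actually opens a file is the qpdf call quoted in the ocrmypdf error message above. Below is a minimal Python sketch, assuming qpdf is installed and on the PATH. The function names and file names are made up for illustration, and this is not how docspell itself decrypts PDFs.

```python
import subprocess
from typing import Callable, Iterable, List, Optional


def qpdf_decrypt_command(password: str, infile: str, outfile: str) -> List[str]:
    """Build the qpdf call quoted in the ocrmypdf error message."""
    return ["qpdf", "--decrypt", f"--password={password}", infile, outfile]


def run_qpdf(cmd: List[str]) -> int:
    """Run qpdf and return its exit code (assumes qpdf is on the PATH)."""
    return subprocess.run(cmd, capture_output=True).returncode


def first_working_password(
    passwords: Iterable[str],
    infile: str,
    outfile: str,
    runner: Callable[[List[str]], int] = run_qpdf,
) -> Optional[str]:
    """Return the first password that qpdf accepts, or None if none works.

    qpdf exits with 0 on success and 3 on success with warnings;
    any other exit code is treated here as a failed attempt.
    """
    for password in passwords:
        if runner(qpdf_decrypt_command(password, infile, outfile)) in (0, 3):
            return password
    return None
```

Calling `first_working_password(["first-pw", "second-pw"], "protected.pdf", "decrypted.pdf")` with the same values as in the config would show whether a password is rejected by qpdf as well, or only when it is read from the docspell configuration. The `runner` parameter exists only so that the loop can be exercised without qpdf installed.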

mglatz commented 4 months ago

Yep, the passwords look OK. I've been able to unlock all the files with https://tools.pdf24.org/en/unlock-pdf. I'll share the file with you, but please give me an email address, I don't want to post it here :) Thanks.

eikek commented 4 months ago

You can send me a file to info at docspell.org or use the (private) matrix chat @eikek:matrix.org

eikek commented 4 months ago

Hi @mglatz thanks for sending the file. I couldn't reproduce the issue on my dev setup, though :/. What version of docspell are you using? Where did you put the passwords - in the config file or in the collective setting?

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. This only applies to 'question' issues. Always feel free to reopen or create new issues. Thank you!