jone closed this issue 10 years ago
@jone `Error: The supplied password does not match either the owner or user password in the document` - Seems pretty obvious to me: those are encrypted PDFs, and no other implementation would be able to extract text from them either.
However, since this seems to be a common occurrence, `ftw.tika` could watch for that specific Java exception and, instead of dumping the entire traceback into the logs, just log a warning/info along the lines of "Couldn't extract text from encrypted PDF, skipping...".
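A rough sketch of what that could look like, not the actual `ftw.tika` code. `ConversionError`, `safe_extract_text` and the `convert` callable are hypothetical names made up for illustration. It also assumes the Java error text ends up in the Python exception message, which would need checking against how the converter really reports failures.

```python
import logging

logger = logging.getLogger("tika_sketch")

# Marker text taken from the error message quoted in this thread.
ENCRYPTED_MARKER = "The supplied password does not match"


class ConversionError(Exception):
    """Hypothetical stand-in for whatever error the Tika call raises."""


def safe_extract_text(convert, path):
    """Return the extracted text, or '' if the PDF turns out to be encrypted.

    `convert` is any callable that takes a path and returns text
    (an assumption for this sketch, not the real converter interface).
    """
    try:
        return convert(path)
    except ConversionError as exc:
        if ENCRYPTED_MARKER in str(exc):
            # Expected failure: one short log line instead of a traceback.
            logger.warning(
                "Couldn't extract text from encrypted PDF %s, skipping...",
                path,
            )
            return ""
        # Anything else is unexpected and should still surface.
        raise
```

Only the known "encrypted" message is swallowed; every other conversion failure is re-raised so real problems stay visible in the logs.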
:+1:
:+1:
While reindexing a bunch of files I had this exception multiple times:
It seems that tika is not that good with PDFs. There is also a Plone standard transform, `pdf_to_text`. Is there a reason why we override the standard transform with the tika transform?