R0Wi-DEV / workflow_ocr

This is a Nextcloud Workflow App which enables you to process files via OCR on serverside.
GNU Affero General Public License v3.0

Can not ocr pdf files as input #167

Closed FadeFx closed 1 year ago

FadeFx commented 1 year ago

Hi, I was playing around for a while and could not find out what my issue is: workflow_ocr did not seem to work. My only trigger was a collaborative tag (ocrme). However, when I tried to use "file created" and mimetype PDF, I found out that PDF was not allowed, so I created a JPG from one PDF and added my tag. At that moment my server began to OCR the file and saved it as a PDF. My question is: why is it not possible to handle PDF files? Am I holding it wrong? ;-)

R0Wi commented 1 year ago

However, when I tried to use a file created and mimetype pdf i found out that pdf was not allowed, so

What do you mean by "PDF was not allowed"? Aren't you able to set up a workflow as described here: https://github.com/R0Wi/workflow_ocr#trigger-ocr-if-file-was-created-or-updated ?

FadeFx commented 1 year ago

Yes, I can set it up, but if I tag a PDF file it is not processed, whereas JPG files are. If I try to add a filter for the specific mimetype PDF, I cannot save the workflow: the save button turns orange and says the configuration is invalid, with an additional error message that the regular expression is invalid.

[Screenshot: Screenshot_20221204-202117_1]

Sorry, the screenshot is in German and from my phone...

R0Wi commented 1 year ago

It could be that you're hitting https://github.com/nextcloud/server/issues/23666#issuecomment-785647870. Please try "is" instead of "matches" before the mimetype PDF setting.
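To illustrate why switching the operator can matter, here is a minimal Python sketch. It is not Nextcloud code. It assumes that "is" compares the configured value literally, while "matches" expects a delimited regular expression such as `/^application\/pdf$/` and rejects a bare string like `application/pdf`. The function names `parse_delimited` and `mimetype_check` are hypothetical and exist only for this example.

```python
import re


def parse_delimited(pattern: str) -> re.Pattern:
    """Parse a '/.../'-style pattern (hypothetical helper).

    Raises ValueError when the surrounding '/' delimiters are missing,
    mimicking an "invalid regular expression" validation error.
    Anything after the closing delimiter (e.g. flags) is ignored here.
    """
    closing = pattern.rfind("/")
    if len(pattern) < 2 or pattern[0] != "/" or closing == 0:
        raise ValueError("the regular expression is invalid")
    return re.compile(pattern[1:closing])


def mimetype_check(operator: str, value: str, mimetype: str) -> bool:
    """Assumed semantics of the two operators discussed in this thread."""
    if operator == "is":
        # Literal comparison: a plain mimetype string is a valid value.
        return value == mimetype
    if operator == "matches":
        # Regex comparison: the value must be a delimited pattern.
        return parse_delimited(value).search(mimetype) is not None
    raise ValueError(f"unknown operator: {operator}")


if __name__ == "__main__":
    print(mimetype_check("is", "application/pdf", "application/pdf"))
    print(mimetype_check("matches", r"/^application\/pdf$/", "application/pdf"))
    try:
        mimetype_check("matches", "application/pdf", "application/pdf")
    except ValueError as err:
        print("rejected:", err)
```

Under these assumptions, a plain `application/pdf` value is accepted with "is" but rejected with "matches", which is consistent with the "regular expression is invalid" message described above.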

FadeFx commented 1 year ago

That works, and now PDFs get OCR'd, thank you... Strange that it did not do this without any mimetype filter...

R0Wi commented 1 year ago

Glad to help. Hope this will be fixed in NC workflowengine soon 👍