R0Wi-DEV / workflow_ocr

This is a Nextcloud Workflow App that lets you process files via OCR on the server side.
GNU Affero General Public License v3.0

Trigger OCR if file was created or updated is not working #241

Closed lodzen closed 3 months ago

lodzen commented 7 months ago

Describe the bug

Trigger OCR if file was created or updated is not working

System

How to reproduce

Steps to reproduce the behavior: configure the workflow as described in the manual. OCR is then not triggered by Nextcloud when a file is added or modified.

Screenshots

Conversion only works when it is triggered via tags. The OCR tag is added multiple times because the automated tagging workflow is also used on top. [screenshot]

Server log

Please paste relevant content of your nextcloud.log file here. It might make sense to first decrease the log level. Also, since the OCR process runs asynchronously, run your cron.php before copying the logs here.

Nothing visible in the server log files.

R0Wi commented 7 months ago

Thanks for reporting this. Unfortunately I cannot reproduce the issue. Here is what I did:

  1. Create a fresh NC 28 Docker instance
  2. Install the Workflow OCR app from the Appstore
  3. Install ocrmypdf inside of the container
  4. Configure a personal flow like this (to ensure the file will always be processed via OCR): [screenshot]
  5. Upload the test file ocr-test.pdf via the NC UI
  6. Trigger the NC cron manually: sudo -u www-data php cron.php

Result: a new file version is created as expected, and the text inside the document is selectable.


Please follow our troubleshooting guide and repeat the process. If you decreased your log level as described, there must be some server log entries. They are essential for us to understand the problem.

Thanks for your help

lodzen commented 7 months ago

Hello,

I set up the flow exactly as in your screenshot: [screenshot]

Cron is configured to run every 5 minutes: [screenshot]

Test PDF: [screenshot]

Even after 15 minutes the file was not analyzed: [screenshot]

The difference is that on my end it's not a personal flow but a global one.

lodzen commented 7 months ago

OK, but even with a personal flow the job is not executed.

R0Wi commented 7 months ago

Your frontend configuration looks correct. Nevertheless, without additional backend logs it will be impossible to find the error. As described here, please decrease your NC log level, repeat the process (don't forget to execute the cron manually) and post your logs here.

lodzen commented 7 months ago

I have now created the logs and tried to prefilter them as well as possible: flow.log, nextcloud.log

R0Wi commented 7 months ago

There are two interesting lines in your nextcloud.log, one is logged by the workflowengine itself and the other is logged by this app (workflow_ocr):

{"reqId":"jiLiFxRBg9xJAAFAsV97","level":0,"time":"2024-01-25T09:32:59+00:00","remoteAddr":"79.249.68.60","user":"daniel","app":"workflowengine","method":"GET","url":"/core/preview?fileId=11909&x=250&y=250","message":"No flow configurations is going to run OCR-Datei","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36","version":"28.0.1.1","data":{"app":"workflowengine","level":"0"},"id":196}

{"reqId":"jiLiFxRBg9xJAAFAsV97","level":0,"time":"2024-01-25T09:32:59+00:00","remoteAddr":"79.249.68.60","user":"daniel","app":"workflow_ocr","method":"GET","url":"/core/preview?fileId=11909&x=250&y=250","message":"Not processing event because IRuleMatcher->getFlows did not return anything","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36","version":"28.0.1.1","data":{"app":"workflow_ocr"},"id":197}

I think the interesting bit here is "No flow configurations is going to run OCR-Datei", which suggests there might be a misconfiguration in your workflow. The second line says basically the same thing: "Not processing event because IRuleMatcher->getFlows did not return anything".
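To pull entries like these out of a large nextcloud.log, a short filter can help. This is a minimal sketch, assuming the log uses the one-JSON-object-per-line format shown in the two entries above and that only the app, reqId, time and message fields matter. The names filter_workflow_entries, WORKFLOW_APPS and main are hypothetical helpers, not part of Nextcloud or workflow_ocr.

```python
import json
import sys
from typing import Iterable, Iterator

# App names taken from the two log entries above; adjust as needed.
WORKFLOW_APPS = frozenset({"workflowengine", "workflow_ocr"})


def filter_workflow_entries(lines: Iterable[str], apps=WORKFLOW_APPS) -> Iterator[dict]:
    """Yield parsed nextcloud.log entries whose "app" field is in `apps`.

    Assumes one JSON object per line. Blank or non-JSON lines are skipped
    instead of aborting the whole scan.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and entry.get("app") in apps:
            yield entry


def main(path: str) -> None:
    # Print one compact line per matching entry; the reqId ties together
    # entries that belong to the same request.
    with open(path, encoding="utf-8") as fh:
        for e in filter_workflow_entries(fh):
            print(f'{e.get("time")} {e.get("reqId")} [{e.get("app")}] {e.get("message")}')


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as, for example, python filter_log.py /path/to/nextcloud.log (the script name is arbitrary). Both entries quoted above share the reqId jiLiFxRBg9xJAAFAsV97, so they show up next to each other as two sides of the same request.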


At the moment I have no idea why it behaves like this for you, but it doesn't seem to be a general problem with Nextcloud 28, since I can't reproduce it. Further investigation is needed here.

If you set up a workflow with the same conditions (file created/updated, MIME type is PDF) but use "Workflow Tagging" instead, does that one work? In other words, does it tag your PDF files correctly?

R0Wi commented 7 months ago

Some technical details:

Both log messages trace back to Nextcloud's workflowengine, which contains the core logic for workflow apps. Our workflow_ocr app calls the getFlows method, and the core logic of Nextcloud tells the app "not to run"; the workflow_ocr entry only records that decision.
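The control flow described above can be illustrated with a deliberately simplified model. This is NOT the actual PHP code of Nextcloud or workflow_ocr: RuleMatcher, Flow, get_flows and handle_file_event are hypothetical stand-ins, and the "MIME type is PDF" check is an assumed example condition. The point it shows is that the app never decides on its own whether to run; it only acts on what the core's rule matcher returns.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical, simplified model; NOT the real PHP classes of Nextcloud or workflow_ocr.
Check = Callable[[dict], bool]


@dataclass
class Flow:
    """A configured workflow rule: it matches only if every check passes."""
    name: str
    checks: List[Check] = field(default_factory=list)


@dataclass
class RuleMatcher:
    """Stand-in for the workflowengine's rule matcher (cf. IRuleMatcher->getFlows)."""
    flows: List[Flow]
    log: List[str] = field(default_factory=list)

    def get_flows(self, file_info: dict) -> List[Flow]:
        matching = [f for f in self.flows if all(check(file_info) for check in f.checks)]
        if not matching:
            # Core side: nothing to run for this file.
            self.log.append("workflowengine: no flow configuration matched")
        return matching


def handle_file_event(matcher: RuleMatcher, file_info: dict) -> bool:
    """App side: schedule OCR only if the core returned at least one flow."""
    if not matcher.get_flows(file_info):
        matcher.log.append(
            "workflow_ocr: Not processing event because IRuleMatcher->getFlows did not return anything"
        )
        return False
    matcher.log.append(f"workflow_ocr: scheduling OCR for {file_info['name']}")
    return True
```

With a single flow whose only check is "MIME type is application/pdf", an event for ocr-test.pdf returns True, while an event for a file that matches no flow returns False and produces a core-side entry followed by an app-side entry, mirroring the pair of lines in the reported nextcloud.log.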

R0Wi commented 3 months ago

This seems to be an issue in NC core. Feel free to raise it here. Closed due to lack of feedback.