workflow_ocr not converting

Nocturna22 commented 1 year ago

Hello :)

Nextcloudversion: 25.0.3

Workflow OCR version: 1.25.2

Somehow the workflow I created does not work. I created a new workflow in the admin section.

Settings: Screenshot 2023-01-21 074922

After that, I uploaded 2 files: 1 JPG and 1 PNG. Neither of them was converted after I ran this command: sudo -u www-data /usr/bin/php cron.php

How does cron.php handle the new conversions? Does it add them to the end of the list? Because I'm currently detecting faces using the "recognize" Nextcloud app (I think it uses tesseract) and it's taking a loooong time .-.

I tested the backend ocrmypdf with and without root using the following command: ocrmypdf --image-dpi 300 ~/10.jpg ~/myfile.pdf. It converts my files every time.

I don't know what to do now or how to troubleshoot...

Greetings

R0Wi commented 1 year ago

Think you've set your conditions wrong. A file cannot be an image AND a pdf at the same time ;-)

Our README might help you get there. Let me know if that works.

Nocturna22 commented 1 year ago

Thanks for the quick answer :) At first I felt very stupid xD because I thought that these were separate jobs I had configured. Then I was happy because I thought my problem would be solved as simply as that... but unfortunately that didn't work either. I have read the README more than once xD I have now tried the non-admin variant for test purposes. It did not work there either.

I have tested most of the criteria; e.g. a request time between 12:00 and 11:59 should really match. Then I selected the OCR mode "Force OCR" and uploaded more than 100 documents to be sure. After executing cron.php (which took much longer than usual, so it must be doing something) I still didn't have any PDF documents in the same folder. I also checked this in the terminal. After that I searched the whole server for PDFs using several methods, but only found the ones that were already there. I don't know what else I can do. There is nothing in the log either.

Greetings from the Schwarzwald ;)

EDIT: Okay, the big upload and cron job did, in fact, produce something... But unfortunately no PDFs. Just errors:

Error | workflow_ocr | OCR for file /8BitBrainz/files/12366/IMG_20210707_175631.jpg not possible. Message: OCRmyPDF did not produce any output

R0Wi commented 1 year ago

Ok here are a few things I'd suggest:

1. Set your NC server loglevel temporarily to DEBUG (0) to get some extra logs.
2. Set up a workflow which you think should add the file to the OCR processing queue when uploading a new file.
3. Upload a new file matching the criteria.
4. Check your nextcloud.log (or use the logreader app). You should see at least one line starting with "Adding file to jobqueue: ".

If you can see a line like the one mentioned in step 4, continue by checking whether there is a matching entry in the oc_jobs database table, and try to run cron.php again. If you can't see such a log message, the file isn't being added to the OCR processing queue for some reason.
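The log check from step 4 could be sketched in Python roughly as follows. This is only a sketch under assumptions: it expects nextcloud.log in its default format (one JSON object per line with a "message" field), and the function name and the example path in the usage note are hypothetical. It matches the text anywhere in the message rather than only at the start, to be tolerant of prefixes.

```python
import json


def find_jobqueue_lines(log_path, needle="Adding file to jobqueue: "):
    """Return all log messages in a JSON-lines nextcloud.log that contain `needle`."""
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip lines that are not valid JSON
            if not isinstance(entry, dict):
                continue  # skip JSON values that are not log objects
            message = str(entry.get("message", ""))
            if needle in message:
                hits.append(message)
    return hits
```

For example, `find_jobqueue_lines("/var/www/nextcloud/data/nextcloud.log")` (path assumed, adjust to your data directory) returns an empty list when no file was queued, which points to the workflow conditions rather than to the OCR backend.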

R0Wi commented 1 year ago

> EDIT: Okay, the big upload and cron job did, in fact, produce something... But unfortunately no PDFs. Just errors: Error | workflow_ocr | OCR for file /8BitBrainz/files/12366/IMG_20210707_175631.jpg not possible. Message: OCRmyPDF did not produce any output

Okay, but it seems that at least your setup is correct now. Please either decrease your loglevel (e.g. to DEBUG, 0) or try to process the file via ocrmypdf directly and see what it tells you. Otherwise, to keep things a bit simpler: drop me a PM if you like.
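Processing a file via ocrmypdf directly and reading what it reports could look roughly like this in Python. It is a minimal sketch, not how workflow_ocr itself invokes the tool: the function names are hypothetical, and only the --image-dpi and --remove-background options mentioned in this thread are used.

```python
import shutil
import subprocess


def build_ocrmypdf_command(input_path, output_path, image_dpi=300, remove_background=False):
    """Build the ocrmypdf argument list for a single input file."""
    cmd = ["ocrmypdf", "--image-dpi", str(image_dpi)]
    if remove_background:
        cmd.append("--remove-background")
    cmd += [input_path, output_path]
    return cmd


def run_ocrmypdf(input_path, output_path, **options):
    """Run ocrmypdf and return (exit code, stderr text) for inspection."""
    if shutil.which("ocrmypdf") is None:
        raise FileNotFoundError("ocrmypdf not found on PATH")
    result = subprocess.run(
        build_ocrmypdf_command(input_path, output_path, **options),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr
```

Running the same file once with `remove_background=False` and once with `remove_background=True` makes it easy to see whether a single option is responsible for missing output. Note that `~` is not expanded without a shell, so paths should be passed through `os.path.expanduser` first.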

Greets from the Bodensee ;-)

Nocturna22 commented 1 year ago

Long story short:

There is a problem with the "remove-background" function.

Thanks for the tips with debugging. That will help me a lot in the future :D

I don't know if it would be good to move this to PM. Maybe someone else will have the same problem someday.

I mean, you can only change the loglevel in the Nextcloud config.php, right? And that only takes effect when I restart Apache, right? Because I have an active command running right now that will take some time, and I don't know if that will get messed up when I restart Apache. But the loglevel thing gave me an idea: I did not have warnings enabled. (I had only recently turned them off, because I had read that they are not important and I was being spammed about the unconfigured SMTP server.)

When I enabled the warnings, I got an error about the remove-background function (I activated it for testing after switching from the admin workflow to the user workflow and forgot about it -.-"):

The important part of this message is "remove-background is temporarily not implemented".

OCRmyPDF succeeded with warning(s): An exception occurred while executing the pipeline
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 192, in exec_page_sync
    ocr_image, preprocess_out = make_intermediate_images(
  File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 126, in make_intermediate_images
    ocr_image = preprocess_out = preprocess(
  File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 104, in preprocess
    image = preprocess_remove_background(image, page_context)
  File "/usr/lib/python3/dist-packages/ocrmypdf/_pipeline.py", line 469, in preprocess_remove_background
    raise NotImplementedError("--remove-background is temporarily not implemented")
NotImplementedError: --remove-background is temporarily not implemented
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 385, in run_pipeline
    exec_concurrent(context, executor)
  File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 274, in exec_concurrent
    executor(
  File "/usr/lib/python3/dist-packages/ocrmypdf/_concurrent.py", line 82, in __call__
    self._execute(
  File "/usr/lib/python3/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 135, in _execute
    result = future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
NotImplementedError: --remove-background is temporarily not implemented

I forgot to say: The PDFs are now generated successfully after disabling the remove-background function :D So there is a problem, but I only found it because of a human error ^^

Have a nice day!

And sorry for stealing your time .-.

Edit: You definitely need to add a "buy me a coffee" button ;D

R0Wi commented 1 year ago

> Thanks for the tips with debugging. That will help me a lot in the future :D

Hopefully you won't need it in the future :smile_cat:

> I forgot to say: The PDFs are now generated successfully after disabling the remove-background function :D

Glad to hear that things are working now. I think you might be hitting https://github.com/ocrmypdf/OCRmyPDF/issues/884. I didn't have these problems in the past since I use the official Debian packages, which aren't updated very regularly. So falling back to an ocrmypdf version prior to v13 might also fix it.
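As a rough illustration of that version boundary, a script could read the output of `ocrmypdf --version` and decide whether passing --remove-background is likely to work. This is a sketch only: the function names are hypothetical, and the v13 cut-off is taken from the comment above, not verified against the OCRmyPDF release notes.

```python
import re


def major_version(version_text):
    """Extract the leading major version number from a version string."""
    match = re.match(r"\s*v?(\d+)", version_text)
    if match is None:
        raise ValueError(f"cannot parse version: {version_text!r}")
    return int(match.group(1))


def may_use_remove_background(version_text, first_affected_major=13):
    """Return True if the version is older than the first affected major release (assumed: 13)."""
    return major_version(version_text) < first_affected_major
```

The `first_affected_major` parameter is kept adjustable because the exact range of affected releases is an assumption here.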

> And sorry for stealing your time .-.

No worries, like I said, I'm glad to help :rocket: Have a nice day, too :+1:

R0Wi commented 1 year ago

> Edit: You definitely need to add a "buy me a coffee" button ;D

Button added :smile: https://www.buymeacoffee.com/R0Wi

Nocturna22 commented 1 year ago

> Glad to hear that things are working now. I think you might be hitting ocrmypdf/OCRmyPDF#884. I didn't have these problems in the past since I use the official Debian packages, which aren't updated very regularly. So falling back to an ocrmypdf version prior to v13 might also fix it.

I will have a look there :)

> Button added 😄 https://www.buymeacoffee.com/R0Wi

Somehow my life is peppered with errors -.-" But I WILL buy you a coffee when this error is gone xD

R0Wi-DEV / workflow_ocr

workflow_ocr not converting #176