Closed ostasevych closed 1 year ago
Thanks for reporting. This is most likely a problem with ocrMyPdf itself. You could set your loglevel to 0 and reproduce the error to get additional logs. Or you could run an ocrmypdf command directly on your backend system to see why it is complaining:
ocrmypdf input.jpg output.pdf
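If it is easier to script the check than to run it by hand, the same command can be wrapped so that its exit code and error output are captured in one place. This is only a rough sketch: `run_and_capture` is a hypothetical helper name, not part of ocrMyPdf or of this app, and it only relies on Python's standard `subprocess` module.

```python
import subprocess


def run_and_capture(cmd):
    """Run a command and return (exit_code, stderr_text).

    exit_code is None when the executable cannot be found at all,
    which is worth ruling out before looking for OCR errors.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError as exc:
        return None, str(exc)
    return proc.returncode, proc.stderr


# Example call, mirroring the command above:
# code, err = run_and_capture(["ocrmypdf", "input.jpg", "output.pdf"])
# print(code)
# print(err)
```

A non-zero exit code together with the captured error text should show what ocrmypdf is complaining about.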
After several tests I found that the problem lies in the Remove background switch. If I turn it off, everything works fine.
I would expect that ocrMyPdf prints some error message in that case, right? If yes, this should also show up in the logs as WARNING just before the line you mentioned here
Actually, that is what I got... Running ocrmypdf on the command line did not reproduce any error.
And what if you run it with the --remove-background flag?
Yup!
ocrmypdf -l ukr --remove-background --force-ocr "test_pdf_not_ocr.pdf" "test_pdf_ocr.pdf"
Scanning contents: 100%|████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 14.15page/s]
1 page already has text! - rasterizing text and running OCR anyway
OCR: 0%| | 0.0/1.0 [00:07<?, ?page/s]
An exception occurred while executing the pipeline
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 192, in exec_page_sync
ocr_image, preprocess_out = make_intermediate_images(
File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 126, in make_intermediate_images
ocr_image = preprocess_out = preprocess(
File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 104, in preprocess
image = preprocess_remove_background(image, page_context)
File "/usr/lib/python3/dist-packages/ocrmypdf/_pipeline.py", line 469, in preprocess_remove_background
raise NotImplementedError("--remove-background is temporarily not implemented")
NotImplementedError: --remove-background is temporarily not implemented
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 385, in run_pipeline
exec_concurrent(context, executor)
File "/usr/lib/python3/dist-packages/ocrmypdf/_sync.py", line 274, in exec_concurrent
executor(
File "/usr/lib/python3/dist-packages/ocrmypdf/_concurrent.py", line 82, in __call__
self._execute(
File "/usr/lib/python3/dist-packages/ocrmypdf/builtin_plugins/concurrency.py", line 135, in _execute
result = future.result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
NotImplementedError: --remove-background is temporarily not implemented
My suggestion is to hide that switch.
The --remove-background flag seems to be deactivated temporarily, so we can expect it to return in a future ocrMyPdf release. Since it still works in older releases, removing the switch completely doesn't seem to be a solution (see also my comment here).
Also, I don't think it's worth the effort to try to detect the installed ocrMyPdf version, since they will hopefully re-add this flag in the future. I would add a note to the README for documentation purposes, which I think should be enough for the moment. If anyone wants to bring in a PR that checks the ocrMyPdf version and hides the Remove background switch accordingly, I'd be happy to merge it.
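For anyone considering such a PR, the version check itself could look roughly like the sketch below. It is written in Python purely for illustration and is not this app's code. `parse_version` and `show_remove_background` are hypothetical names, and the bounds `first_broken` and `first_fixed` are placeholders: this thread does not say which releases are affected, so they would have to be taken from the ocrMyPdf release notes. The version text is assumed to come from something like `ocrmypdf --version`.

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from text such as '13.4.0+dfsg'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def show_remove_background(version_text, first_broken, first_fixed=None):
    """Decide whether the Remove background switch should be shown.

    first_broken: first release in which the flag raises NotImplementedError
                  (placeholder, not confirmed in this thread).
    first_fixed:  first release in which the flag works again, or None
                  while no such release is known.
    """
    version = parse_version(version_text)
    if version < first_broken:
        return True  # older release, flag still works
    if first_fixed is not None and version >= first_fixed:
        return True  # flag has been re-added
    return False  # inside the affected range, hide the switch
```

Keeping both bounds as parameters means nothing has to change in the logic once the flag is re-added upstream; only `first_fixed` needs to be filled in.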
Warning added to README. Closing this for now.
Describe the bug
The script is not working and produces an error:
OCR for file /username/files/Documents/289605911_567171248133643_8374688567429083270_n.jpg not possible. Message: OCRmyPDF did not produce any output
System
To Reproduce
Steps to reproduce the behavior:
to_ocr
.to_ocr
./username/files/Documents/289605911_567171248133643_8374688567429083270_n.jpg not possible. Message: OCRmyPDF did not produce any output
Server log
/username/files/Documents/289605911_567171248133643_8374688567429083270_n.jpg not possible. Message: OCRmyPDF did not produce any output