timothydereuse closed this issue 3 years ago
Hi Timothy!
How are you doing?
In principle, Calamari should be able to run without any parallelization. The commands required should indeed be:
predictor.data.params.pre_proc.run_parallel = False
predictor.data.params.post_proc.run_parallel = False
I will check, probably tomorrow, why this does not work and provide an update asap!
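For readers who want to apply the two settings above in one place, here is a minimal sketch. The helper name `disable_parallel_processing` is hypothetical, and the object it is exercised on is a plain stand-in that only mimics the attribute layout quoted above; it is not a real Calamari predictor, and the sketch does not claim these flags are sufficient on every Calamari version.

```python
from types import SimpleNamespace


def disable_parallel_processing(predictor):
    """Switch off parallel pre- and post-processing on a predictor-like object.

    Hypothetical helper: it only sets the two attribute paths quoted in this
    thread and makes no other changes.
    """
    predictor.data.params.pre_proc.run_parallel = False
    predictor.data.params.post_proc.run_parallel = False
    return predictor


# Stand-in object (NOT a real Calamari predictor) with the same attribute layout.
fake_predictor = SimpleNamespace(
    data=SimpleNamespace(
        params=SimpleNamespace(
            pre_proc=SimpleNamespace(run_parallel=True),
            post_proc=SimpleNamespace(run_parallel=True),
        )
    )
)

disable_parallel_processing(fake_predictor)
print(fake_predictor.data.params.pre_proc.run_parallel)
print(fake_predictor.data.params.post_proc.run_parallel)
```

With a real predictor, the helper would presumably be called once after the predictor is constructed and before any prediction call.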
Calamari 2.1.3 should fix this. Please let me know if this works for you!
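Since the fix is tied to a specific release, an application could check the installed version before relying on the non-parallel settings. The following is a rough sketch using only the standard library. The function names are hypothetical, the distribution name `calamari-ocr` is an assumption about how the package is installed, and the version parsing is deliberately simplistic (leading numeric components only).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Return the leading numeric components, e.g. '2.1.3' -> (2, 1, 3)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(installed, required="2.1.3"):
    """True if `installed` is at least `required` (2.1.3 is the release named in this thread)."""
    return parse_version(installed) >= parse_version(required)


def installed_calamari_version(dist_name="calamari-ocr"):
    """Look up the installed version; `dist_name` is an assumed distribution name."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_calamari_version()
if found is None:
    print("Calamari does not appear to be installed under the assumed name.")
elif has_fix(found):
    print(f"Calamari {found} should include the fix.")
else:
    print(f"Calamari {found} is older than 2.1.3; consider upgrading.")
```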
Took me a while to get to testing it thoroughly, but it looks like this did the trick! Thanks so much for doing this so quickly, Christoph.
You are welcome!
I use Calamari's Python API as part of a larger application that schedules tasks using the Celery queue system. I've been upgrading our environment to use Calamari 2.1 (from 1.0), and I've been getting an error because of parallel processing in Calamari, as Celery tasks are not permitted to use the Python multiprocessing library.
The use of Calamari in our code is just these few lines (summarized here):
This works perfectly fine when not running as a Celery task. When running as a Celery task, though, we get this error:
This was not an issue in Calamari 1.0, though I am not sure exactly what changed. In any case, speed is not a priority for us, and we do not need parallel processing of text lines. I have been trying to figure out if there is a way to set parameters within Calamari such that prediction does not use the multiprocessing library. In particular, https://github.com/Calamari-OCR/calamari/issues/263#issuecomment-860047394 mentions a way to disable the parallel pipeline, which I tried with these lines of code before running predict_raw():
But this resulted in the same error. Is it possible at all to run predictions in Calamari without any use of multiprocessing? (I have also been digging through the tfaip source, which led me to believe that this might be possible, but I am not sure if I have to fork Calamari to make it happen or if there's an easier way I'm overlooking.)