Open patdunlavey opened 1 year ago
I was able to get around this by increasing the ocr strawberry runner's timeout (to 180 seconds). I had to kill all the duplicate tesseract processes (which I did by restarting the containers) first.
I see that the timed out procs are supposed to be killed here, so I'm not sure why I was seeing a huge stack of identical tesseract processes. Might there be a bug in the kill function?
@patdunlavey are you using SBR 0.5.0?
See https://github.com/esmero/strawberry_runners/commits/0.5.0 for the history. A lot of this was fixed (open issues/closed issues).
The number of retry attempts for a failed OCR probably needs to be revisited, but killing the process etc. should work.
We are running 0.5.0. I agree that it seems like killing the old process should work, so I'll have to see if I can sort out what was going on with all those unkilled tesseract processes. As for endlessly retrying the failed process, that's clearly something to fix. Does DelayedRequeueException seem like it might help? As I understand it (very limited), it simply schedules the retry for a later point rather than immediately. That is, it doesn't stop the endless retrying; it just lets other queue items run while the item is delayed.
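To illustrate the point about delaying versus stopping retries, here is a toy model in plain PHP. It does not use Drupal's queue classes; `SketchItem` and `run_sketch_queue()` are hypothetical names made up for this sketch, and the "clock ticks" stand in for queue runs. It only shows the general idea: a delay lets other items through, but the failing item keeps coming back unless something counts attempts.

```php
<?php
// Hypothetical stand-in for a queue item; not a Drupal or strawberry_runners class.
final class SketchItem {
  public string $name;
  public int $attempts = 0;
  public int $availableAt = 0;

  public function __construct(string $name) {
    $this->name = $name;
  }
}

/**
 * Runs a toy queue for $ticks clock ticks, handling at most one item per tick.
 *
 * An item named 'bad' always fails and becomes available again $delay ticks
 * later; any other item succeeds and is removed. Returns the names of the
 * items that were attempted, in order.
 */
function run_sketch_queue(array $items, int $ticks, int $delay): array {
  $log = [];
  for ($now = 0; $now < $ticks; $now++) {
    foreach ($items as $key => $item) {
      if ($item->availableAt > $now) {
        // Still delayed, so let a later item have this tick.
        continue;
      }
      $item->attempts++;
      $log[] = $item->name;
      if ($item->name === 'bad') {
        // Delayed requeue: the item is retried later, with no upper bound.
        $item->availableAt = $now + $delay;
      }
      else {
        // Success: drop the item from the queue.
        unset($items[$key]);
      }
      break;
    }
  }
  return $log;
}

$bad = new SketchItem('bad');
$log = run_sketch_queue([$bad, new SketchItem('good1'), new SketchItem('good2')], 10, 3);
echo implode(', ', $log), PHP_EOL;
echo 'bad item attempts: ', $bad->attempts, PHP_EOL;
```

In this model, a delay of 3 ticks lets both good items run, but the bad item is still attempted repeatedly for as long as the queue runs. With a delay of 0 the bad item takes every tick and the good items never run.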
For now, it seems that setting the timeout generously gets us out of this problem. So not a huge priority to deal with this from our vantage point.
Because 0.5.0 has been evolving, please double check your last commit. Also, can you make sure you are using /usr/local/bin/tesseract and not /usr/bin/tesseract (version 5, better, versus version 4, slower)?
Also notice that reducing the size of the exported PDF might help a lot (see the gs options there).
Ah, we are using /usr/bin/tesseract, not /usr/local/bin/tesseract. I'll make that change. And the -r150 argument to ghostscript should help a lot.
I think we're on the most recent commit of strawberry_runners:0.5.0.x-dev (dbac9cf07d910dcef3f26d115ced1d5bd774e377).
Thanks!
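For reference, the -r150 argument mentioned above sets Ghostscript's output resolution to 150 DPI, so a rendered page has a quarter of the pixels it would have at 300 DPI. The sketch below builds such a command line in PHP. Only -r150 comes from this thread; the function name `build_gs_command()`, the pnggray device, the file paths, and the single-page rendering are assumptions for illustration and may not match what the runner actually configures.

```php
<?php
/**
 * Builds a Ghostscript argument list that renders one PDF page to a
 * grayscale PNG at the given resolution.
 *
 * Hypothetical helper: the device and page options are illustrative
 * assumptions, not the strawberry_runners configuration.
 */
function build_gs_command(string $pdf, string $output, int $page, int $dpi = 150): array {
  return [
    'gs',
    '-dBATCH',                 // Exit after processing instead of prompting.
    '-dNOPAUSE',               // Do not pause between pages.
    '-dSAFER',                 // Restrict file access.
    '-sDEVICE=pnggray',        // Assumed output device.
    '-r' . $dpi,               // Resolution in DPI, e.g. -r150.
    '-dFirstPage=' . $page,
    '-dLastPage=' . $page,
    '-sOutputFile=' . $output,
    $pdf,
  ];
}

// Print a shell-safe command line for a made-up input file.
$cmd = build_gs_command('/tmp/input.pdf', '/tmp/page-1.png', 1);
echo implode(' ', array_map('escapeshellarg', $cmd)), PHP_EOL;
```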
@patdunlavey one solution would be to catch the exception here: https://github.com/esmero/strawberry_runners/blob/0.5.0/src/Plugin/QueueWorker/AbstractPostProcessorQueueWorker.php#L420
That would keep the queue itself from automatically re-enqueueing the item on exception.
An even better solution would be to have a "failed" queue and re-enqueue the item there.
That way that queue could be run or inspected manually, and the issue would not be lost after 3 attempts.
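The "failed" queue idea can be sketched in plain PHP as follows. This is not the strawberry_runners implementation and uses no Drupal APIs: `process_with_failed_queue()`, the array-based queues, and the item shape are all hypothetical, and the limit of 3 simply mirrors the number of attempts mentioned above.

```php
<?php
/**
 * Processes one item. On failure the item goes back to the main queue until
 * it has failed $maxAttempts times, after which it is parked in a separate
 * "failed" queue for manual inspection instead of being retried forever.
 *
 * All names here are hypothetical stand-ins for illustration.
 *
 * @param array $item Shape: ['id' => string, 'attempts' => int].
 * @param callable $worker Does the work; throws on failure.
 * @param array $mainQueue Items waiting for another try.
 * @param array $failedQueue Items that exhausted their attempts.
 */
function process_with_failed_queue(array $item, callable $worker, array &$mainQueue, array &$failedQueue, int $maxAttempts = 3): void {
  try {
    $worker($item);
  }
  catch (\Throwable $e) {
    // Catch here so nothing upstream re-enqueues the item automatically.
    $item['attempts']++;
    $item['last_error'] = $e->getMessage();
    if ($item['attempts'] >= $maxAttempts) {
      $failedQueue[] = $item;
    }
    else {
      $mainQueue[] = $item;
    }
  }
}

// Usage: a worker that always fails, as with an OCR command that times out.
$main = [['id' => 'page-1', 'attempts' => 0]];
$failed = [];
$alwaysTimesOut = function (array $item): void {
  throw new \RuntimeException('OCR timed out');
};
while ($item = array_shift($main)) {
  process_with_failed_queue($item, $alwaysTimesOut, $main, $failed);
}
echo count($failed), ' item(s) parked for inspection', PHP_EOL;
```

Keeping the last error message on the parked item means whoever inspects the failed queue can see why it stopped, rather than only that it stopped.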
That's very helpful @DiegoPino. Would you like to assign this task to me to take a gander at it?
Done, thanks!
We have found that if an OCR process fails, it will throw new RequeueException('I am not done yet. Will re-enqueu myself'); and continue to retry and fail, creating an infinite loop, as seen in this sample watchdog output: strawberry_runner_ocr_looping.csv
In our case, I believe the issue is that the command is timing out, and it may be solved by increasing the OCR processor's timeout setting.
Might the general solution, in part at least, be to switch to using DelayedRequeueException?