esmero / strawberry_runners

A post-processing Drupal 8/9 module for Strawberryfield-dispatched events
GNU Lesser General Public License v3.0

OCR timeout can cause infinite loop? #75

Open patdunlavey opened 1 year ago

patdunlavey commented 1 year ago

We have found that if an OCR process fails, the worker throws new RequeueException('I am not done yet. Will re-enqueu myself') and the item keeps being retried and failing, creating an infinite loop, as seen in this sample watchdog output: strawberry_runner_ocr_looping.csv

In our case, I believe the command is timing out, and the problem may be solved by increasing the OCR processor's timeout setting.

Might the general solution, at least in part, be to switch to DelayedRequeueException?
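To make the looping concrete, here is a simplified Python model (not Drupal's queue API). It assumes that when the worker throws RequeueException, the item is released straight back to the head of the queue, so a permanently failing item is claimed again immediately. RequeueError, run_immediate_requeue, and the file names are hypothetical stand-ins.

```python
from collections import deque


class RequeueError(Exception):
    """Hypothetical stand-in for Drupal's RequeueException in this model."""


def run_immediate_requeue(items, process, max_steps):
    """Claim items one at a time; a requeued item goes back to the head.

    Returns the items processed successfully within max_steps claims.
    max_steps exists only so the demo terminates.
    """
    queue = deque(items)
    done = []
    for _ in range(max_steps):
        if not queue:
            break
        item = queue.popleft()
        try:
            process(item)
            done.append(item)
        except RequeueError:
            # Assumption: the released item is still the oldest, so it is
            # the next one claimed, ahead of everything behind it.
            queue.appendleft(item)
    return done


def ocr(item):
    # Hypothetical processor: one file always "times out".
    if item == "timeout.pdf":
        raise RequeueError("I am not done yet. Will re-enqueu myself")


print(run_immediate_requeue(["timeout.pdf", "ok.pdf"], ocr, max_steps=1000))
# → []
```

Without the max_steps cap the loop never ends, and ok.pdf is never reached because the failing item always sits in front of it.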

patdunlavey commented 1 year ago

I was able to work around this by increasing the OCR Strawberry Runner's timeout to 180 seconds. First, though, I had to kill all the duplicate tesseract processes, which I did by restarting the containers.

I see that timed-out processes are supposed to be killed here, so I'm not sure why I was seeing a huge stack of identical tesseract processes. Might there be a bug in the kill function?
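One common way a timeout kill leaves processes behind is that only the wrapper process (for example a shell) is signalled, while the tesseract child it started keeps running. Whether that applies here is unverified; the module's actual kill code is PHP and is not shown in this thread. The sketch below is a hypothetical POSIX-only Python helper, run_with_timeout, that starts the command in its own session and, on timeout, kills the whole process group.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout):
    """Run cmd; on timeout, kill its entire process group.

    Returns (timed_out, stdout). POSIX only.
    """
    # start_new_session=True makes the child a session/group leader, so it
    # and everything it spawns share a process group whose ID is proc.pid.
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        start_new_session=True,
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
        return False, out
    except subprocess.TimeoutExpired:
        try:
            # Killing only proc.pid could leave grandchildren running and
            # holding the stdout pipe open; signal the whole group instead.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited between timeout and kill
        out, _ = proc.communicate()  # reap the child and drain the pipe
        return True, out
```

For example, run_with_timeout(["sh", "-c", "sleep 5; echo late"], 0.5) returns promptly with (True, ""), because the background sleep is killed along with the shell instead of outliving it.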

DiegoPino commented 1 year ago

@patdunlavey, are you using SBR 0.5.0?

See https://github.com/esmero/strawberry_runners/commits/0.5.0; a lot of this was fixed there (open issues/closed issues).

The number of retry attempts for failed OCR probably needs to be revisited, but killing timed-out processes should work.

patdunlavey commented 1 year ago

We are running 0.5.0. I agree that killing the old process should work, so I'll have to see if I can sort out what was going on with all those unkilled tesseract processes. As for endlessly retrying the failed process, that's clearly something to fix. Does DelayedRequeueException seem like it might help? As I understand it (and my understanding is very limited), it simply schedules the retry for a later point rather than retrying immediately. That is, it doesn't stop the endless retrying; it just lets other queue items run while the item is delayed.
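That reading of DelayedRequeueException can be sketched with a simplified, time-stepped Python model (not Drupal's API; DelayedRequeueError, run_delayed_requeue, and the 10-second delay are hypothetical). The failing item is pushed later in time instead of being dropped, so other items do get processed, but it is still retried indefinitely.

```python
import heapq
import itertools


class DelayedRequeueError(Exception):
    """Hypothetical stand-in for Drupal's DelayedRequeueException."""

    def __init__(self, delay):
        super().__init__(f"retry in {delay}s")
        self.delay = delay


def run_delayed_requeue(items, process, ticks):
    """Simulate `ticks` seconds; each tick claims at most one due item."""
    counter = itertools.count()  # tie-breaker so items are never compared
    ready_at = [(0, next(counter), item) for item in items]
    heapq.heapify(ready_at)
    done, attempts = [], {}
    for now in range(ticks):
        if not ready_at or ready_at[0][0] > now:
            continue  # nothing is due yet
        _, _, item = heapq.heappop(ready_at)
        attempts[item] = attempts.get(item, 0) + 1
        try:
            process(item)
            done.append(item)
        except DelayedRequeueError as exc:
            # Not dropped, only rescheduled `delay` seconds from now.
            heapq.heappush(ready_at, (now + exc.delay, next(counter), item))
    return done, attempts


def ocr(item):
    # Hypothetical processor: one file always "times out".
    if item == "timeout.pdf":
        raise DelayedRequeueError(delay=10)


done, attempts = run_delayed_requeue(["timeout.pdf", "ok.pdf"], ocr, ticks=100)
print(done, attempts["timeout.pdf"])
# → ['ok.pdf'] 10
```

ok.pdf completes on the second tick, yet timeout.pdf is attempted at ticks 0, 10, ..., 90 and would keep going for as long as the simulation runs.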

For now, setting a generous timeout seems to get us out of this problem, so dealing with this isn't a huge priority from our vantage point.

DiegoPino commented 1 year ago

Because 0.5.0 has been evolving, please double-check which commit you are on. Also, can you make sure you are using /usr/local/bin/tesseract and not /usr/bin/tesseract? The former is Tesseract 5, which is better; the latter is Tesseract 4, which is slower.
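A quick way to confirm which binary is which is to ask each one for its version. The sketch below is a hypothetical Python helper that parses the first line of tesseract --version output (for example "tesseract 5.3.0"); it reads both stdout and stderr in case a given build prints the version to stderr.

```python
import re
import subprocess


def tesseract_major_version(version_output):
    """Return the major version number from `tesseract --version` output."""
    # Expected to contain e.g. "tesseract 5.3.0" or "tesseract 4.1.1";
    # an optional "v" prefix is tolerated.
    match = re.search(r"tesseract\s+v?(\d+)\.", version_output)
    if match is None:
        raise ValueError("unrecognized tesseract --version output")
    return int(match.group(1))


def check_binary(path):
    """Run `<path> --version` and return the major version it reports."""
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return tesseract_major_version(result.stdout + result.stderr)
```

Calling check_binary("/usr/local/bin/tesseract") and check_binary("/usr/bin/tesseract") inside the container shows which major version each path provides.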

Also note that reducing the size of the exported PDF might help a lot (see the gs options there).

[Screenshot of the gs options]

patdunlavey commented 1 year ago

Ah, we are using /usr/bin/tesseract, not /usr/local/bin/tesseract. I'll make that change. And the -r150 argument to Ghostscript should help a lot.
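Ghostscript's -r option sets the output resolution in dpi. If the gs step rasterizes pages for OCR (an assumption, since the exact options were only in a screenshot), the pixel count per page grows with the square of the resolution. The arithmetic below compares 150 dpi with 300 dpi on a US Letter page; 300 dpi is only a comparison point, not necessarily what the processor used before.

```python
def page_pixels(width_in, height_in, dpi):
    """Pixels in one rasterized page of the given size at the given dpi."""
    return round(width_in * dpi) * round(height_in * dpi)


letter_150 = page_pixels(8.5, 11, 150)  # 1275 x 1650
letter_300 = page_pixels(8.5, 11, 300)  # 2550 x 3300
print(letter_150, letter_300, letter_300 / letter_150)
# → 2103750 8415000 4.0
```

Halving the resolution leaves tesseract a quarter of the pixels per page, which is why a lower -r value can shorten each OCR run considerably.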

I think we're on the most recent commit of strawberry_runners:0.5.0.x-dev (dbac9cf07d910dcef3f26d115ced1d5bd774e377)

Thanks!

DiegoPino commented 1 year ago

@patdunlavey, one solution would be to add a catch for the exception here: https://github.com/esmero/strawberry_runners/blob/0.5.0/src/Plugin/QueueWorker/AbstractPostProcessorQueueWorker.php#L420

Similar to this one: https://github.com/esmero/strawberry_runners/blob/dbac9cf07d910dcef3f26d115ced1d5bd774e377/src/Plugin/QueueWorker/AbstractPostProcessorQueueWorker.php#L395-L417

That would keep the queue itself from automatically re-enqueuing the item when an exception is thrown.

An even better solution would be to have a "failed" queue and re-enqueue the item there.

That way, the failed queue could be run or inspected manually, and the issue would not be lost after 3 attempts.
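The "failed" queue idea can be sketched as a language-neutral Python model rather than the module's PHP. In AbstractPostProcessorQueueWorker this would presumably be a try/catch around the processing step; here run_with_failed_queue, max_attempts, and the file names are hypothetical, and max_attempts=3 simply mirrors the 3 attempts mentioned in the thread.

```python
from collections import deque


def run_with_failed_queue(items, process, max_attempts=3):
    """Retry each item up to max_attempts, then park it in a failed queue.

    Returns (done, failed); failed holds (item, last_error) pairs that can
    be inspected or re-run manually later instead of being lost.
    """
    queue = deque((item, 0) for item in items)
    done, failed = [], []
    while queue:
        item, attempts = queue.popleft()
        try:
            process(item)
            done.append(item)
        except Exception as exc:  # caught here, so nothing auto-requeues
            attempts += 1
            if attempts < max_attempts:
                queue.append((item, attempts))  # retry behind other items
            else:
                failed.append((item, str(exc)))
    return done, failed


def ocr(item):
    # Hypothetical processor: one file always "times out".
    if item == "timeout.pdf":
        raise TimeoutError("tesseract timed out")


done, failed = run_with_failed_queue(["timeout.pdf", "ok.pdf"], ocr)
print(done, failed)
# → ['ok.pdf'] [('timeout.pdf', 'tesseract timed out')]
```

Bounding the retries stops the loop, while parking the item with its last error keeps the failure visible for a manual rerun after, say, raising the timeout.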

patdunlavey commented 1 year ago

That's very helpful @DiegoPino. Would you like to assign this task to me to take a gander at it?

DiegoPino commented 1 year ago

Done, thanks!