collectiveidea / delayed_job

Database based asynchronous priority queue system -- Extracted from Shopify
http://groups.google.com/group/delayed_job
MIT License
4.81k stars 955 forks source link

Avoid rescuing WorkerTimeout when job times out #1181

Closed diogoosorio closed 9 months ago

diogoosorio commented 1 year ago

When provided an error class argument, Ruby's Timeout library allows the block to catch the provided exception.

This means that when the Timeout library sends the thread.raise signal to the running job, depending on the execution state of the thread (e.g. if when the signal is received, it's wrapped in a rescue StandardError statement), the job isn't actually halted.

At the same time, both the mongoid and activerecord backends assume that jobs that are past their max_run_time threshold (even if they were locked by another worker) are valid candidates to be picked up. This ultimately leads to the same job to be potentially run in multiple workers at the same time.

This commit proposes a change in the way the library interacts with the Timeout module, by not providing a specific exception class. This triggers a different code flow within the library which prevents the job to be able to swallow the exception.

Here's a repo with a minimalistic example on how to replicate the problem: https://github.com/diogoosorio/delayed-job-timeout-example

diogoosorio commented 1 year ago

The intention is for this to be a solution for #1180

albus522 commented 9 months ago

This reintroduces the issue it fixes, so no this doesn't work. Previously you could rescue the timeout within the block due to nesting timeouts. The fix probably belongs in the timeout library to run the same code path when the provided error inherits from the Timeout::Error.

diogoosorio commented 9 months ago

tbh I'd have to regain context about what I did here, but given your comment I doubt it's worth the effort.

I'll close the PR / issue, thank you for your time/review.