Betterment / delayed

a multi-threaded, SQL-driven ActiveJob backend used at Betterment to process millions of background jobs per day
MIT License

Fix spin-loop/cleanup failure mode within run loop #42

Closed. smudge closed this 3 months ago

smudge commented 3 months ago

This ensures that exceptions raised in thread callback hooks are rescued and that the affected jobs are properly marked as failed.
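To illustrate the failure mode, here is a minimal standalone sketch, not the gem's actual code. Job, run_with_hooks, and before_hook are hypothetical names invented for this example. It assumes the fix amounts to putting the hook call inside the same rescue as the job's work, so that a raising hook marks the job failed instead of killing the thread.

```ruby
# Hypothetical stand-ins; none of these names come from the delayed gem.
Job = Struct.new(:name, :failed, :error)

# Runs the hook and the job body under one rescue, so an exception from
# either place marks the job as failed rather than escaping the thread.
def run_with_hooks(job, before_hook:)
  before_hook.call(job) # may raise, e.g. a misbehaving callback
  yield job             # the job's actual work
  true
rescue StandardError => e
  job.failed = true
  job.error = e.message
  false
end

jobs = [Job.new("ok"), Job.new("bad_hook")]
hook = ->(job) { raise "hook exploded" if job.name == "bad_hook" }

# One thread per job; Thread#value joins and returns the block's result.
results = jobs.map do |job|
  Thread.new { run_with_hooks(job, before_hook: hook) { |j| j.name.upcase } }
end.map(&:value)
```

In this sketch the second job ends up with failed set to true and its error message recorded, while the first job is untouched.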

This is also a good opportunity to change the num argument of work_off(num) to mean the number of jobs (give or take a few, due to max_claims) rather than the number of iterations. Before threading was introduced, I think it did mean number of jobs, since jobs and iterations were 1:1. I would not have made this change before the refactor, because there was no guarantee that either success or failure would be incremented for every job (a thread might crash for many reasons). Now we only increment success, and when the method returns we treat total - success as the "failure" number.
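The counting scheme can be sketched roughly as follows. This is a simplified simulation, not the real work_off: the queue: keyword, the in-memory array of lambdas, and the batching loop are all assumptions made for illustration. It assumes jobs are claimed in batches of up to max_claims, which is why the total can overshoot num by a few.

```ruby
# Hypothetical simulation; the real work_off in delayed differs.
def work_off(num, queue:, max_claims: 2)
  total = 0
  success = 0
  mutex = Mutex.new

  while total < num
    batch = queue.shift(max_claims) # claim up to max_claims jobs at once
    break if batch.empty?
    total += batch.size             # may overshoot num by up to max_claims - 1

    threads = batch.map do |job|
      Thread.new do
        begin
          job.call
          mutex.synchronize { success += 1 } # only successes are counted
        rescue StandardError
          # Nothing to increment here: failures are derived on return.
        end
      end
    end
    threads.each(&:join)
  end

  [success, total - success] # failures = everything that did not succeed
end

queue = [-> { :ok }, -> { raise "boom" }, -> { :ok }, -> { :ok }, -> { :ok }]
result = work_off(3, queue: queue, max_claims: 2)
# result is [3, 1]: four jobs were claimed (two batches of two), one raised
```

Deriving failures as total - success means a job that dies without reaching any counter is still reported as a failure, which is the guarantee the paragraph above relies on.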

Fixes #23 and #41

This is also a prereq for a resolution I'm cooking up for #36