Add ability to ensure only 1 version of a task is running at a time, but still actually run subsequent jobs: Simultaneous Execution Prevention

winhamwr commented 8 years ago

Sometimes, we might want multiple very similar tasks to run (we can't just drop them or use the result from the first task), but not at the same time. Jobtastic can't currently help with this type of synchronization.

Support types of jobs where we want to prevent them from running at the same time, but we still want to run the other jobs, later.

Simultaneous Execution Prevention

Add a simultaneous_execution_prevention_timeout option that defaults to 0 (off)
When a job with that setting on hits a worker, it tries to acquire a cache lock
If it gets the lock, it goes on its merry way, making sure to release the lock when it's done or if it crashes
If it doesn't get the lock, it immediately retries the task with a short delay. Can we figure out how to separate an "I'm waiting on the simultaneous lock" retry from a "the actual task needs a retry" retry, so that a user's max_retries setting is still honored? We want to keep retrying indefinitely while something else holds the simultaneous execution lock, since we can rely on the cache timeout to act as a global timeout for all potential executions of this type of task.
If herd_avoidance is >0 (active) or cache_duration is >=0 (active), we should raise an exception if someone also tries to set simultaneous_execution_prevention_timeout to >0 (active). These options won't play nicely together, and the combination almost certainly means someone misunderstood the docs.
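
The steps above could be sketched roughly as follows. This is only an illustration, not Jobtastic code: `InMemoryCache`, `LockUnavailable`, `validate_options`, and `run_exclusively` are hypothetical names, and the in-memory cache stands in for a memcached/redis-style backend whose `add` only succeeds when the key is absent.

```python
import time


class LockUnavailable(Exception):
    """Raised when another job already holds the execution lock."""


class InMemoryCache:
    """Hypothetical stand-in for a memcached/redis-style cache backend."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, timeout):
        """Store value only if key is absent or expired; return True on success."""
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._data[key] = (value, now + timeout)
        return True

    def delete(self, key):
        self._data.pop(key, None)


def validate_options(herd_avoidance, cache_duration, prevention_timeout):
    """Reject the option combination the proposal says should raise."""
    if prevention_timeout > 0 and (herd_avoidance > 0 or cache_duration >= 0):
        raise ValueError(
            "simultaneous_execution_prevention_timeout cannot be combined "
            "with herd_avoidance or cache_duration"
        )


def run_exclusively(cache, lock_key, prevention_timeout, job):
    """Run job() only if the lock is free; always release the lock afterwards."""
    if not cache.add(lock_key, "locked", prevention_timeout):
        # The caller would re-queue the task with a short delay here.
        raise LockUnavailable(lock_key)
    try:
        return job()
    finally:
        # Runs on success and on a crash inside job().
        cache.delete(lock_key)
```

One gap in this sketch: if a job outlives the lock timeout, the `finally` block could delete a lock that a later job has since acquired. A real implementation would probably store a unique token in the lock and check it before deleting.
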
Caveats to users based on countdown/eta/delay

This kind of thing can get you into a deadlock state with your queues. Because of the way worker prefetch_count, retry, and delay/eta/countdown interact, your retry call with a delay could block an entire pool of workers.

Let's say you have one worker pool with a concurrency of 3 and a prefetch_multiplier of 4. Then you queue up 13 jobs with simultaneous execution prevention turned on that all match via significant_kwargs. The first one to hit a worker will start running, and then the next 12 will get retried with a delay. Those will then immediately hang out in your worker pool, and since the pool only has 12 "slots" for tasks (3 concurrency times 4 prefetch_multiplier), and since the delay/eta/countdown happens at the worker pool level, the other 2 workers in your pool will have nothing to do. Even though you might be queuing up other jobs that could be run by those 2 workers, they can't get to them, because the pool has already pulled its max amount of jobs.
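
The arithmetic in that example can be written out as a short sketch. It follows the simplified model described above (each delayed retry holds one prefetch slot until its countdown expires); `pool_slots` and `free_slots` are hypothetical helper names, and real worker behaviour may differ.

```python
def pool_slots(concurrency, prefetch_multiplier):
    """Total tasks the pool will hold at once under the model above."""
    return concurrency * prefetch_multiplier


def free_slots(concurrency, prefetch_multiplier, matching_jobs):
    """Slots left for other work once the matching jobs have been queued.

    One matching job runs; every other one is assumed to sit in the pool
    as a delayed retry.
    """
    delayed = max(0, matching_jobs - 1)
    return max(0, pool_slots(concurrency, prefetch_multiplier) - delayed)


print(pool_slots(3, 4))      # 12
print(free_slots(3, 4, 13))  # 0
```

With 13 matching jobs, the 12 delayed retries fill all 12 slots, so nothing else can be fetched for the idle workers.
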

Could we mitigate this?

Maybe delay should be really fast, since retry does actually send things back to the broker? We'll potentially be churning through a lot of jobs that will just immediately retry after failure to acquire the lock, but that will at least let other jobs slip in between.
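
A toy model of that idea, assuming a retry puts the job at the back of the broker queue: jobs that can't get the lock are re-queued immediately, which lets unrelated jobs run in between. The `drain` function and the job names are made up for illustration, and the lock is modelled as simply staying busy for a fixed number of failed attempts.

```python
from collections import deque


def drain(jobs, busy_attempts):
    """Return the order in which jobs actually run.

    jobs: iterable of (name, needs_lock) pairs in queue order.
    busy_attempts: how many lock attempts fail before the lock frees up.
    """
    queue = deque(jobs)
    ran = []
    while queue:
        name, needs_lock = queue.popleft()
        if needs_lock and busy_attempts > 0:
            busy_attempts -= 1
            queue.append((name, needs_lock))  # fast retry: back to the broker
            continue
        ran.append(name)
    return ran


order = drain([("locked-2", True), ("other-a", False), ("other-b", False)], 2)
print(order)  # ['other-a', 'other-b', 'locked-2']
```

The cost shows up as the extra trips `locked-2` makes through the queue before it finally runs.
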

thenewguy commented 8 years ago

Just curious, how will you implement the locking strategy without causing the project to require a specific backend? I partially copied a task mixin that implements locking for django tasks to achieve a similar purpose: https://github.com/PolicyStat/jobtastic/issues/57#issuecomment-249306075

winhamwr commented 8 years ago

@thenewguy thanks to #63, we now have a pluggable cache backend. The goal is for anyone using memcached or redis to have out of the box support for the locking strategy. Others might need to write a different cache backend, though.

thenewguy commented 4 years ago

Issue https://github.com/PolicyStat/jobtastic/issues/83 is helpful here

PolicyStat / jobtastic

Add ability to ensure only 1 version of a task is running at a time, but still actually run subsequent jobs: Simultaneous Execution Prevention #68
