kylegibson opened this issue 8 years ago
I think this is actually #9. I'm going to write up a potential solution to this problem on that issue, based on our discussion.
This is actually different from #9. I was wrong. Because we do `lock.acquire` in both the memcached and redis backends with no possibility of a timeout, if for some reason the `lock.release` doesn't happen in another thread/process, the lock acquisition will wait forever.
I think that's probably low impact, since it would require the python code to have core dumped, or the call to `lock.release()` or `self.cache.delete` to have failed because of network or memcached/redis problems. In that case, the other processes waiting on the lock would be in an infinite loop.
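To make the failure mode concrete, here's a rough sketch of the pattern being described (illustrative only, not jobtastic's actual code):

```python
def acquire(cache, lock_key):
    # cache.add is atomic: it only succeeds if the key does not already exist.
    # There is no deadline here, so if the lock holder never releases the key,
    # every waiter spins in this loop forever.
    while not cache.add(lock_key, 'locked'):
        pass

def release(cache, lock_key):
    # If the holding process core dumps before this runs, or the delete fails
    # because of a network or memcached/redis problem, the key stays set and
    # acquire() above never returns.
    cache.delete(lock_key)
```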
Probably the behavior we want is for those other processes to fail very quickly if they're stuck waiting for a lock. `thundering_herd_timeout` is about how long we should wait for a queued task to complete, which depends on how busy your workers are and how long the task takes to run. This timeout is more about how long you would expect it to take for a process to run through the `apply_async` method, which does no work other than checking the cache and possibly queueing a task. My 20-minute reporting/analytics/backup task should fail in a matter of seconds if it gets stuck waiting for the task to actually be queued; it shouldn't sit waiting in a blocking (probably web) process.
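As a hedged sketch of that fail-fast behavior (the `lock_acquire_timeout` name, the exception, and the structure are all illustrative assumptions, not jobtastic's API):

```python
import time

class LockAcquireTimeout(Exception):
    """Raised when the queueing lock can't be taken quickly enough."""

def acquire_or_fail(cache, lock_key, lock_acquire_timeout=5.0, expiry=60):
    """Give the queueing step a few seconds to take the lock, then give up,
    instead of leaving a blocking (probably web) process waiting forever."""
    deadline = time.monotonic() + lock_acquire_timeout
    while not cache.add(lock_key, 'locked', expiry):
        if time.monotonic() >= deadline:
            raise LockAcquireTimeout(
                'Could not acquire %r within %.1fs'
                % (lock_key, lock_acquire_timeout))
        time.sleep(0.05)  # back off briefly instead of busy-looping
```

The point is that this timeout is measured in seconds and is separate from `thundering_herd_timeout`, which can be minutes for a long-running task.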
I created #68 to document how I think a simultaneous-execution-prevention feature might look.
See here: https://github.com/PolicyStat/jobtastic/blob/master/jobtastic/cache/base.py#L91
The `timeout` doesn't mean that `add` will block and wait. It just means that if the key can be set, it will be set with that timeout as its expiry. So if the key already exists, this results in a busy loop.
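For context, a minimal illustration of those `add` semantics, assuming a Django-style cache API like the one jobtastic's cache wrapper exposes (names here are for demonstration only):

```python
def try_to_lock(cache, lock_key, timeout=300):
    # add() returns immediately in both cases; `timeout` only controls when
    # the key expires, it does not make the call block and wait.
    if cache.add(lock_key, 'locked', timeout):
        return True   # we created the key; it expires after `timeout` seconds
    return False      # key already existed; nothing was set and nothing waited
```

So a bare `while not cache.add(...)` retry spins at full speed; any retry loop around it needs its own sleep and deadline, as in the earlier sketch.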