Closed: KaiserKarel closed this 2 years ago
So last night django-cron tried to do a couple of consecutive runs with my lock still turned on. If you've got jobs that absolutely cannot run in parallel, I recommend writing your own lock.
The same was happening to me. It looks like we had a small maxmemory in Redis (1 GB) and the locks were evicted because used_memory was full.
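If eviction is the suspect, one way to check is to inspect Redis memory usage and the eviction policy directly. A rough sketch using the redis-py client is below; the host/port and the mention of noeviction are assumptions for illustration, not part of the original report.

import redis

# Connect to the Redis instance backing the Django cache (host/port assumed).
r = redis.Redis(host='localhost', port=6379)

mem = r.info('memory')
print('used_memory:', mem['used_memory_human'])
print('maxmemory:', mem.get('maxmemory_human'))

# If the policy evicts arbitrary keys (e.g. allkeys-lru), cron locks can disappear
# under memory pressure. 'noeviction' avoids that at the cost of write errors when
# memory is full -- whether that trade-off is acceptable depends on the setup.
print('maxmemory-policy:', r.config_get('maxmemory-policy'))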
I am not sure if I understood @AbeIgnocious's solution correctly, but it was missing the case where the lock has not been created yet, so I solved it this way:
functions.py
from django.db import transaction

from .models import CronJobLock  # the lock model defined in models.py below


@transaction.atomic
def set_lock(name):
    # lock = CronJobLock.objects.select_for_update().filter(job_name=name, locked=False).update(locked=True)
    # Fetch (or create) the lock row; the default 'locked' attribute is False.
    lock, created = CronJobLock.objects.get_or_create(job_name=name)
    if lock.locked:
        print('ERROR: Lock was True while cron run was initiated.')
        raise RuntimeError('Lock was True while cron run was initiated.')
    else:
        lock.locked = True
        lock.save()
        print('Lock locked on ' + name)
@transaction.atomic
def release_lock(name):
    # update() returns the number of rows changed, so this is 0 if no locked
    # row exists for this job name.
    released = CronJobLock.objects.filter(job_name=name, locked=True).update(locked=False)
    if not released:
        print('WARNING: Lock was False while cron run was ended.')
        raise RuntimeError('Lock was False while cron run was ended.')
    else:
        print('Lock released on ' + name)
models.py
from django.db import models


class CronJobLock(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    job_name = models.CharField(max_length=64, unique=True)
    locked = models.BooleanField(default=False)
I am not sure why this is necessary, but if I read the documentation correctly, the Django cache must be enabled for the project, which is not our case, so I used this workaround. I have to call the set and release functions at the start and at the end of every cron job that cannot run in parallel.
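For illustration, here is a minimal sketch of how those calls could be wired into a job, assuming the standard django-cron CronJobBase pattern; MyNonParallelJob and its code string are hypothetical names, not from the comment above.

from django_cron import CronJobBase, Schedule

from .functions import set_lock, release_lock


class MyNonParallelJob(CronJobBase):  # hypothetical job name
    RUN_EVERY_MINS = 60

    schedule = Schedule(run_every_mins=RUN_EVERY_MINS)
    code = 'myapp.my_non_parallel_job'  # hypothetical unique code

    def do(self):
        set_lock(self.code)  # raises if another run already holds the lock
        try:
            pass  # the actual work goes here
        finally:
            # Release even if the work raises, otherwise the lock stays stuck.
            release_lock(self.code)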
It appears something broke in a recent update. We sent dozens of emails to the same users telling them their account would expire.
What caching backend are you guys using? It could be that you're using LocMemCache, which is per-process.
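For reference, a cache that all processes and servers share can be configured in settings.py. The sketch below assumes Django's database cache backend; the table name is my own choice, and createcachetable has to be run once for it.

# settings.py -- a cache shared across processes/servers, unlike LocMemCache.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.db.DatabaseCache',
        'LOCATION': 'django_cache_table',  # hypothetical table name
    }
}
# Create the table once with: python manage.py createcachetable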
I ran into a similar issue recently. In our case the job gets triggered from multiple web servers in production at exactly the same time. Our caching backend is django.core.cache.backends.db.DatabaseCache and all web servers share a common cache database (SQL Server). The issue with DatabaseCache is that it doesn't have a database-level unique key constraint on the cache key; Django handles cache key uniqueness in code, so we often ran into race conditions.
We ended up creating a custom lock.
from django.db import IntegrityError, transaction
from django_cron.backends.lock.base import DjangoCronJobLock

from .models import CronJobLock


class CustomCronJobLock(DjangoCronJobLock):
    def lock(self):
        try:
            with transaction.atomic():
                # The primary key on job_name makes this insert fail if a row
                # already exists, so only one concurrent caller gets the lock.
                CronJobLock.objects.create(job_name=self.job_name)
            return True
        except IntegrityError:
            return False

    def release(self):
        with transaction.atomic():
            CronJobLock.objects.filter(job_name=self.job_name).delete()
from django.db import models


class CronJobLock(models.Model):
    job_name = models.CharField(max_length=255, primary_key=True)
The job_name column has a primary key constraint in the database, so concurrent create() calls cannot both succeed.
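To actually make django-cron use such a backend there is a settings hook; if I remember the setting correctly it is DJANGO_CRON_LOCK_BACKEND, and the dotted path below is a hypothetical location for the class.

# settings.py -- point django-cron at the custom lock backend
# (setting name as I recall django-cron's lock-backend support; path is hypothetical).
DJANGO_CRON_LOCK_BACKEND = 'myapp.locks.CustomCronJobLock'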
Today I had django-cron initiate a runcron and, during that runtime, start another instance, and then another. It seems like there is something wrong with the cache lock?
Looking at my cron job logs, it could also be that django-cron ended the job before the job had actually finished.
My code looks something like this:
It seems that after do_something_for_5_mins(), django-cron removed the lock and allowed another instance to run?
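For context, here is a rough sketch of the kind of job being described (not the commenter's actual code), assuming the standard CronJobBase pattern. The class name and code string are hypothetical, and do_something_for_5_mins stands in for the long-running step mentioned above.

import time

from django_cron import CronJobBase, Schedule


def do_something_for_5_mins():
    # Placeholder for the long-running work described above.
    time.sleep(5 * 60)


class SlowJob(CronJobBase):  # hypothetical name
    RUN_EVERY_MINS = 1

    schedule = Schedule(run_every_mins=RUN_EVERY_MINS)
    code = 'myapp.slow_job'  # hypothetical unique code

    def do(self):
        # If the lock is released (or times out) while this is still running,
        # the next scheduled run can start in parallel -- the symptom above.
        do_something_for_5_mins()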
I thought I'd attempt a custom lock. These locks can be used to avoid parallel runs.
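As one possible shape for such a lock, here is a sketch using Django's cache.add(), which only sets the key if it does not already exist. Whether this is race-free depends on the cache backend: it is atomic for Redis/Memcached, but not reliably so for LocMemCache or the database cache discussed above. The class name, key format, and timeout are my own choices.

from django.core.cache import cache
from django_cron.backends.lock.base import DjangoCronJobLock


class CacheAddLock(DjangoCronJobLock):  # hypothetical name
    LOCK_TIMEOUT = 30 * 60  # seconds; assumed upper bound for a single run

    def _cache_key(self):
        return 'cron_lock_%s' % self.job_name

    def lock(self):
        # add() returns False if the key already exists, i.e. another run holds
        # the lock; the timeout guards against a crashed run holding it forever.
        return cache.add(self._cache_key(), 'locked', self.LOCK_TIMEOUT)

    def release(self):
        cache.delete(self._cache_key())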