celery / django-celery-beat

Celery Periodic Tasks backed by the Django ORM

Less frequent tasks might not be started at all #634

Open nijel opened 1 year ago

nijel commented 1 year ago

Summary:

A task scheduled at an interval longer than the time between application restarts will never be executed.

Exact steps to reproduce the issue:

  1. Create a task which should be executed based on IntervalSchedule (let's say once a week).
  2. Restart Celery Beat before the chosen interval happens.
  3. Repeat the restarts and see that the task is never due.

This happened to us in a CD setup, where deploys were more frequent than the task interval.

Detailed information

A newly created PeriodicTask has last_run_at = None. When a ModelEntry is created from it, last_run_at is set to now(), but the new value is not saved to the database:

https://github.com/celery/django-celery-beat/blob/8f9fd1b877ffc140d0ba654fad55a9406343bf0d/django_celery_beat/schedulers.py#L86-L87

This state is kept in memory only as long as the Beat process is running. Upon restart, this information is lost and counting the interval starts again.
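The effect can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions, not the actual django-celery-beat code: the Entry class, the simulate function, and the hourly tick are hypothetical stand-ins for ModelEntry, the Beat loop, and the database column.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(days=7)  # the task should run once a week


class Entry:
    """Hypothetical in-memory schedule entry, loosely modeled on ModelEntry."""

    def __init__(self, stored_last_run_at, now):
        # Mirrors the reported behavior: a missing timestamp is replaced
        # with the current time in memory only, never written back.
        self.last_run_at = stored_last_run_at or now

    def is_due(self, now):
        return now - self.last_run_at >= INTERVAL


def simulate(restart_every, days=60):
    """Return how many times the task runs over `days` of simulated time."""
    start = datetime(2024, 1, 1)
    stored_last_run_at = None  # stand-in for the database column
    runs = 0
    now = start
    entry = Entry(stored_last_run_at, now)
    next_restart = start + restart_every
    while now < start + timedelta(days=days):
        if now >= next_restart:
            # Restart: the in-memory entry is rebuilt from the stored value.
            entry = Entry(stored_last_run_at, now)
            next_restart += restart_every
        if entry.is_due(now):
            runs += 1
            entry.last_run_at = now
            stored_last_run_at = now  # only an actual run persists the timestamp
        now += timedelta(hours=1)
    return runs


frequent = simulate(restart_every=timedelta(days=3))   # restarts beat the interval
rare = simulate(restart_every=timedelta(days=30))      # restarts are rarer than runs
print(frequent, rare)
```

With restarts every 3 days the simulated task never runs, because every restart moves the in-memory timestamp forward again. Once restarts are rarer than the interval, the first run stores a timestamp and later restarts no longer reset the count.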

I think this is different from native Celery Beat – it does store last execution time for new tasks in shelve, so the timestamp is kept over restarts.

auvipy commented 1 year ago

I am open to exploring the possibility of saving it to the DB, or any other improvement that works and doesn't create a regression.

nijel commented 1 year ago

Behavior consistent with native Beat would be auto_now_add=True on the last_run_at field, since native Beat also stores the current timestamp on task creation. But I guess it would have an impact on other code paths as well…
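To make the difference concrete, here is a rough simulation of what stamping last_run_at at creation time would change. It is a sketch only: count_runs and its persist_on_create flag are hypothetical, and the stored variable merely stands in for the database column. It does not model the other code paths mentioned above.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(days=7)  # the task should run once a week


def count_runs(persist_on_create, restart_every=timedelta(days=3), days=60):
    """Count task runs when Beat is restarted every `restart_every`."""
    start = datetime(2024, 1, 1)
    end = start + timedelta(days=days)
    # Stand-in for the database column. With persist_on_create the creation
    # time is stored (auto_now_add-like); otherwise it stays None until a run.
    stored = start if persist_on_create else None
    runs = 0
    now = start
    while now < end:
        # Beat (re)starts here; the fallback to `now` lives in memory only.
        last_run_at = stored or now
        restart_at = min(now + restart_every, end)
        while now < restart_at:
            if now - last_run_at >= INTERVAL:
                runs += 1
                last_run_at = stored = now  # a real run is persisted
            now += timedelta(hours=1)
    return runs


print(count_runs(persist_on_create=False))
print(count_runs(persist_on_create=True))
```

In this model the task never runs without the stored creation timestamp, while with it the weekly runs happen even though Beat restarts every 3 days.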

The workaround for me was switching to the crontab scheduler, as that always fires regardless of when the task was last triggered.