celery / django-celery-beat

Celery Periodic Tasks backed by the Django ORM

Celery Beat Crashing at the End of Daylight Savings #604

Open polarmt opened 1 year ago

polarmt commented 1 year ago

Summary:

Exact steps to reproduce the issue:

  1. Set CELERY_TIMEZONE equal to America/Los_Angeles
  2. Run Celery beat such that an IntervalSchedule task runs between 1 AM and 2 AM on the day that daylight saving time ends. In our case, this was 2022-11-06.
  3. Celery beat will crash with the following traceback:
[INFO] [2022-11-06 01:18:52,155] [18813] [schedulers.schedule:356] - DatabaseScheduler: Schedule changed.
Process Beat:
Traceback (most recent call last):
  File "/path/to/virtualenv/lib/python3.7/site-packages/billiard/process.py", line 327, in _bootstrap
    self.run()
  File "/path/to/virtualenv/lib/python3.7/site-packages/celery/beat.py", line 707, in run
    self.service.start(embedded_process=True)
  File "/path/to/virtualenv/lib/python3.7/site-packages/celery/beat.py", line 631, in start
    interval = self.scheduler.tick()
  File "/path/to/virtualenv/lib/python3.7/site-packages/celery/beat.py", line 329, in tick
    self.populate_heap()
  File "/path/to/virtualenv/lib/python3.7/site-packages/celery/beat.py", line 303, in populate_heap
    is_due, next_call_delay = entry.is_due()
  File "/path/to/virtualenv/lib/python3.7/site-packages/django_celery_beat/schedulers.py", line 135, in is_due
    return self.schedule.is_due(last_run_at_in_tz)
  File "/path/to/virtualenv/lib/python3.7/site-packages/celery/schedules.py", line 171, in is_due
    rem_delta = self.remaining_estimate(last_run_at)
  File "/path/to/virtualenv/lib/python3.7/site-packages/celery/schedules.py", line 137, in remaining_estimate
    self.maybe_make_aware(self.now()), self.relative,
  File "/path/to/virtualenv/lib/python3.7/site-packages/celery/schedules.py", line 76, in now
    return (self.nowfun or self.app.now)()
  File "/path/to/virtualenv/lib/python3.7/site-packages/django_celery_beat/models.py", line 175, in <lambda>
    nowfun=lambda: make_aware(now())
  File "/path/to/virtualenv/lib/python3.7/site-packages/django_celery_beat/utils.py", line 27, in make_aware
    value = timezone.make_aware(value, timezone.get_default_timezone())
  File "/path/to/virtualenv/lib/python3.7/site-packages/django/utils/timezone.py", line 270, in make_aware
    return timezone.localize(value, is_dst=is_dst)
  File "/path/to/virtualenv/lib/python3.7/site-packages/pytz/tzinfo.py", line 363, in localize
    raise AmbiguousTimeError(dt)
pytz.exceptions.AmbiguousTimeError: 2022-11-06 01:18:52.418857

An easier way to reproduce this issue is by running the following script in the shell (python manage.py shell):

from datetime import datetime
from django_celery_beat.utils import make_aware

dt = datetime(2022, 11, 6, 1, 36, 0)
make_aware(dt)

Detailed information

We were able to pinpoint the root cause after investigating the issue.

  1. The IntervalSchedule will define the nowfun function as a lambda that calls make_aware (Source)
  2. The now() function inside celery/schedules.py will call nowfun, and subsequently make_aware (Source)
  3. The make_aware function will call the make_aware function from django/utils/timezone.py without passing in is_dst as a parameter (Source)
  4. The second make_aware function will call localize from pytz/tzinfo.py (Source)
  5. localize will find two possible timezone offsets (PDT and PST) for the datetime (Source)
  6. Since two offsets were found and is_dst is None, the function will raise an AmbiguousTimeError (Source)
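
The ambiguity in steps 5 and 6 can be seen without Django or pytz. The following is a minimal sketch that uses the standard library zoneinfo module (Python 3.9+, which is an assumption; the traceback above comes from pytz on Python 3.7). It shows that the wall-clock time from the reproduction script maps to two different UTC offsets. Unlike pytz, zoneinfo does not raise here; it selects between the two with the fold attribute.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/Los_Angeles")

# Same wall-clock time as in the reproduction script.
# fold=0 (the default) is the first pass through 1:36 AM, still on daylight time.
first = datetime(2022, 11, 6, 1, 36, 0, tzinfo=tz)
# fold=1 is the second pass through 1:36 AM, after clocks were set back.
second = datetime(2022, 11, 6, 1, 36, 0, fold=1, tzinfo=tz)

# The two offsets differ by one hour, so the naive datetime alone is ambiguous.
offset_gap = first.utcoffset() - second.utcoffset()
print(first.tzname(), second.tzname(), offset_gap)
```

A naive datetime carries no fold or is_dst information, which is why pytz has no way to choose between the two candidates on its own.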

We have many Celery beat tasks running throughout the day. Because of this exception, we could not even start the Celery beat scheduler, which caused a two-hour outage. Our alerts caught the issue as soon as it occurred; without them, the outage would have lasted longer.

Possible Solutions

We were wondering if there is a way to configure Celery beat so that IntervalSchedule tasks still run when the local time is ambiguous, that is, when it could fall in either PDT or PST. A change like the following would prevent such problems from occurring:

value = timezone.make_aware(value, timezone.get_default_timezone(), is_dst=bool(time.localtime().tm_isdst))

Line to change
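
To illustrate why passing is_dst avoids the crash, here is a small sketch that calls pytz directly, outside of Django and Celery. It assumes pytz is installed and uses the same datetime as the reproduction script. The variable names are only for illustration. Note that the proposed change derives is_dst from time.localtime(), so it presumably relies on the process's system timezone matching CELERY_TIMEZONE.

```python
from datetime import datetime

import pytz
from pytz.exceptions import AmbiguousTimeError

tz = pytz.timezone("America/Los_Angeles")
naive = datetime(2022, 11, 6, 1, 36, 0)

# is_dst=None tells pytz not to guess; this is the failing case in the traceback.
try:
    tz.localize(naive, is_dst=None)
    raised = False
except AmbiguousTimeError:
    raised = True

# An explicit is_dst picks one of the two candidates instead of raising.
as_pdt = tz.localize(naive, is_dst=True)   # first pass through 1:36 AM (daylight time)
as_pst = tz.localize(naive, is_dst=False)  # second pass through 1:36 AM (standard time)

print(raised, as_pdt.tzname(), as_pst.tzname())
```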

This might require some changes in the remaining_estimate. We can remove the following lines such that the time elapsed between the two timestamps is accurate:

    if str(start.tzinfo) == str(now.tzinfo) and now.utcoffset() != start.utcoffset():
        # DST started/ended
        start = start.replace(tzinfo=now.tzinfo)

Line to change

If last_run_at is 1:15 AM PDT and now is 1:30 AM PST, then the elapsed time should be computed as 1 hour 15 minutes instead of 15 minutes.
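
The arithmetic can be checked with a short standard-library sketch. It uses fixed-offset timezones named PDT and PST as stand-ins for the real America/Los_Angeles zone (a simplification, so the string comparison from the quoted condition is not reproduced), and it mimics only the replace(tzinfo=...) step from the lines quoted above.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for the two halves of the ambiguous hour.
PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")

last_run_at = datetime(2022, 11, 6, 1, 15, tzinfo=PDT)
now = datetime(2022, 11, 6, 1, 30, tzinfo=PST)

# Aware datetimes with different offsets are subtracted in UTC,
# so this is the real elapsed time.
elapsed = now - last_run_at

# Relabel the start with now's tzinfo while keeping its wall-clock time,
# as the quoted lines do when the UTC offsets differ.
relabelled = last_run_at.replace(tzinfo=now.tzinfo)
elapsed_after_replace = now - relabelled

print(elapsed)                # 1:15:00
print(elapsed_after_replace)  # 0:15:00
```

The one-hour difference between the two results is the hour that the clocks repeat when daylight saving time ends.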

Note

I am aware of https://github.com/celery/django-celery-beat/issues/285. However, this does not solve our issue. Our issue is that the Celery beat scheduler will crash and fail to send tasks for an entire two hours due to the behavior of make_aware.

auvipy commented 1 year ago

That Celery version is unsupported. Also, the latest release is 2.4.

polarmt commented 1 year ago

Are you certain that this is no longer an issue? All of the code that I posted under Detailed Information is referenced from the current master branch, other than the Django code.

Is there code that prevents this call stack from running?

auvipy commented 1 year ago

By the way, it seems I closed the issue prematurely. My apologies. Can you create a PR with the suggested fix?

polarmt commented 1 year ago

Thanks for reopening! I have submitted the PRs here:

https://github.com/celery/django-celery-beat/pull/605
https://github.com/celery/celery/pull/7901