celery / django-celery-beat

Celery Periodic Tasks backed by the Django ORM
Other
1.7k stars 429 forks source link

start_time intent and additional clarity, maybe docs change #425

Open nicholasserra opened 3 years ago

nicholasserra commented 3 years ago

Quick summary

The start_time field doesn't have clear documentation on functionality, could use docs refinement

Issue

Looking at docs, issues, and code around the start_time field, i'm wondering what the actual intent is and if the code supports that intent.

Seems there's two conclusions when I look through the code and issues: option 1: start_time is a lower bound, and periodic tasks simply don't run unless the current time is after the start_time datetime. option2: start_time is the time when the task will first run, which can be accomplished with a hack using the last_run_at field

In all my testing I've found that setting a start_time does not mean that the task will first run at that specific time. For example, if I set the interval to every hour, and set the start time to 30 minutes from now, my first task run will be in one hour.

This seems supported by this issue: https://github.com/celery/django-celery-beat/issues/259

Also that issue mentions a hack to force this "trigger" behavior by setting the last_run_at time to a datetime previous to the start_time.

The code itself seems to mention that the periodic task will be triggered exactly on the start_time, for example:

https://github.com/celery/django-celery-beat/blob/master/django_celery_beat/models.py#L502

Datetime when the schedule should begin triggering the task to run

And https://github.com/celery/django-celery-beat/blob/7416e22f03b19173b5ebd5cf42a7125cd0d0f051/django_celery_beat/schedulers.py#L114

The datetime is before the start date - don't run, send a delay to retry on start_time

It seems as if the code is intending to trigger a delay so that the first run is exactly at start_time, although i've yet to be able to reproduce this.

I may be misunderstanding the comments mentioned, as they may mean to simply retry the interval check, not retry and run the task.

Does anyone have clear understanding of how this field is supposed to work? I could make some docs changes if I get a better understanding of exactly the intent here.

Thank you!

TeplAlex commented 1 year ago

Hello! Did you finally figure out what this field is for?

nicholasserra commented 1 year ago

Sort of. Just went with the assumptions above to get something working. Been a while since I looked at this code, but...

Basically, start_time alone is not guaranteed. To force a task to run at exactly the start_time desired, you'll need to use both start_time and last_run_at. Set the last_run_at to be the interval before the desired start time.

This works for how we're using the intervals, YMMV.

Basically:

(start_time=start_time, last_run_at=start_time - timedelta(**interval_delta))

or

delta = {interval_period: interval_every}
(start_time=start_time, last_run_at=start_time - timedelta(**delta))
Sugihiru commented 2 months ago

Bumping the issue, this looks like a bug, and I'm not sure why #259 was closed as this seemed relevant and the bug confirmed by @liquidpele

As I understand right now, manually setting start_time has no impact on the task