agronholm / apscheduler

Task scheduling library for Python
MIT License
6.3k stars 712 forks source link

Jobs scheduled to run every 1 minute gets missed if previous job takes longer; max_instances is set to higher number #977

Closed pchatterjee-cfx closed 1 month ago

pchatterjee-cfx commented 1 month ago

Things to check first

Version

3.10.4

What happened?

Running a stress test script that posts 1500 events to a webhook. The job is configured to repeat every minute.

            scheduler = BlockingScheduler(timezone=utc)
            cronExpression = f"*/{minutes} * * * *" # Every minute
            print('Scheduled job for Cron expression: {}'.format(cronExpression))
            scheduler.add_job(alertgen.repeat, trigger=CronTrigger.from_crontab(cronExpression),
                                max_instances=10, args=(alerts, minutes))

The posting of events takes a few seconds more than a minute. The next execution of the job does not start immediately; instead it is scheduled to run after a minute. This means that no events are posted to the webhook in the interim 1-minute.

I expect the next task to start immediately as misfire_grace_time not set and the task is configured to run every minute at the start of the minute

How can we reproduce the bug?

Add a job to run at the start of every minute. Add a delay in the job so that it takes longer than a minute to finish.

I have searched the existing issues and didn't find my bug already reported there

This is not true as I found a few reported issues but there is no clear answer in those discussions

agronholm commented 1 month ago

Why would you expect the scheduled task to run immediately, instead of the scheduled time which is the nearest minute with seconds=0? What cron implementation does that?

As for misfire_grace_time, have you read the documentation on that? It specifically says that it only determines whether a job that would be started later than its scheduled time is still allowed to run or not.

pchatterjee-cfx commented 1 month ago

My expectation of trigger time is this

image

agronholm commented 1 month ago

Yes, so once you add a job, it will wait until the next full minute to run it, just like in the picture.

pchatterjee-cfx commented 1 month ago

Yes, so once you add a job, it will wait until the next full minute to run it, just like in the picture.

That's not the problem that I am referring to. After the first job is triggered say at 00:00:00 and it takes longer than a minute say 65 seconds, the next job is only run at 00:02:00. I would expect the next job to run at 00:01:05 since the previous job took 5 seconds longer to finish. So the trigger at 00:01:00 is missed.

agronholm commented 1 month ago

So what happens here is that the scheduler tries to start the next run of the job, and sees that there's another one still going. By default, the max_instances parameter is set to 1, so the new run is skipped.

You could get around this by setting max_instances=2 and then using some kind of locking to ensure that the second run of the job waits until the first one is done.

pchatterjee-cfx commented 1 month ago

Ah ok. I did use max_instances=2, also max_instances=10 but it was still skipping that trigger. Let me add a lock and give it a try ...

agronholm commented 1 month ago

If that for some reason doesn't work, then please construct a minimum workable example that demonstrates the issue.

pchatterjee-cfx commented 1 month ago

Interesting. I created a CronTrigger to run the following function that sleeps for 65 seconds:

def sleep(seconds : int):
    print(f'Sleeping for {seconds} seconds ...')
    time.sleep(seconds)
    print('AWAKE')

And scheduled to run at every minute:

            scheduler = BlockingScheduler(timezone=utc)
            cronExpression = f"*/1 * * * *" # Every minute
            print('Scheduled job for Cron expression: {}'.format(cronExpression))
            scheduler.add_job(sleep, trigger=CronTrigger.from_crontab(cronExpression),
                                max_instances=10, args=(65,))

And got the expected output

Scheduled job for Cron expression: /1 * Sleeping for 65 seconds ... Sleeping for 65 seconds ... AWAKE Sleeping for 65 seconds ... AWAKE

It seems that the next job DOES START at the correct time. My apologies for the invalid alarm. I need to check why my bulk job does not start after the scheduled time is missed!!!

pchatterjee-cfx commented 1 month ago

Confirmed its working fine. I am running 5 simultaneous scripts and the overlapping prints gave the impression that a trigger is missed. I ran 2 scripts simultaneously and they are not missing any trigger.

==> 02a94b78-d3c2-452f-ad75-33721d440512.log <== Initialized AlertGen Scheduled job for Cron expression: /1 *

==> 0885b700-d39b-4df5-b545-844227a33872.log <== Initialized AlertGen Scheduled job for Cron expression: /1 * Sending alerts via webhook : 1500

==> 02a94b78-d3c2-452f-ad75-33721d440512.log <== Sending alerts via webhook : 1500

==> 02a94b78-d3c2-452f-ad75-33721d440512.log <== Sending alerts via webhook : 1500 Successfully sent 1500 alerts in 0:01:16.292995 seconds, Time:2024-10-17 08:54:16 Next batch after 1 minutes

Its clear in the log 02a94b78-d3c2-452f-ad75-33721d440512.log that the next job is started even when the previous job finishes later at Time:2024-10-17 08:54:16.

Thanks again for patiently answering my queries and my apologies once again for the false alarm.

pchatterjee-cfx commented 1 month ago

Closing as invalid