aptible / supercronic

Cron for containers
MIT License

Supercronic is not running jobs #112

Closed coldpar closed 7 months ago

coldpar commented 2 years ago

Hi Team,

I just started using supercronic within some containers. We have some jobs that take a while to finish, around 20 minutes, and when supercronic starts one of these jobs it seems to get stuck waiting for it to finish and skips the following jobs.

Example:

Job 1: 14 5 * * * /scripts/OneLongPing
Job 2: */5 * * * * echo "every 5 minutes" > /dev/null

Job 1 will be executed at 05:14 and will take 20 minutes to finish; during that time supercronic is not executing Job 2 and is stuck with the message: level=info msg="waiting for jobs to finish"

Is there any way to change this behavior?

Thanks in advance

ChristophKronberger commented 2 years ago

Have you ever tried adding a "&" to your command so that it runs in the background?

Job 1: 14 5 * * * /scripts/OneLongPing &
Job 2: */5 * * * * echo "every 5 minutes" > /dev/null

dimanzver commented 10 months ago

Is there a solution?

UserNotFound commented 10 months ago

@dimanzver et al.: as far as I am aware, there is no issue as described. If someone can provide a crontab that I can use to reproduce the issue, we can look further, but as far as I can tell a long-running job does not impact a different job.

The example I tried:

 * * * * * sleep 600
 */5 * * * * echo "every 5 minutes"

This issue purports that the first job, sleeping for 10 minutes, will prevent the second job (echo) from running while the sleep is running. However, I'm not able to reproduce that behavior:

[screenshot of the full supercronic log output]

Breaking it down:

They both run at 18:45:

root@77fa563dc83b:/# supercronic crontab
INFO[2023-12-19T18:44:32Z] read crontab: crontab
INFO[2023-12-19T18:45:00Z] starting                                      iteration=0 job.command="sleep 600" job.position=0 job.schedule="* * * * *"
INFO[2023-12-19T18:45:00Z] starting                                      iteration=0 job.command="echo \"every 5 minutes\"" job.position=1 job.schedule="*/5 * * * *"
INFO[2023-12-19T18:45:00Z] every 5 minutes                               channel=stdout iteration=0 job.command="echo \"every 5 minutes\"" job.position=1 job.schedule="*/5 * * * *"
INFO[2023-12-19T18:45:00Z] job succeeded                                 iteration=0 job.command="echo \"every 5 minutes\"" job.position=1 job.schedule="*/5 * * * *"

Then, every minute when the first job is scheduled to run again, it does not start (you can pass -overlapping to allow overlapping runs):

WARN[2023-12-19T18:46:00Z] not starting: job is still running since 2023-12-19 18:45:00 +0000 UTC (1m0s elapsed)  iteration=0 job.command="sleep 600" job.position=0 job.schedule="* * * * *"
WARN[2023-12-19T18:47:00Z] not starting: job is still running since 2023-12-19 18:45:00 +0000 UTC (2m0s elapsed)  iteration=0 job.command="sleep 600" job.position=0 job.schedule="* * * * *"
WARN[2023-12-19T18:48:00Z] not starting: job is still running since 2023-12-19 18:45:00 +0000 UTC (3m0s elapsed)  iteration=0 job.command="sleep 600" job.position=0 job.schedule="* * * * *"
WARN[2023-12-19T18:49:00Z] not starting: job is still running since 2023-12-19 18:45:00 +0000 UTC (4m0s elapsed)  iteration=0 job.command="sleep 600" job.position=0 job.schedule="* * * * *"
WARN[2023-12-19T18:50:00Z] not starting: job is still running since 2023-12-19 18:45:00 +0000 UTC (5m0s elapsed)  iteration=0 job.command="sleep 600" job.position=0 job.schedule="* * * * *"

However, you can see that the echo that runs every five minutes still runs fine:

INFO[2023-12-19T18:50:00Z] starting                                      iteration=1 job.command="echo \"every 5 minutes\"" job.position=1 job.schedule="*/5 * * * *"
INFO[2023-12-19T18:50:00Z] every 5 minutes                               channel=stdout iteration=1 job.command="echo \"every 5 minutes\"" job.position=1 job.schedule="*/5 * * * *"
INFO[2023-12-19T18:50:00Z] job succeeded                                 iteration=1 job.command="echo \"every 5 minutes\"" job.position=1 job.schedule="*/5 * * * *"

And finally, more proof that the first job is still running:

WARN[2023-12-19T18:51:00Z] not starting: job is still running since 2023-12-19 18:45:00 +0000 UTC (6m0s elapsed)  iteration=0 job.command="sleep 600" job.position=0 job.schedule="* * * * *"
WARN[2023-12-19T18:52:00Z] not starting: job is still running since 2023-12-19 18:45:00 +0000 UTC (7m0s elapsed)  iteration=0 job.command="sleep 600" job.position=0 job.schedule="* * * * *"
WARN[2023-12-19T18:53:00Z] not starting: job is still running since 2023-12-19 18:45:00 +0000 UTC (8m0s elapsed)  iteration=0 job.command="sleep 600" job.position=0 job.schedule="* * * * *"
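
As an aside, if overlapping runs of the long job are actually wanted, that flag is simply passed on the command line; a minimal sketch, assuming the same crontab file as in the session above:

 supercronic -overlapping crontab

With -overlapping, a new run of the sleep job would start each minute even while earlier runs are still in progress.
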
UserNotFound commented 7 months ago

Please reopen if someone can provide a crontab (or better yet, a whole Dockerfile) that can reproduce the issue.
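
For anyone putting a reproduction together, here is a minimal sketch of the kind of self-contained Dockerfile that would help; the base image, binary location, and crontab path are placeholders, and it assumes a supercronic release binary has been downloaded next to the Dockerfile:

FROM debian:bookworm-slim
# Placeholder: copy in a locally downloaded supercronic binary.
COPY supercronic /usr/local/bin/supercronic
RUN chmod +x /usr/local/bin/supercronic
# Placeholder: the crontab that demonstrates the problem.
COPY crontab /etc/crontab
CMD ["/usr/local/bin/supercronic", "/etc/crontab"]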

dimanzver commented 7 months ago

Fixed with the shareProcessNamespace setting.
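
For context, shareProcessNamespace is a Kubernetes Pod spec field, so this fix assumes supercronic is running inside a Kubernetes pod; the comment above does not include the manifest, so the following is only a minimal sketch of where the setting goes (pod name, images, and crontab path are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: app-with-cron              # placeholder name
spec:
  # All containers in the pod share a single PID namespace,
  # so they can see and signal each other's processes.
  shareProcessNamespace: true
  containers:
    - name: app
      image: my-app:latest         # placeholder image
    - name: cron
      image: my-app:latest         # placeholder image
      command: ["supercronic", "/etc/crontab"]   # assumes the crontab is baked into the image at this path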