funkyfuture / deck-chores

A job scheduler for Docker containers, configured via labels.
https://deck-chores.readthedocs.io/
ISC License
196 stars 27 forks source link

Add a grace time option to solve Missed execution #130

Open Kelivan opened 1 year ago

Kelivan commented 1 year ago

I encountered the same problem on one of the servers.

task_1        | 2023-10-01 02:00:04,929|WARNING |Missed execution of db-dump in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb at 2023-10-01 02:00:00+03:00.
task_1        | 2023-10-03 02:00:05,236|WARNING |Missed execution of db-dump in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb at 2023-10-03 02:00:00+03:00.
task_1        | 2023-10-05 02:00:02,213|WARNING |Missed execution of db-dump in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb at 2023-10-05 02:00:00+03:00.
task_1        | 2023-10-07 02:00:03,978|WARNING |Missed execution of db-dump in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb at 2023-10-07 02:00:00+03:00.
task_1        | 2023-10-08 02:00:01,656|WARNING |Missed execution of db-dump in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb at 2023-10-08 02:00:00+03:00.
task_1        | 2023-10-11 02:00:05,651|WARNING |Missed execution of db-dump in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb at 2023-10-11 02:00:00+03:00.
task_1        | 2023-10-12 02:00:01,836|WARNING |Missed execution of db-dump in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb at 2023-10-12 02:00:00+03:00.
task_1        | 2023-10-14 02:00:04,225|WARNING |Missed execution of db-dump in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb at 2023-10-14 02:00:00+03:00.
task_1        | 2023-10-15 02:00:01,409|WARNING |Missed execution of db-dump in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb at 2023-10-15 02:00:00+03:00.
task_1        | 2023-10-18 02:00:06,149|WARNING |Missed execution of db-dump in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb at 2023-10-18 02:00:00+03:00.

This task makes backups of PostgreSQL

labels:
  cron.db-dump.command: bash -c "rm -f /dump/* && pg_dump -O -Fp -b -U postgres portal | gzip > /dump/$$(date +"%Y%m%d")_portal.sql.gz"
  cron.db-dump.cron: "* * * * * 2 0 0"

Nothing interesting in the debug output

task_1        | 2023-10-14 02:00:04,107|DEBUG   |Looking for jobs to run
task_1        | 2023-10-14 02:00:04,199|DEBUG   |Next wakeup is due at 2023-10-15 02:00:00+03:00 (in 86395.802876 seconds)
task_1        | Run time of job "db-dump (trigger: cron[year='*', month='*', day='*', week='*', day_of_week='*', hour='2', minute='0', second='0'], next run at: 2023-10-15 02:00:00 MSK)" was missed by 0:00:04.183147
task_1        | 2023-10-14 02:00:04,225|WARNING |Missed execution of db-dump in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb at 2023-10-14 02:00:00+03:00.
task_1        | 2023-10-15 02:00:01,400|DEBUG   |Looking for jobs to run
task_1        | 2023-10-15 02:00:01,405|DEBUG   |Next wakeup is due at 2023-10-16 02:00:00+03:00 (in 86398.594398 seconds)
task_1        | Run time of job "db-dump (trigger: cron[year='*', month='*', day='*', week='*', day_of_week='*', hour='2', minute='0', second='0'], next run at: 2023-10-16 02:00:00 MSK)" was missed by 0:00:01.406026
task_1        | 2023-10-15 02:00:01,409|WARNING |Missed execution of db-dump in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb at 2023-10-15 02:00:00+03:00.
task_1        | 2023-10-16 02:00:00,000|DEBUG   |Looking for jobs to run
task_1        | 2023-10-16 02:00:00,006|DEBUG   |Next wakeup is due at 2023-10-17 02:00:00+03:00 (in 86399.993227 seconds)
task_1        | 2023-10-16 02:00:00,013|INFO    |portal-pg_postgresql_1: Executing 'db-dump'.
task_1        | 2023-10-16 02:00:00,381|DEBUG   |Daemon event: {'status': 'exec_start: bash -c rm -f /dump/* && pg_dump -O -Fp -b -U postgres portal | gzip > /dump/$(date +%Y%m%d)_portal.sql.gz', 'id': '2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb', 'from': 'postgres:10-alpine', 'Type': 'container', 'Action': 'exec_start: bash -c rm -f /dump/* && pg_dump -O -Fp -b -U postgres portal | gzip > /dump/$(date +%Y%m%d)_portal.sql.gz', 'Actor': {'ID': '2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb', 'Attributes': {'com.docker.compose.config-hash': 'f717fcd630129addf59ec0054e05bd8957a3bcd83af7ee6c3e992edc8045a114', 'com.docker.compose.container-number': '1', 'com.docker.compose.oneoff': 'False', 'com.docker.compose.project': 'portal-pg', 'com.docker.compose.project.config_files': 'docker-compose.yml', 'com.docker.compose.project.working_dir': '/docker/portal-pg', 'com.docker.compose.service': 'postgresql', 'com.docker.compose.version': '1.29.2', 'cron.db-dump.command': 'bash -c "rm -f /dump/* && pg_dump -O -Fp -b -U postgres portal | gzip > /dump/$(date +"%Y%m%d")_portal.sql.gz"', 'cron.db-dump.cron': '* * * * * 2 0 0', 'execID': '1a0edf89a99ea5a850b204226ca7972bfbe684dd0fd2b936ffa89410c92bf93e', 'image': 'postgres:10-alpine', 'name': 'portal-pg_postgresql_1'}}, 'scope': 'local', 'time': 1697410800, 'timeNano': 1697410800379213061}
task_1        | 2023-10-16 02:05:56,084|DEBUG   |Daemon event: {'status': 'exec_die', 'id': '2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb', 'from': 'postgres:10-alpine', 'Type': 'container', 'Action': 'exec_die', 'Actor': {'ID': '2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb', 'Attributes': {'com.docker.compose.config-hash': 'f717fcd630129addf59ec0054e05bd8957a3bcd83af7ee6c3e992edc8045a114', 'com.docker.compose.container-number': '1', 'com.docker.compose.oneoff': 'False', 'com.docker.compose.project': 'portal-pg', 'com.docker.compose.project.config_files': 'docker-compose.yml', 'com.docker.compose.project.working_dir': '/docker/portal-pg', 'com.docker.compose.service': 'postgresql', 'com.docker.compose.version': '1.29.2', 'cron.db-dump.command': 'bash -c "rm -f /dump/* && pg_dump -O -Fp -b -U postgres portal | gzip > /dump/$(date +"%Y%m%d")_portal.sql.gz"', 'cron.db-dump.cron': '* * * * * 2 0 0', 'execID': '1a0edf89a99ea5a850b204226ca7972bfbe684dd0fd2b936ffa89410c92bf93e', 'exitCode': '0', 'image': 'postgres:10-alpine', 'name': 'portal-pg_postgresql_1'}}, 'scope': 'local', 'time': 1697411156, 'timeNano': 1697411156066181725}
task_1        | 2023-10-16 02:05:56,102|INFO    |Command `bash -c "rm -f /dump/* && pg_dump -O -Fp -b -U postgres portal | gzip > /dump/$(date +"%Y%m%d")_portal.sql.gz"` in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb finished with exit code 0.
task_1        | 2023-10-17 02:00:00,017|DEBUG   |Looking for jobs to run
task_1        | 2023-10-17 02:00:00,098|INFO    |portal-pg_postgresql_1: Executing 'db-dump'.
task_1        | 2023-10-17 02:00:00,099|DEBUG   |Next wakeup is due at 2023-10-18 02:00:00+03:00 (in 86399.901019 seconds)
task_1        | 2023-10-17 02:00:00,452|DEBUG   |Daemon event: {'status': 'exec_start: bash -c rm -f /dump/* && pg_dump -O -Fp -b -U postgres portal | gzip > /dump/$(date +%Y%m%d)_portal.sql.gz', 'id': '2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb', 'from': 'postgres:10-alpine', 'Type': 'container', 'Action': 'exec_start: bash -c rm -f /dump/* && pg_dump -O -Fp -b -U postgres portal | gzip > /dump/$(date +%Y%m%d)_portal.sql.gz', 'Actor': {'ID': '2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb', 'Attributes': {'com.docker.compose.config-hash': 'f717fcd630129addf59ec0054e05bd8957a3bcd83af7ee6c3e992edc8045a114', 'com.docker.compose.container-number': '1', 'com.docker.compose.oneoff': 'False', 'com.docker.compose.project': 'portal-pg', 'com.docker.compose.project.config_files': 'docker-compose.yml', 'com.docker.compose.project.working_dir': '/docker/portal-pg', 'com.docker.compose.service': 'postgresql', 'com.docker.compose.version': '1.29.2', 'cron.db-dump.command': 'bash -c "rm -f /dump/* && pg_dump -O -Fp -b -U postgres portal | gzip > /dump/$(date +"%Y%m%d")_portal.sql.gz"', 'cron.db-dump.cron': '* * * * * 2 0 0', 'execID': '47fb14130a6e3054c27706c54dedab1c1789f32e720e4e94374b617de4095d00', 'image': 'postgres:10-alpine', 'name': 'portal-pg_postgresql_1'}}, 'scope': 'local', 'time': 1697497200, 'timeNano': 1697497200446706968}
task_1        | 2023-10-17 02:05:55,828|DEBUG   |Daemon event: {'status': 'exec_die', 'id': '2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb', 'from': 'postgres:10-alpine', 'Type': 'container', 'Action': 'exec_die', 'Actor': {'ID': '2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb', 'Attributes': {'com.docker.compose.config-hash': 'f717fcd630129addf59ec0054e05bd8957a3bcd83af7ee6c3e992edc8045a114', 'com.docker.compose.container-number': '1', 'com.docker.compose.oneoff': 'False', 'com.docker.compose.project': 'portal-pg', 'com.docker.compose.project.config_files': 'docker-compose.yml', 'com.docker.compose.project.working_dir': '/docker/portal-pg', 'com.docker.compose.service': 'postgresql', 'com.docker.compose.version': '1.29.2', 'cron.db-dump.command': 'bash -c "rm -f /dump/* && pg_dump -O -Fp -b -U postgres portal | gzip > /dump/$(date +"%Y%m%d")_portal.sql.gz"', 'cron.db-dump.cron': '* * * * * 2 0 0', 'execID': '47fb14130a6e3054c27706c54dedab1c1789f32e720e4e94374b617de4095d00', 'exitCode': '0', 'image': 'postgres:10-alpine', 'name': 'portal-pg_postgresql_1'}}, 'scope': 'local', 'time': 1697497555, 'timeNano': 1697497555808353975}
task_1        | 2023-10-17 02:05:55,861|INFO    |Command `bash -c "rm -f /dump/* && pg_dump -O -Fp -b -U postgres portal | gzip > /dump/$(date +"%Y%m%d")_portal.sql.gz"` in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb finished with exit code 0.
task_1        | 2023-10-18 02:00:06,035|DEBUG   |Looking for jobs to run
task_1        | 2023-10-18 02:00:06,122|DEBUG   |Next wakeup is due at 2023-10-19 02:00:00+03:00 (in 86393.880049 seconds)
task_1        | Run time of job "db-dump (trigger: cron[year='*', month='*', day='*', week='*', day_of_week='*', hour='2', minute='0', second='0'], next run at: 2023-10-19 02:00:00 MSK)" was missed by 0:00:06.117066
task_1        | 2023-10-18 02:00:06,149|WARNING |Missed execution of db-dump in container 2264e9a1b35e784984d050b6f79496b425065761e1aa74bcfa25422ef720dcbb at 2023-10-18 02:00:00+03:00.

I think it would be a good idea to add a grace time option to the variables.

Originally posted by @Kelivan in https://github.com/funkyfuture/deck-chores/issues/36#issuecomment-1769017008

funkyfuture commented 3 weeks ago

are a lot of jobs invoked at 2:00? first you should figure out why the jobs are missed so far off.

glancing through apscheduler's docs i didn't find anything about a grace time option anyway.