argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

v3.6-rc: CronWorkflows with `timezone` and `startingDeadlineSeconds` may run at the wrong time if relisted at the wrong time #13786

Closed Joibel closed 1 month ago

Joibel commented 1 month ago

Pre-requisites

What happened? What did you expect to happen?

This bug is only in 3.6, not in 3.5.

I'm intending to fix this myself, but filing this in case anyone else discovers it before I've managed to write the tests.

In #12616, shouldOutstandingWorkflowsBeRun stopped taking the timezone into account for the schedules it analyzes. More specifically, https://github.com/argoproj/argo-workflows/blob/c9b1477fd575bf06bed43ca2139f74aa3af4285c/workflow/cron/operator.go#L323 should call GetSchedulesWithTimezone() instead of plain GetSchedules(), because the result is compared against the timezone-compensated value in now.

The fix is just that switch I believe, but I'll get my head around a regression test before submitting a PR.

If your Informer relist happens within startingDeadlineSeconds of a scheduled time as interpreted without the timezone (in UTC or controller local?), you'll get an incorrect run of the CronWorkflow.
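To illustrate the failure mode, here is a rough sketch in Python rather than the controller's Go code. The helper names `last_daily_run` and `missed_within_deadline` are hypothetical and do not exist in argo-workflows. The sketch only handles "minute 0 of a given hour, every day" schedules, ignores the last-scheduled-time bookkeeping the real operator does, and uses a fixed UTC+13 offset as a stand-in for `Pacific/Auckland`.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+13 offset stands in for "Pacific/Auckland" during
# daylight saving time. A real scheduler would resolve the IANA zone name.
AUCKLAND = timezone(timedelta(hours=13), "NZDT")


def last_daily_run(now: datetime, hour: int, tz: timezone) -> datetime:
    """Most recent 'HH:00 every day' occurrence at or before `now`, evaluated in `tz`."""
    local_now = now.astimezone(tz)
    candidate = local_now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate > local_now:
        # Today's occurrence has not happened yet in this zone, so use yesterday's.
        candidate -= timedelta(days=1)
    return candidate


def missed_within_deadline(now: datetime, hour: int, tz: timezone,
                           deadline_seconds: int) -> bool:
    """True if the most recent occurrence is still inside the starting deadline."""
    return now - last_daily_run(now, hour, tz) <= timedelta(seconds=deadline_seconds)


if __name__ == "__main__":
    # A relist at 03:01 UTC, which is 16:01 in the UTC+13 stand-in zone.
    now = datetime(2024, 10, 18, 3, 1, tzinfo=timezone.utc)

    # Schedule "0 3 * * *" read without the timezone: 03:00 UTC was 60s ago.
    print(missed_within_deadline(now, 3, timezone.utc, 120))  # True (spurious run)

    # Same schedule read in the configured timezone: 03:00 local was ~13h ago.
    print(missed_within_deadline(now, 3, AUCKLAND, 120))  # False
```

With `startingDeadlineSeconds: 120`, the schedule read without its timezone looks like a just-missed execution, while the timezone-aware reading correctly finds nothing to catch up on.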

Version(s)

3.6.0-rc2

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: hello-world-multiple-schedules
spec:
  schedules: # v3.6 and after
    - "0 3 * * *"
    - "0 2 * * *"
  timezone: "Pacific/Auckland"   # Default to local machine timezone
  startingDeadlineSeconds: 120
  concurrencyPolicy: "Replace"      # Default to "Allow"
  successfulJobsHistoryLimit: 4     # Default 3
  failedJobsHistoryLimit: 4         # Default 1
  suspend: false                    # Set to "true" to suspend scheduling
  workflowSpec:
    entrypoint: whalesay
    templates:
      - name: whalesay
        container:
          image: docker/whalesay:latest
          command: [cowsay]
          args: ["🕓 hello world. Scheduled on: {{workflow.scheduledTime}}"]

Logs from the workflow controller

missed an execution at <Sometime> and is within StartingDeadline

Logs from your workflow's wait container

Irrelevant

Joibel commented 1 month ago

@eduardodbr - you're welcome to take this one if you're able to have a look, otherwise I'll try and get it done on Monday. I'm AFK all weekend.

eduardodbr commented 1 month ago

Sorry @Joibel but I don't think I'll have the time to do it over the weekend. I may have capacity during the week if you end up doing other tasks

Joibel commented 1 month ago

> Sorry @Joibel but I don't think I'll have the time to do it over the weekend. I may have capacity during the week if you end up doing other tasks

No problem.