argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
15.06k stars 3.2k forks source link

v3.2.3 daemon steps are not restarted when you retry a failed workflow #7095

Closed yoandl closed 2 years ago

yoandl commented 3 years ago

Summary

What happened/what you expected to happen?

When you retry a failed workflow with a daemon step, the workflow does not re-start the daemon step. You should be able to retry a failed workflow and have the previous database up for debuging process if you have idempotent steps.

What version of Argo Workflows are you running? 3.2.3

Diagnostics

This yaml contain an error to try the retry feature.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: daemon-step-
spec:
  entrypoint: daemon-example
  templates:
  - name: daemon-example
    steps:
    - - name: influx
        template: influxdb

    - - name: init-database
        template: influxdb-client
        arguments:
          parameters:
          - name: cmd
            value: curl -XPOST 'http://{{steps.influx.ip}}:8086/query' --data-urlencode "q=CREATE DATABASE mydb"

    - - name: producer-1
        template: influxdb-client
        arguments:
          parameters:
          - name: cmd
            value: for i in $(seq 1 20); do curl -XPOST 'http://{{steps.influx.ip}}:8086/write?db=mydb' -d "cpu,host=server01,region=uswest load=$i" ; sleep .5 ; done
      - name: producer-2
        template: influxdb-client
        arguments:
          parameters:
          - name: cmd
            value: for i in $(seq 1 20); do curl -XPOST 'http://{{steps.influx.ip}}:8086/write?db=mydb' -d "cpu,host=server02,region=uswest load=$((RANDOM % 100))" ; sleep .5 ; done
      - name: producer-3
        template: influxdb-client
        arguments:
          parameters:
          - name: cmd
            value: curl -XPOST 'http://{{steps.influx.ip}}:8086/write?db=mydb' -d 'cpu,host=server03,region=useast load=15.4'

    - - name: consumer
        template: influxdb-client
        arguments:
          parameters:
          - name: cmd
            value: curls --silent -G http://{{steps.influx.ip}}:9999/query?pretty=true --data-urlencode "db=mydb" --data-urlencode "q=SELECT * FROM cpu" 

  - name: influxdb
    daemon: true
    container:
      image: influxdb:1.2
      readinessProbe:
        httpGet:
          path: /ping
          port: 8086
        initialDelaySeconds: 5
        timeoutSeconds: 1
      command: ["/entrypoint.sh", "influxd"]

  - name: influxdb-client
    inputs:
      parameters:
      - name: cmd
    container:
      image: appropriate/curl:latest
      command: ["sh", "-c"]
      args: ["{{inputs.parameters.cmd}}"]

What executor are you running? Docker/K8SAPI/Kubelet/PNS/Emissary kub 1.18 eks on aws


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

alexec commented 3 years ago

Potential causes:

6875479db fix: Daemond status stuck with Running (#6742) a147f178d fix(controller): Set finishedAt for workflow with Daemon steps (#6650) 72ee1cce9 fix: Set daemoned nodes to Succeeded when boudary ends (#5421)

alexec commented 3 years ago

@yoandl was this working in a previous version please? I.e. is it a regression?

yoandl commented 3 years ago

@alexec I'm not totally sure but i tried this in version 3.1.9 and it wasn't working. I'm sure this not work in 3.2.0RC1 / 3.2.1 /3.2.2 / 3.2.3.

sarabala1979 commented 2 years ago

I am not able to look into it soon. I will look into after holiday

sarabala1979 commented 2 years ago

This issue can be deprioritized. I will put it back in the backlog.

alexec commented 2 years ago

ca87f2906 fix: Daemon step in running state, but dependents don't start (#7107)

alexec commented 2 years ago

Should be fixed on latest and will be on v3.2.7