argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Pods stay in progressing because health check logic cannot be applied when the pod.Spec.RestartPolicy is: corev1.RestartPolicyOnFailure #7182

Open alexec opened 3 years ago

alexec commented 3 years ago

My pods stay in Progressing, even though they are ready/running.

Any chance you can take 2 mins to let me know why that might be? It is confusing for me.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    dataflow.argoproj.io/hash: bec2d474d0d6369e1e6e1041fa50e1cbd7f2cf10cb92f20bd00b5ca42ca74aeb
    dataflow.argoproj.io/kill-cmd.main: '["/var/run/argo-dataflow/kill","1"]'
    dataflow.argoproj.io/kill-cmd.sidecar: '["/var/run/argo-dataflow/kill","1"]'
    dataflow.argoproj.io/replica: '0'
    iam.amazonaws.com/role: 'arn:aws:iam::016253778016:role/k8s-dev-observe-argodataflowx-usw2-prd'
    kubectl.kubernetes.io/default-container: main
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: memory limit for container main'
    kubernetes.io/psp: eks.privileged
  creationTimestamp: '2021-08-26T00:26:25Z'
  labels:
    dataflow.argoproj.io/pipeline-name: odl-apigw-telegraf-druid
    dataflow.argoproj.io/step-name: agtd2
  managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:dataflow.argoproj.io/hash': {}
            'f:dataflow.argoproj.io/kill-cmd.main': {}
            'f:dataflow.argoproj.io/kill-cmd.sidecar': {}
            'f:dataflow.argoproj.io/replica': {}
            'f:iam.amazonaws.com/role': {}
            'f:kubectl.kubernetes.io/default-container': {}
          'f:labels':
            .: {}
            'f:dataflow.argoproj.io/pipeline-name': {}
            'f:dataflow.argoproj.io/step-name': {}
          'f:ownerReferences':
            .: {}
            'k:{"uid":"9a065307-e625-4d30-9407-973bcb46dd1a"}':
              .: {}
              'f:apiVersion': {}
              'f:blockOwnerDeletion': {}
              'f:controller': {}
              'f:kind': {}
              'f:name': {}
              'f:uid': {}
        'f:spec':
          'f:containers':
            'k:{"name":"main"}':
              .: {}
              'f:env':
                .: {}
                'k:{"name":"ENV"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"POLICY_ID"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ROLE"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"TOPIC_NAME"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
              'f:image': {}
              'f:imagePullPolicy': {}
              'f:lifecycle':
                .: {}
                'f:preStop':
                  .: {}
                  'f:exec':
                    .: {}
                    'f:command': {}
              'f:name': {}
              'f:resources': {}
              'f:securityContext':
                .: {}
                'f:allowPrivilegeEscalation': {}
                'f:capabilities':
                  .: {}
                  'f:drop': {}
              'f:terminationMessagePath': {}
              'f:terminationMessagePolicy': {}
              'f:volumeMounts':
                .: {}
                'k:{"mountPath":"/var/run/argo-dataflow"}':
                  .: {}
                  'f:mountPath': {}
                  'f:name': {}
            'k:{"name":"sidecar"}':
              .: {}
              'f:args': {}
              'f:env':
                .: {}
                'k:{"name":"ARGO_DATAFLOW_CLUSTER_NAME"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_DATAFLOW_DEBUG"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_DATAFLOW_NAMESPACE"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_DATAFLOW_PIPELINE_NAME"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_DATAFLOW_REPLICA"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_DATAFLOW_STEP"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_DATAFLOW_UPDATE_INTERVAL"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"GODEBUG"}':
                  .: {}
                  'f:name': {}
              'f:image': {}
              'f:imagePullPolicy': {}
              'f:lifecycle':
                .: {}
                'f:preStop':
                  .: {}
                  'f:httpGet':
                    .: {}
                    'f:path': {}
                    'f:port': {}
                    'f:scheme': {}
              'f:name': {}
              'f:ports':
                .: {}
                'k:{"containerPort":3570,"protocol":"TCP"}':
                  .: {}
                  'f:containerPort': {}
                  'f:protocol': {}
              'f:readinessProbe':
                .: {}
                'f:failureThreshold': {}
                'f:httpGet':
                  .: {}
                  'f:path': {}
                  'f:port': {}
                  'f:scheme': {}
                'f:periodSeconds': {}
                'f:successThreshold': {}
                'f:timeoutSeconds': {}
              'f:resources':
                .: {}
                'f:limits':
                  .: {}
                  'f:cpu': {}
                  'f:memory': {}
                'f:requests':
                  .: {}
                  'f:cpu': {}
                  'f:memory': {}
              'f:securityContext':
                .: {}
                'f:allowPrivilegeEscalation': {}
                'f:capabilities':
                  .: {}
                  'f:drop': {}
              'f:terminationMessagePath': {}
              'f:terminationMessagePolicy': {}
              'f:volumeMounts':
                .: {}
                'k:{"mountPath":"/var/run/argo-dataflow"}':
                  .: {}
                  'f:mountPath': {}
                  'f:name': {}
          'f:dnsPolicy': {}
          'f:enableServiceLinks': {}
          'f:initContainers':
            .: {}
            'k:{"name":"init"}':
              .: {}
              'f:args': {}
              'f:env':
                .: {}
                'k:{"name":"ARGO_DATAFLOW_CLUSTER_NAME"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_DATAFLOW_DEBUG"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_DATAFLOW_NAMESPACE"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_DATAFLOW_PIPELINE_NAME"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_DATAFLOW_REPLICA"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_DATAFLOW_STEP"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"ARGO_DATAFLOW_UPDATE_INTERVAL"}':
                  .: {}
                  'f:name': {}
                  'f:value': {}
                'k:{"name":"GODEBUG"}':
                  .: {}
                  'f:name': {}
              'f:image': {}
              'f:imagePullPolicy': {}
              'f:name': {}
              'f:resources':
                .: {}
                'f:limits':
                  .: {}
                  'f:cpu': {}
                  'f:memory': {}
                'f:requests':
                  .: {}
                  'f:cpu': {}
                  'f:memory': {}
              'f:securityContext':
                .: {}
                'f:allowPrivilegeEscalation': {}
                'f:capabilities':
                  .: {}
                  'f:drop': {}
              'f:terminationMessagePath': {}
              'f:terminationMessagePolicy': {}
              'f:volumeMounts':
                .: {}
                'k:{"mountPath":"/.ssh"}':
                  .: {}
                  'f:mountPath': {}
                  'f:name': {}
                  'f:readOnly': {}
                'k:{"mountPath":"/var/run/argo-dataflow"}':
                  .: {}
                  'f:mountPath': {}
                  'f:name': {}
          'f:priorityClassName': {}
          'f:restartPolicy': {}
          'f:schedulerName': {}
          'f:securityContext':
            .: {}
            'f:runAsNonRoot': {}
            'f:runAsUser': {}
          'f:serviceAccount': {}
          'f:serviceAccountName': {}
          'f:terminationGracePeriodSeconds': {}
          'f:volumes':
            .: {}
            'k:{"name":"ssh"}':
              .: {}
              'f:name': {}
              'f:secret':
                .: {}
                'f:defaultMode': {}
                'f:secretName': {}
            'k:{"name":"var-run-argo-dataflow"}':
              .: {}
              'f:emptyDir': {}
              'f:name': {}
      manager: manager
      operation: Update
      time: '2021-08-26T00:26:24Z'
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          'f:conditions':
            'k:{"type":"ContainersReady"}':
              .: {}
              'f:lastProbeTime': {}
              'f:lastTransitionTime': {}
              'f:status': {}
              'f:type': {}
            'k:{"type":"Initialized"}':
              .: {}
              'f:lastProbeTime': {}
              'f:lastTransitionTime': {}
              'f:status': {}
              'f:type': {}
            'k:{"type":"Ready"}':
              .: {}
              'f:lastProbeTime': {}
              'f:lastTransitionTime': {}
              'f:status': {}
              'f:type': {}
          'f:containerStatuses': {}
          'f:hostIP': {}
          'f:initContainerStatuses': {}
          'f:phase': {}
          'f:podIP': {}
          'f:podIPs':
            .: {}
            'k:{"ip":"10.246.90.150"}':
              .: {}
              'f:ip': {}
          'f:startTime': {}
      manager: kubelet
      operation: Update
      time: '2021-09-05T22:38:02Z'
  name: odl-apigw-telegraf-druid-agtd2-0
  namespace: dev-observe-argodataflowx-usw2-prd
  ownerReferences:
    - apiVersion: dataflow.argoproj.io/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: Step
      name: odl-apigw-telegraf-druid-agtd2
      uid: 9a065307-e625-4d30-9407-973bcb46dd1a
  resourceVersion: '355577890'
  selfLink: >-
    /api/v1/namespaces/dev-observe-argodataflowx-usw2-prd/pods/odl-apigw-telegraf-druid-agtd2-0
  uid: 33fc0710-2295-4913-8c56-19bc529ed322
spec:
  containers:
    - args:
        - sidecar
      env:
        - name: ARGO_DATAFLOW_CLUSTER_NAME
          value: tech-pi-prd-usw2-k8s
        - name: ARGO_DATAFLOW_DEBUG
          value: 'false'
        - name: ARGO_DATAFLOW_NAMESPACE
          value: dev-observe-argodataflowx-usw2-prd
        - name: ARGO_DATAFLOW_PIPELINE_NAME
          value: odl-apigw-telegraf-druid
        - name: ARGO_DATAFLOW_REPLICA
          value: '0'
        - name: ARGO_DATAFLOW_STEP
          value: >-
            {"kind":"Step","apiVersion":"dataflow.argoproj.io/v1alpha1","metadata":{"name":"odl-apigw-telegraf-druid-agtd2","namespace":"dev-observe-argodataflowx-usw2-prd","selfLink":"/apis/dataflow.argoproj.io/v1alpha1/namespaces/dev-observe-argodataflowx-usw2-prd/steps/odl-apigw-telegraf-druid-agtd2","uid":"9a065307-e625-4d30-9407-973bcb46dd1a","resourceVersion":"339044253","generation":4871,"creationTimestamp":"2021-07-30T23:06:35Z","labels":{"dataflow.argoproj.io/pipeline-name":"odl-apigw-telegraf-druid","dataflow.argoproj.io/step-name":"agtd2"},"ownerReferences":[{"apiVersion":"dataflow.argoproj.io/v1alpha1","kind":"Pipeline","name":"odl-apigw-telegraf-druid","uid":"372d3221-51a3-4aef-b1c8-6363a7b3e6d9","controller":true,"blockOwnerDeletion":true}]},"spec":{"name":"agtd2","container":{"image":"docker.intuit.com/dev/containers/argo/service/dataflow-agtd2:latest","env":[{"name":"ENV","value":"prd"},{"name":"POLICY_ID","value":"p-1jk10kpxwj35"},{"name":"ROLE","value":"arn:aws:iam::016253778016:role/k8s-dev-observe-argodataflowx-usw2-prd"},{"name":"TOPIC_NAME","value":"ip-apigw-telegraf-metrics-prd"}],"resources":{}},"replicas":2,"scale":{"desiredReplicas":"limit(pending
            / (10 * 60 * 2000), 0, 12,
            1)","peekDelay":"defaultPeekDelay","scalingDelay":"\"5m\""},"sources":[{"name":"default","kafka":{"name":"default","topic":"ip-apigw-telegraf-metrics-prd","startOffset":"Last"},"retry":{"duration":"100ms","factorPercentage":200,"steps":20,"cap":"0s","jitterPercentage":10}}],"sinks":[{"name":"default","kafka":{"name":"default","topic":"ip-apigw-argo-druid"}}],"restartPolicy":"OnFailure","serviceAccountName":"pipeline","metadata":{"annotations":{"iam.amazonaws.com/role":"arn:aws:iam::016253778016:role/k8s-dev-observe-argodataflowx-usw2-prd"}},"sidecar":{"resources":{"limits":{"cpu":"1","memory":"1Gi"},"requests":{"cpu":"500m","memory":"512Mi"}}}},"status":{"phase":"","replicas":2,"selector":"dataflow.argoproj.io/pipeline-name=odl-apigw-telegraf-druid,dataflow.argoproj.io/step-name=agtd2","lastScaledAt":"2021-08-26T00:17:34Z","sourceStatuses":{"default":{"pending":3444292,"lastPending":3406859,"metrics":{"0":{"total":2484338081,"rate":"2869.017","retries":1774,"totalBytes":2178751050625},"1":{"total":1989990388,"rate":"2852.167","retries":1367,"totalBytes":1896327608622},"10":{"total":1869139,"rate":"671.833"},"11":{"total":776682,"rate":"170.450"},"12":{"total":1529734,"rate":"112.267","retries":2},"13":{"total":270388,"rate":"282.350"},"14":{"total":182393,"rate":"117.400"},"15":{"total":250089,"rate":"285.767"},"16":{"total":1409150,"rate":"809.967","retries":68},"17":{"total":2174597,"rate":"568.367","retries":4},"18":{"total":83756,"rate":"210.833"},"19":{"total":1360461,"rate":"545.050","retries":12},"2":{"total":1236831409,"rate":"2081.950","retries":516,"totalBytes":1093024830336},"20":{"total":1984219,"rate":"13.767","retries":2},"21":{"total":31702,"rate":"513.900"},"22":{"total":43446,"rate":"236.750"},"23":{"total":1185523,"rate":"288.400","retries":2},"3":{"total":775193917,"rate":"1644.817","retries":445,"totalBytes":390582424922},"4":{"total":261686834,"rate":"1166.967","retries":133,"totalBytes":249736535817},"5":{"total":124125307,"rate":"1
141.067","retries":94,"totalBytes":82045221961},"6":{"total":103083009,"rate":"1161.050","retries":88,"totalBytes":755086945},"7":{"total":82011031,"rate":"823.367","retries":91,"totalBytes":611106658},"8":{"total":31736164,"rate":"300.733","retries":133},"9":{"total":42002014,"rate":"811.567","retries":62}}}},"sinkStatuses":{"default":{"metrics":{"0":{"total":1885045814,"rate":"2147.683","totalBytes":927706439868},"1":{"total":1508394987,"rate":"2132.517","totalBytes":807652263645},"10":{"total":1375520,"rate":"513.817"},"11":{"total":566861,"rate":"111.917"},"12":{"total":1126184,"rate":"71.717"},"13":{"total":173353,"rate":"188.783"},"14":{"total":119975,"rate":"63.867"},"15":{"total":153384,"rate":"162.967"},"16":{"total":1027478,"rate":"430.350"},"17":{"total":1622706,"rate":"372.333"},"18":{"total":51809,"rate":"133.817"},"19":{"total":1012440,"rate":"350.817"},"2":{"total":937630035,"rate":"1550.100","totalBytes":465531302036},"20":{"total":1493531,"rate":"3.067"},"21":{"total":20009,"rate":"331.917"},"22":{"total":27187,"rate":"145"},"23":{"total":894741,"rate":"230.217"},"3":{"total":587804600,"rate":"1189.733","totalBytes":166072628815},"4":{"total":197885360,"rate":"874.883","totalBytes":106120642245},"5":{"total":93979216,"rate":"863.850","totalBytes":34860071551},"6":{"total":78262115,"rate":"854.533","totalBytes":320937458},"7":{"total":62277381,"rate":"628.433","totalBytes":259497086},"8":{"total":24155773,"rate":"190.200"},"9":{"total":31960961,"rate":"617.383"}}}}}}
        - name: ARGO_DATAFLOW_UPDATE_INTERVAL
          value: 1m0s
        - name: GODEBUG
      image: 'docker.intuit.com/quay-rmt/argoproj/dataflow-runner:v0.0.96'
      imagePullPolicy: IfNotPresent
      lifecycle:
        preStop:
          httpGet:
            path: /pre-stop?source=kubernetes
            port: 3570
            scheme: HTTPS
      name: sidecar
      ports:
        - containerPort: 3570
          protocol: TCP
      readinessProbe:
        failureThreshold: 3
        httpGet:
          path: /ready
          port: 3570
          scheme: HTTPS
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 1
      resources:
        limits:
          cpu: '1'
          memory: 1Gi
        requests:
          cpu: 500m
          memory: 512Mi
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - all
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /var/run/argo-dataflow
          name: var-run-argo-dataflow
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: pipeline-token-q2t7v
          readOnly: true
    - env:
        - name: ENV
          value: prd
        - name: POLICY_ID
          value: p-1jk10kpxwj35
        - name: ROLE
          value: >-
            arn:aws:iam::016253778016:role/k8s-dev-observe-argodataflowx-usw2-prd
        - name: TOPIC_NAME
          value: ip-apigw-telegraf-metrics-prd
      image: 'docker.intuit.com/dev/containers/argo/service/dataflow-agtd2:latest'
      imagePullPolicy: Always
      lifecycle:
        preStop:
          exec:
            command:
              - /var/run/argo-dataflow/prestop
      name: main
      resources:
        limits:
          memory: 4Gi
        requests:
          cpu: 100m
          memory: 126Mi
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - all
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /var/run/argo-dataflow
          name: var-run-argo-dataflow
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: pipeline-token-q2t7v
          readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
    - args:
        - init
      env:
        - name: ARGO_DATAFLOW_CLUSTER_NAME
          value: tech-pi-prd-usw2-k8s
        - name: ARGO_DATAFLOW_DEBUG
          value: 'false'
        - name: ARGO_DATAFLOW_NAMESPACE
          value: dev-observe-argodataflowx-usw2-prd
        - name: ARGO_DATAFLOW_PIPELINE_NAME
          value: odl-apigw-telegraf-druid
        - name: ARGO_DATAFLOW_REPLICA
          value: '0'
        - name: ARGO_DATAFLOW_STEP
          value: >-
            {"kind":"Step","apiVersion":"dataflow.argoproj.io/v1alpha1","metadata":{"name":"odl-apigw-telegraf-druid-agtd2","namespace":"dev-observe-argodataflowx-usw2-prd","selfLink":"/apis/dataflow.argoproj.io/v1alpha1/namespaces/dev-observe-argodataflowx-usw2-prd/steps/odl-apigw-telegraf-druid-agtd2","uid":"9a065307-e625-4d30-9407-973bcb46dd1a","resourceVersion":"339044253","generation":4871,"creationTimestamp":"2021-07-30T23:06:35Z","labels":{"dataflow.argoproj.io/pipeline-name":"odl-apigw-telegraf-druid","dataflow.argoproj.io/step-name":"agtd2"},"ownerReferences":[{"apiVersion":"dataflow.argoproj.io/v1alpha1","kind":"Pipeline","name":"odl-apigw-telegraf-druid","uid":"372d3221-51a3-4aef-b1c8-6363a7b3e6d9","controller":true,"blockOwnerDeletion":true}]},"spec":{"name":"agtd2","container":{"image":"docker.intuit.com/dev/containers/argo/service/dataflow-agtd2:latest","env":[{"name":"ENV","value":"prd"},{"name":"POLICY_ID","value":"p-1jk10kpxwj35"},{"name":"ROLE","value":"arn:aws:iam::016253778016:role/k8s-dev-observe-argodataflowx-usw2-prd"},{"name":"TOPIC_NAME","value":"ip-apigw-telegraf-metrics-prd"}],"resources":{}},"replicas":2,"scale":{"desiredReplicas":"limit(pending
            / (10 * 60 * 2000), 0, 12,
            1)","peekDelay":"defaultPeekDelay","scalingDelay":"\"5m\""},"sources":[{"name":"default","kafka":{"name":"default","topic":"ip-apigw-telegraf-metrics-prd","startOffset":"Last"},"retry":{"duration":"100ms","factorPercentage":200,"steps":20,"cap":"0s","jitterPercentage":10}}],"sinks":[{"name":"default","kafka":{"name":"default","topic":"ip-apigw-argo-druid"}}],"restartPolicy":"OnFailure","serviceAccountName":"pipeline","metadata":{"annotations":{"iam.amazonaws.com/role":"arn:aws:iam::016253778016:role/k8s-dev-observe-argodataflowx-usw2-prd"}},"sidecar":{"resources":{"limits":{"cpu":"1","memory":"1Gi"},"requests":{"cpu":"500m","memory":"512Mi"}}}},"status":{"phase":"","replicas":2,"selector":"dataflow.argoproj.io/pipeline-name=odl-apigw-telegraf-druid,dataflow.argoproj.io/step-name=agtd2","lastScaledAt":"2021-08-26T00:17:34Z","sourceStatuses":{"default":{"pending":3444292,"lastPending":3406859,"metrics":{"0":{"total":2484338081,"rate":"2869.017","retries":1774,"totalBytes":2178751050625},"1":{"total":1989990388,"rate":"2852.167","retries":1367,"totalBytes":1896327608622},"10":{"total":1869139,"rate":"671.833"},"11":{"total":776682,"rate":"170.450"},"12":{"total":1529734,"rate":"112.267","retries":2},"13":{"total":270388,"rate":"282.350"},"14":{"total":182393,"rate":"117.400"},"15":{"total":250089,"rate":"285.767"},"16":{"total":1409150,"rate":"809.967","retries":68},"17":{"total":2174597,"rate":"568.367","retries":4},"18":{"total":83756,"rate":"210.833"},"19":{"total":1360461,"rate":"545.050","retries":12},"2":{"total":1236831409,"rate":"2081.950","retries":516,"totalBytes":1093024830336},"20":{"total":1984219,"rate":"13.767","retries":2},"21":{"total":31702,"rate":"513.900"},"22":{"total":43446,"rate":"236.750"},"23":{"total":1185523,"rate":"288.400","retries":2},"3":{"total":775193917,"rate":"1644.817","retries":445,"totalBytes":390582424922},"4":{"total":261686834,"rate":"1166.967","retries":133,"totalBytes":249736535817},"5":{"total":124125307,"rate":"1
141.067","retries":94,"totalBytes":82045221961},"6":{"total":103083009,"rate":"1161.050","retries":88,"totalBytes":755086945},"7":{"total":82011031,"rate":"823.367","retries":91,"totalBytes":611106658},"8":{"total":31736164,"rate":"300.733","retries":133},"9":{"total":42002014,"rate":"811.567","retries":62}}}},"sinkStatuses":{"default":{"metrics":{"0":{"total":1885045814,"rate":"2147.683","totalBytes":927706439868},"1":{"total":1508394987,"rate":"2132.517","totalBytes":807652263645},"10":{"total":1375520,"rate":"513.817"},"11":{"total":566861,"rate":"111.917"},"12":{"total":1126184,"rate":"71.717"},"13":{"total":173353,"rate":"188.783"},"14":{"total":119975,"rate":"63.867"},"15":{"total":153384,"rate":"162.967"},"16":{"total":1027478,"rate":"430.350"},"17":{"total":1622706,"rate":"372.333"},"18":{"total":51809,"rate":"133.817"},"19":{"total":1012440,"rate":"350.817"},"2":{"total":937630035,"rate":"1550.100","totalBytes":465531302036},"20":{"total":1493531,"rate":"3.067"},"21":{"total":20009,"rate":"331.917"},"22":{"total":27187,"rate":"145"},"23":{"total":894741,"rate":"230.217"},"3":{"total":587804600,"rate":"1189.733","totalBytes":166072628815},"4":{"total":197885360,"rate":"874.883","totalBytes":106120642245},"5":{"total":93979216,"rate":"863.850","totalBytes":34860071551},"6":{"total":78262115,"rate":"854.533","totalBytes":320937458},"7":{"total":62277381,"rate":"628.433","totalBytes":259497086},"8":{"total":24155773,"rate":"190.200"},"9":{"total":31960961,"rate":"617.383"}}}}}}
        - name: ARGO_DATAFLOW_UPDATE_INTERVAL
          value: 1m0s
        - name: GODEBUG
      image: 'docker.intuit.com/quay-rmt/argoproj/dataflow-runner:v0.0.96'
      imagePullPolicy: IfNotPresent
      name: init
      resources:
        limits:
          cpu: 500m
          memory: 256Mi
        requests:
          cpu: 100m
          memory: 64Mi
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - all
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /var/run/argo-dataflow
          name: var-run-argo-dataflow
        - mountPath: /.ssh
          name: ssh
          readOnly: true
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: pipeline-token-q2t7v
          readOnly: true
  nodeName: ip-10-246-115-12.us-west-2.compute.internal
  nodeSelector:
    node.kubernetes.io/instancegroup: nodes
  preemptionPolicy: PreemptLowerPriority
  priority: 1
  priorityClassName: lead-replica
  restartPolicy: OnFailure
  schedulerName: default-scheduler
  securityContext:
    runAsNonRoot: true
    runAsUser: 9653
  serviceAccount: pipeline
  serviceAccountName: pipeline
  terminationGracePeriodSeconds: 30
  tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    - key: ig/nodes
  volumes:
    - emptyDir: {}
      name: var-run-argo-dataflow
    - name: ssh
      secret:
        defaultMode: 420
        secretName: ssh
    - name: pipeline-token-q2t7v
      secret:
        defaultMode: 420
        secretName: pipeline-token-q2t7v
status:
  conditions:
    - lastProbeTime: null
      lastTransitionTime: '2021-08-26T00:26:26Z'
      status: 'True'
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: '2021-09-05T22:38:02Z'
      status: 'True'
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: '2021-09-05T22:38:02Z'
      status: 'True'
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: '2021-08-26T00:26:25Z'
      status: 'True'
      type: PodScheduled
  containerStatuses:
    - containerID: >-
        docker://ac9169e12acc1787ff22c8395af0bfebc96725d9451b4e569a2d8298cf73da14
      image: 'docker.intuit.com/dev/containers/argo/service/dataflow-agtd2:latest'
      imageID: >-
        docker-pullable://docker.intuit.com/dev/containers/argo/service/dataflow-agtd2@sha256:646aa59d4d135d162f3f9f04b3297376308eaa9b31d95ed45245bf86710342a3
      lastState:
        terminated:
          containerID: >-
            docker://d35b848ef1f1b1f0014bfeb658cdc4faf0aa246f4bebeb96efe2d9200b7b22fa
          exitCode: 2
          finishedAt: '2021-09-05T22:38:00Z'
          reason: Error
          startedAt: '2021-09-05T16:57:49Z'
      name: main
      ready: true
      restartCount: 7
      started: true
      state:
        running:
          startedAt: '2021-09-05T22:38:02Z'
    - containerID: >-
        docker://03ec0cdf41d646901ddc4bf7c700c1022309eb8aa24fb0d74cb7c091d0942d2f
      image: 'docker.intuit.com/quay-rmt/argoproj/dataflow-runner:v0.0.96'
      imageID: >-
        docker-pullable://docker.intuit.com/quay-rmt/argoproj/dataflow-runner@sha256:6b4efc63bb8038d373c92fe3c44886105ab49a0baac6f354a2b588e0ad44bae4
      lastState: {}
      name: sidecar
      ready: true
      restartCount: 0
      started: true
      state:
        running:
          startedAt: '2021-08-26T00:26:26Z'
  hostIP: 10.246.115.12
  initContainerStatuses:
    - containerID: >-
        docker://68de6d33860c20e5a35d29edc10c2843b7e9362bedc1daacd099c0ac5b108b5f
      image: 'docker.intuit.com/quay-rmt/argoproj/dataflow-runner:v0.0.96'
      imageID: >-
        docker-pullable://docker.intuit.com/quay-rmt/argoproj/dataflow-runner@sha256:6b4efc63bb8038d373c92fe3c44886105ab49a0baac6f354a2b588e0ad44bae4
      lastState: {}
      name: init
      ready: true
      restartCount: 0
      state:
        terminated:
          containerID: >-
            docker://68de6d33860c20e5a35d29edc10c2843b7e9362bedc1daacd099c0ac5b108b5f
          exitCode: 0
          finishedAt: '2021-08-26T00:26:25Z'
          reason: Completed
          startedAt: '2021-08-26T00:26:25Z'
  phase: Running
  podIP: 10.246.90.150
  podIPs:
    - ip: 10.246.90.150
  qosClass: Burstable
  startTime: '2021-08-26T00:26:25Z'

alexec commented 3 years ago

Here is the cause:

func getCorev1PodHealth(pod *corev1.Pod) (*HealthStatus, error) {
    // This logic cannot be applied when the pod.Spec.RestartPolicy is: corev1.RestartPolicyOnFailure,
    // corev1.RestartPolicyNever, otherwise it breaks the resource hook logic.
    // The issue is, if we mark a pod with ImagePullBackOff as Degraded, and the pod is used as a resource hook,
    // then we will prematurely fail the PreSync/PostSync hook. Meanwhile, when that error condition is resolved
    // (e.g. the image is available), the resource hook pod will unexpectedly be executed even though the sync has
    // completed.
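
To make the effect of that comment concrete, here is a minimal sketch of the decision for a pod whose phase is Running. It is not the gitops-engine source: the types, constants, and the function name `runningPodHealth` are simplified stand-ins assumed for illustration, and the Degraded checks for terminated containers are left out.

```go
package main

import "fmt"

// RestartPolicy and HealthStatus are simplified stand-ins (assumptions)
// for the corev1 and gitops-engine types referenced in the comment above.
type RestartPolicy string

const (
	RestartAlways    RestartPolicy = "Always"
	RestartOnFailure RestartPolicy = "OnFailure"
	RestartNever     RestartPolicy = "Never"
)

type HealthStatus string

const (
	Healthy     HealthStatus = "Healthy"
	Progressing HealthStatus = "Progressing"
	Unknown     HealthStatus = "Unknown"
)

// runningPodHealth sketches how health is derived for a Running pod.
// Hypothetical helper: the real function inspects the whole pod object.
func runningPodHealth(policy RestartPolicy, ready bool) HealthStatus {
	switch policy {
	case RestartAlways:
		// Long-running pods become Healthy once they report Ready.
		if ready {
			return Healthy
		}
		return Progressing
	case RestartOnFailure, RestartNever:
		// Pods with a finite life (typically resource hooks) are not
		// evaluated for readiness, so they stay Progressing while running.
		return Progressing
	}
	return Unknown
}

func main() {
	fmt.Println(runningPodHealth(RestartAlways, true))    // prints "Healthy"
	fmt.Println(runningPodHealth(RestartOnFailure, true)) // prints "Progressing"
}
```

Under these assumptions, the pod above (restartPolicy: OnFailure, phase: Running, Ready: True) takes the second branch, which matches the Progressing status reported in this issue.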

wd commented 2 years ago

The return value is hardcoded here: https://github.com/argoproj/gitops-engine/blob/master/pkg/health/health_pod.go#L119

Liammarwood commented 3 months ago

A potential fix would be to allow an annotation that overrides the default behaviour, so that the intended functionality for jobs and hooks is not broken. Any thoughts?
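
As a rough illustration of that idea, the sketch below shows one way an opt-in annotation could interact with the restart policy check. Everything here is hypothetical: the annotation key `example.com/treat-as-long-running`, the function `podHealth`, and the string statuses are invented for this sketch and do not exist in Argo CD or gitops-engine.

```go
package main

import "fmt"

// overrideAnnotation is a hypothetical key used only in this sketch;
// it is not an annotation that Argo CD recognises.
const overrideAnnotation = "example.com/treat-as-long-running"

// podHealth returns a simplified health string for a Running pod.
// Pods opt in to readiness-based health through the annotation, so
// hook and job pods without it keep the existing Progressing behaviour.
func podHealth(restartPolicy string, ready bool, annotations map[string]string) string {
	longRunning := restartPolicy == "Always" || annotations[overrideAnnotation] == "true"
	if longRunning && ready {
		return "Healthy"
	}
	return "Progressing"
}

func main() {
	optedIn := map[string]string{overrideAnnotation: "true"}
	fmt.Println(podHealth("OnFailure", true, nil))     // prints "Progressing"
	fmt.Println(podHealth("OnFailure", true, optedIn)) // prints "Healthy"
}
```

Because the default is unchanged, only pods that carry the annotation would be affected, which is the property the comment above asks for.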