Can you please confirm this was working in v3.0?
@amitm02 can you share the controller logs?
time="2021-05-31T14:54:09.809Z" level=info msg="Get leases 200"
time="2021-05-31T14:54:09.813Z" level=info msg="Update leases 200"
time="2021-05-31T14:54:14.816Z" level=info msg="Get leases 200"
time="2021-05-31T14:54:14.820Z" level=info msg="Update leases 200"
time="2021-05-31T14:54:14.874Z" level=info msg="Watch clusterworkflowtemplates 200"
time="2021-05-31T14:54:16.785Z" level=info msg="cleaning up pod" action=deletePod key=argo/do-pipline-6srw4-1016923602/deletePod
time="2021-05-31T14:54:16.788Z" level=info msg="Delete pods 404"
time="2021-05-31T14:54:19.823Z" level=info msg="Get leases 200"
time="2021-05-31T14:54:19.827Z" level=info msg="Update leases 200"
time="2021-05-31T14:54:22.085Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/sequence-cspz4/labelPodCompleted
time="2021-05-31T14:54:22.089Z" level=info msg="Patch pods 404"
time="2021-05-31T14:54:22.090Z" level=warning msg="failed to clean-up pod" action=labelPodCompleted error="pods \"sequence-cspz4\" not found" key=argo/sequence-cspz4/labelPodCompleted
time="2021-05-31T14:54:22.107Z" level=warning msg="Non-transient error: pods \"sequence-cspz4\" not found"
time="2021-05-31T14:54:22.185Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/sequence-lv7cx/labelPodCompleted
time="2021-05-31T14:54:22.189Z" level=info msg="Patch pods 404"
time="2021-05-31T14:54:22.189Z" level=warning msg="failed to clean-up pod" action=labelPodCompleted error="pods \"sequence-lv7cx\" not found" key=argo/sequence-lv7cx/labelPodCompleted
time="2021-05-31T14:54:22.189Z" level=warning msg="Non-transient error: pods \"sequence-lv7cx\" not found"
time="2021-05-31T14:54:22.285Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/sequence-pvldp/labelPodCompleted
time="2021-05-31T14:54:22.288Z" level=info msg="Patch pods 404"
time="2021-05-31T14:54:22.289Z" level=warning msg="failed to clean-up pod" action=labelPodCompleted error="pods \"sequence-pvldp\" not found" key=argo/sequence-pvldp/labelPodCompleted
time="2021-05-31T14:54:22.289Z" level=warning msg="Non-transient error: pods \"sequence-pvldp\" not found"
time="2021-05-31T14:54:22.385Z" level=info msg="cleaning up pod" action=labelPodCompleted key=argo/sequence-gkjmm-3818374930/labelPodCompleted
time="2021-05-31T14:54:22.388Z" level=info msg="Patch pods 404"
time="2021-05-31T14:54:22.389Z" level=warning msg="failed to clean-up pod" action=labelPodCompleted error="pods \"sequence-gkjmm-3818374930\" not found" key=argo/sequence-gkjmm-3818374930/labelPodCompleted
time="2021-05-31T14:54:22.389Z" level=warning msg="Non-transient error: pods \"sequence-gkjmm-3818374930\" not found"
time="2021-05-31T14:54:22.485Z" level=info msg="cleaning up pod" action=deletePod key=argo/sequence-6pjk8-2430312336/deletePod
time="2021-05-31T14:54:22.489Z" level=info msg="Delete pods 404"
time="2021-05-31T14:54:22.585Z" level=info msg="cleaning up pod" action=deletePod key=argo/sequence-6pjk8-2616966664/deletePod
time="2021-05-31T14:54:22.589Z" level=info msg="Delete pods 404"
time="2021-05-31T14:54:22.785Z" level=info msg="cleaning up pod" action=deletePod key=argo/sequence-x2pkj-543642184/deletePod
time="2021-05-31T14:54:22.788Z" level=info msg="Delete pods 404"
time="2021-05-31T14:54:22.885Z" level=info msg="cleaning up pod" action=deletePod key=argo/sequence-x2pkj-2272827344/deletePod
time="2021-05-31T14:54:22.888Z" level=info msg="Delete pods 404"
time="2021-05-31T14:54:22.985Z" level=info msg="cleaning up pod" action=deletePod key=argo/sequence-kldp5-808541759/deletePod
time="2021-05-31T14:54:22.988Z" level=info msg="Delete pods 404"
time="2021-05-31T14:54:23.085Z" level=info msg="cleaning up pod" action=deletePod key=argo/sequence-kldp5-3397304079/deletePod
time="2021-05-31T14:54:23.088Z" level=info msg="Delete pods 404"
time="2021-05-31T14:54:24.830Z" level=info msg="Get leases 200"
time="2021-05-31T14:54:24.834Z" level=info msg="Update leases 200"
time="2021-05-31T14:54:29.837Z" level=info msg="Get leases 200"
time="2021-05-31T14:54:29.841Z" level=info msg="Update leases 200"
time="2021-05-31T14:54:34.845Z" level=info msg="Get leases 200"
time="2021-05-31T14:54:34.848Z" level=info msg="Update leases 200"
time="2021-05-31T14:54:39.852Z" level=info msg="Get leases 200"
time="2021-05-31T14:54:39.856Z" level=info msg="Update leases 200"
time="2021-05-31T14:54:44.860Z" level=info msg="Get leases 200"
time="2021-05-31T14:54:44.863Z" level=info msg="Update leases 200"
time="2021-05-31T14:54:49.867Z" level=info msg="Get leases 200"
time="2021-05-31T14:54:49.871Z" level=info msg="Update leases 200"
time="2021-05-31T14:54:54.874Z" level=info msg="Get leases 200"
time="2021-05-31T14:54:54.878Z" level=info msg="Update leases 200"
time="2021-05-31T14:54:59.881Z" level=info msg="Get leases 200"
time="2021-05-31T14:54:59.885Z" level=info msg="Update leases 200"
time="2021-05-31T14:55:04.888Z" level=info msg="Get leases 200"
@alexec, I cannot confirm this was working in v3.0. This bug comes and goes; e.g. I had trouble reproducing it this morning.
Maybe the issue is related to:
time="2021-05-31T14:54:22.389Z" level=warning msg="failed to clean-up pod" action=labelPodCompleted error="pods \"sequence-gkjmm-3818374930\" not found" key=argo/sequence-gkjmm-3818374930/labelPodCompleted
time="2021-05-31T14:54:22.389Z" level=warning msg="Non-transient error: pods \"sequence-gkjmm-3818374930\" not found"
Not sure if it's related, but I also get this kind of error in the controller:
time="2021-06-01T08:19:37.608Z" level=warning msg="failed to clean-up pod" action=terminateContainers error="Internal error occurred: error executing command in container: failed to exec in container: failed to create exec \"3772223fc747a580005ba18f08b3ec142f057678d9e8ab221c215972c353bb1b\": cannot exec in a stopped state: unknown" key=argo/do-pipline-b6mz7-2774831253/terminateContainers
time="2021-06-01T08:19:37.609Z" level=warning msg="Non-transient error: Internal error occurred: error executing command in container: failed to exec in container: failed to create exec \"3772223fc747a580005ba18f08b3ec142f057678d9e8ab221c215972c353bb1b\": cannot exec in a stopped state: unknown"
time="2021-05-31T14:54:22.985Z" level=info msg="cleaning up pod" action=deletePod key=argo/sequence-kldp5-808541759/deletePod time="2021-05-31T14:54:22.988Z" level=info msg="Delete pods 404" time="2021-05-31T14:54:23.085Z" level=info msg="cleaning up pod" action=deletePod key=argo/sequence-kldp5-3397304079/deletePod time="2021-05-31T14:54:23.088Z" level=info msg="Delete pods 404"
The controller is trying to delete pods that are not found; it looks like they have already been deleted.
I tried the above example locally and was not able to reproduce it.
Can you provide more information, e.g. k8s API server logs, and enable debug-level logging on the controller?
Could it be deleting from the wrong namespace, or with the wrong name? Inspect the code?
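For reference, debug logging on the controller can be turned on via the --loglevel flag (the same flag mentioned later in this thread). A minimal sketch, assuming a default install where the controller runs as the workflow-controller Deployment in the argo namespace:

```yaml
# Fragment of the workflow-controller Deployment args (names assume a default "argo" install;
# adjust to your environment). Keep any other existing args as-is.
spec:
  template:
    spec:
      containers:
        - name: workflow-controller
          args:
            - --loglevel
            - debug
```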
@sarabala1979, like all annoying bugs, this one is not easily reproducible. It comes and goes. We will try to hunt down the next occurrence and bring more logs.
Same issue, but without the "failed to clean up pod" errors, or any other logs indicating an attempt to delete the pods (even when using --loglevel debug).
Containers inside the pods are terminated with "reason: Completed", but the pods themselves remain in the Completed state for a long time afterwards.
What we were testing with was a very simple 50-pod sequence, where each pod executes an echo command (times 10-100 workflows per run).
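For context, a hypothetical sketch of the kind of test workflow described above; the image, names, and podGC strategy are assumptions, not the exact manifest that was run:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: sequence-
spec:
  entrypoint: main
  podGC:
    strategy: OnPodCompletion   # assumed strategy; the report does not say which one was used
  templates:
    - name: main
      steps:
        - - name: echo
            template: echo
            withSequence:
              count: "50"       # 50 pods per workflow, as described above
    - name: echo
      container:
        image: alpine:3.14      # assumed image
        command: [sh, -c, "echo hello"]
```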
Version used: 3.1.0-rc12
Additional logs:
@DO-YardenG Can you add the pod YAML (kubectl get pod <> -o yaml)? Is it possible to get the k8s API server log to see the controller making the delete call?
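A sketch of the commands being asked for above; the pod name is a placeholder and the namespace is assumed to be argo, as in the logs earlier in this thread:

```shell
# Dump the pod spec/status and any events recorded for it
kubectl -n argo get pod <pod-name> -o yaml
kubectl -n argo get events --field-selector involvedObject.name=<pod-name>
```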
Sure; it's not for the same pod, but one with the same behavior (the pods do get deleted at some point, just with a varying delay).
As you can see, the pod was eventually deleted, but over 15 min after it was considered "completed" (the timing varies, and some pods delete instantly as they complete)
This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.
I'd like to reopen this issue please. We are seeing the same behaviour with OnPodSuccess. We are on 3.0.7. Previously, on 3.0.2, I think it was working, but I can no longer confirm as we've upgraded.
Details/symptoms:
- podGC: strategy: OnPodSuccess is defined in the Workflow Controller Configmap (see the configmap sketch below).
- From a ClusterWorkflowTemplate (PodGCStrategy via the manifest), the PodGCStrategy applies as per the controller configmap.
- "cleaning up pod" followed by "Delete pods 404" is seen repeating in the Controller logs, referring to pods from workflows that were manually deleted.
- Pods are being added to the pod_cleanup_queue (as per prometheus metrics), but at a much lower rate than the pod_queue, and (many) minutes after the workflows have been completed. The cluster is not under any load at this time.
Let me know if there's anything else I can do to help troubleshoot.
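A minimal sketch of how that strategy is typically set as a workflow default in the controller configmap; the configmap and namespace names assume a default install, so adjust to your environment:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  workflowDefaults: |
    spec:
      podGC:
        strategy: OnPodSuccess
```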
They do get deleted eventually, but the timing varies. Unconfirmed, but they seem to be getting deleted more reliably/consistently when the cluster is under heavy load (~15K running pods)
We have the same issue. Completed pods accumulate under heavy load. We are running a few workflows, each with many pods (10k), completing relatively quickly (1-10 seconds).
I've tried tuning:
- qps
- pod_cleanup_workers
Neither solves the issue (see the controller args sketch below).
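A sketch of where those two knobs live, as workflow-controller container args; the flag names are as of v3.1, the values are purely illustrative, and both should be checked against workflow-controller --help for your version:

```yaml
# Fragment of the workflow-controller Deployment; keep any other existing args as-is.
containers:
  - name: workflow-controller
    args:
      - --qps=30                  # client-side rate limit towards the k8s API
      - --pod-cleanup-workers=16  # number of workers draining the pod cleanup queue
```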
Summary
While running the workflow below, the completed pods sequence-kldp5-3397304079 and sequence-kldp5-808541759 are not removed by the OnPodCompletion strategy.
Running on v3.1-rc10, Emissary, GKE.
wf.yaml.txt
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.