
PostSync hook runs while Deployment pods from previous ReplicaSet still exist #17408

Open glasser opened 7 months ago

glasser commented 7 months ago


Describe the bug

We have some ArgoCD apps which contain (among other things) a Deployment, a Service, and a Job that is a PostSync hook. The Job uses curl to connect to the Deployment via the Service URL to get some information from it (a GraphQL schema) and send it somewhere.

Our intention is that the Job should only talk to the newly-deployed pods, not pods from the previous ReplicaSet, so that the information it gets is about the current deploy, not the previous one.

However, we are seeing that in practice the Job occasionally gets responses from the previous ReplicaSet. We can tell because we log the headers received by curl and the Kubernetes-Pod-Name in the response is from the previous ReplicaSet (this is something our servers include in headers).

This is surprising. My understanding is that PostSync hooks run only after all the resources in the app are Healthy, and that the health check for Deployments specifically waits for previous replicas to terminate before reporting Healthy (with code borrowed from kubectl rollout status). And yet this definitely happens.
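
For reference, those conditions boil down to comparing a handful of Deployment status fields. Here's a rough shell rendition of what the health check evaluates (a sketch only; the real logic is Go code in gitops-engine, and this assumes kubectl access plus the deployment name from the reproduction below):

#!/bin/bash
# Approximate the Deployment health conditions that gitops-engine borrowed
# from `kubectl rollout status`.
read -r gen obsgen want updated available total <<<"$(kubectl get deploy reproduction \
  -o jsonpath='{.metadata.generation} {.status.observedGeneration} {.spec.replicas} {.status.updatedReplicas} {.status.availableReplicas} {.status.replicas}')"

if [ "$obsgen" -ge "$gen" ] &&      # controller has observed the latest spec
   [ "$updated" -eq "$want" ] &&    # all desired replicas come from the new ReplicaSet
   [ "$total" -eq "$updated" ] &&   # no pods counted from old ReplicaSets...
   [ "$available" -eq "$updated" ]; then
  # ...but pods with a deletionTimestamp are excluded from status.replicas,
  # so old pods stuck in a preStop hook pass all four checks.
  echo "Deployment reports Healthy"
fi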

To Reproduce

Reproduction available at https://github.com/glasser/argo-cd-issue-17408-reproduction

This is a chart with a values.yaml containing big: true, and a single template file containing

{{ $token := ternary "big" "small" .Values.big }}
---
apiVersion: v1
kind: Service
metadata:
  name: reproduction
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    name: reproduction
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reproduction
spec:
  replicas: 10
  selector:
    matchLabels:
      name: reproduction
  template:
    metadata:
      labels:
        name: reproduction
    spec:
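      # In "big" mode, give pods a two-minute termination grace period; combined
      # with the preStop sleep below, old pods stay up (and still answering)
      # for two minutes after deletion begins.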
      {{- if .Values.big }}
      terminationGracePeriodSeconds: 120
      {{- end }}
      containers:
        - name: reproduction
          image: nginx:1.25.3
          volumeMounts:
          - name: reproduction
            mountPath: /etc/nginx/templates/default.conf.template
            subPath: default.conf.template
          env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: TOKEN
            value: {{ $token }}
          {{- if .Values.big }}
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - sleep 120
          {{- end }}
          livenessProbe:
            failureThreshold: 6
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 1
            periodSeconds: 1
            timeoutSeconds: 5
          readinessProbe:
            failureThreshold: 6
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 1
            periodSeconds: 1
            timeoutSeconds: 5
      volumes:
      - name: reproduction
        configMap:
          name: reproduction
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: reproduction
data:
  default.conf.template: |
      server {
      listen 80;
      location / {
          add_header Content-Type text/plain;
          add_header Kubernetes-Pod-Name "${POD_NAME}";
          return 200 '${TOKEN}';
      }
      }
  token.txt: {{ $token }}
  hook.sh: |
    #!/bin/bash

    set -e

    if ! curl --fail-with-body --silent --show-error http://reproduction/ -o token.txt.out --dump-header header.txt; then
      echo "Failed to download token"
      cat token.txt.out
      exit 1
    fi

    echo "Downloaded token; headers:"
    cat header.txt
    echo

    echo "Hook downloaded token:"
    cat token.txt.out
    echo
    echo "Expected token:"
    cat reproduction/token.txt
    echo

    # Don't fail if different because we don't want the hook to retry.
    cmp token.txt.out reproduction/token.txt || true

---
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
  name: reproduction-hook
spec:
  backoffLimit: 3
  template:
    spec:
      containers:
        - name: hook
          command:
            - /bin/bash
            - reproduction/hook.sh
          image: nginx:1.25.3  # It has curl too!
          volumeMounts:
            - mountPath: /reproduction
              name: reproduction
      restartPolicy: Never
      volumes:
        - name: reproduction
          configMap:
            name: reproduction

Set up an ArgoCD app that installs this Helm chart in some namespace in some cluster. Wait for it to fully sync and become healthy. Now edit the values.yaml file to say big: false, push the change, and sync the app.

It will spin up one pod in the new ReplicaSet, start terminating the 10 old pods (which will hang for two minutes in their preStop hooks), and then execute the PostSync hook. With one new pod and ten old ones still answering, the hook will most likely (10 out of 11 pods, so about 90.9% of the time) curl one of the old pods. Look at its logs (e.g. via the ArgoCD UI); you should see something like:

Downloaded token; headers:
HTTP/1.1 200 OK
Server: nginx/1.25.3
Date: Wed, 06 Mar 2024 04:58:11 GMT
Content-Type: application/octet-stream
Content-Length: 3
Connection: keep-alive
Content-Type: text/plain
Kubernetes-Pod-Name: reproduction-755549c777-bh8zj
Hook downloaded token:
big
Expected token:
small
token.txt.out reproduction/token.txt differ: char 1, line 1

This shows that the hook spoke to the old pods: the Kubernetes-Pod-Name header names a pod in the old ReplicaSet, and the downloaded token is "big" (served by the old ReplicaSet) rather than "small" (served by the new one).

Expected behavior

When the PostSync hook runs, the headers it logs should never contain a Kubernetes-Pod-Name from the previous ReplicaSet, and it should only ever download the new "small" token.

Version

We don't use the argocd CLI but our server is running v2.10.1+a79e0ea.

Workaround

glasser commented 7 months ago

I updated the description with a full self-contained reproduction.

I think the issue is basically the same as https://github.com/kubernetes/kubernetes/issues/110171: the gitops-engine health check in question is ported from the kubectl rollout status command and reads the same Deployment status fields, and those fields simply carry no information about terminating pods in old ReplicaSets :(
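
You can watch this from the outside during the reproduction's two-minute window (again assuming kubectl access to the app's namespace):

# While the old pods are still sleeping in their preStop hooks:
kubectl get deploy reproduction \
  -o jsonpath='replicas={.status.replicas} updated={.status.updatedReplicas} available={.status.availableReplicas}{"\n"}'
# The counts already look fully rolled out, even though this still lists
# the old pods as Terminating:
kubectl get pods --selector name=reproduction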

I'm not even sure what we could do here as a reliable workaround (other than the unreliable workaround of sleeping longer in the hook). E.g., I don't think there's anything obvious I could put in a custom health Lua script: it only has access to the Deployment resource itself, not to the resources it owns, right?

I suppose the PostSync hook could run a kubectl command like kubectl describe deploy reproduction | perl -lne 'print $1 if /NewReplicaSet:\s*(\S+) /' to figure out the current ReplicaSet, and then loop until the fetched data's header indicates it's from the new set. (Or loop until no pod matching the selector still has a deletionTimestamp; "Terminating" isn't a real status.phase, so kubectl get po --field-selector status.phase=Terminating would never match anything.) Annoying to need kubectl on the image though.
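
Something along these lines, purely as a sketch (it assumes the hook image has kubectl and perl, and that the Job's ServiceAccount can get deployments and list pods in the namespace):

#!/bin/bash
set -e

# "Terminating" is not a real status.phase; it just means the pod has a
# deletionTimestamp, so wait until no matching pod has one.
while [ -n "$(kubectl get pods --selector name=reproduction \
        -o jsonpath='{.items[*].metadata.deletionTimestamp}')" ]; do
  echo "old pods still terminating; waiting..."
  sleep 5
done

# Belt and braces: check that the responding pod belongs to the new ReplicaSet.
rs="$(kubectl describe deploy reproduction | perl -lne 'print $1 if /NewReplicaSet:\s*(\S+) /')"
curl --fail-with-body --silent --show-error http://reproduction/ \
  -o token.txt.out --dump-header header.txt
grep -i "^Kubernetes-Pod-Name: $rs" header.txt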