
PostSync hook runs while Deployment pods from previous ReplicaSet still exist #17408

Open glasser opened 7 months ago

glasser commented 7 months ago


Describe the bug

We have some ArgoCD apps which contain (among other things) a Deployment, a Service, and a Job that is a PostSync hook. The Job uses curl to connect to the Deployment via the Service URL to get some information from it (a GraphQL schema) and send it somewhere.

Our intention is that the Job should only talk to the newly-deployed pods, not pods from the previous ReplicaSet, so that the information it gets is about the current deploy, not the previous one.

However, we are seeing that in practice the Job occasionally gets responses from the previous ReplicaSet. We can tell because we log the headers received by curl and the Kubernetes-Pod-Name in the response is from the previous ReplicaSet (this is something our servers include in headers).

This is surprising. My understanding is that PostSync hooks run only after all the resources in the app are Healthy, and that the health check for Deployments specifically waits for previous replicas to terminate before reporting Healthy (with code borrowed from kubectl rollout status). And yet this definitely happens.
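
For reference, those conditions boil down to comparing a handful of Deployment status fields. Here's a rough shell rendition of what the health check evaluates (a sketch only; the real logic is Go code in gitops-engine, and this assumes kubectl access plus the deployment name from the reproduction below):

#!/bin/bash
# Approximate the Deployment health conditions that gitops-engine borrowed
# from `kubectl rollout status`.
read -r gen obsgen want updated available total <<<"$(kubectl get deploy reproduction \
  -o jsonpath='{.metadata.generation} {.status.observedGeneration} {.spec.replicas} {.status.updatedReplicas} {.status.availableReplicas} {.status.replicas}')"

if [ "$obsgen" -ge "$gen" ] &&      # controller has observed the latest spec
   [ "$updated" -eq "$want" ] &&    # all desired replicas come from the new ReplicaSet
   [ "$total" -eq "$updated" ] &&   # no pods counted from old ReplicaSets...
   [ "$available" -eq "$updated" ]; then
  # ...but pods with a deletionTimestamp are excluded from status.replicas,
  # so old pods stuck in a preStop hook pass all four checks.
  echo "Deployment reports Healthy"
fi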

To Reproduce

Reproduction available at https://github.com/glasser/argo-cd-issue-17408-reproduction

This is a chart with a values.yaml containing big: true, and a single template file containing

{{ $token := ternary "big" "small" .Values.big }}
---
apiVersion: v1
kind: Service
metadata:
  name: reproduction
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    name: reproduction
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reproduction
spec:
  replicas: 10
  selector:
    matchLabels:
      name: reproduction
  template:
    metadata:
      labels:
        name: reproduction
    spec:
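      # In "big" mode, give pods a two-minute termination grace period; combined
      # with the preStop sleep below, old pods stay up (and still answering)
      # for two minutes after deletion begins.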
      {{- if .Values.big }}
      terminationGracePeriodSeconds: 120
      {{- end }}
      containers:
        - name: reproduction
          image: nginx:1.25.3
          volumeMounts:
          - name: reproduction
            mountPath: /etc/nginx/templates/default.conf.template
            subPath: default.conf.template
          env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: TOKEN
            value: {{ $token }}
          {{- if .Values.big }}
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - sleep 120
          {{- end }}
          livenessProbe:
            failureThreshold: 6
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 1
            periodSeconds: 1
            timeoutSeconds: 5
          readinessProbe:
            failureThreshold: 6
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 1
            periodSeconds: 1
            timeoutSeconds: 5
      volumes:
      - name: reproduction
        configMap:
          name: reproduction
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: reproduction
data:
  default.conf.template: |
      server {
      listen 80;
      location / {
          add_header Content-Type text/plain;
          add_header Kubernetes-Pod-Name "${POD_NAME}";
          return 200 '${TOKEN}';
      }
      }
  token.txt: {{ $token }}
  hook.sh: |
    #!/bin/bash

    set -e

    if ! curl --fail-with-body --silent --show-error http://reproduction/ -o token.txt.out --dump-header header.txt; then
      echo "Failed to download token"
      cat token.txt.out
      exit 1
    fi

    echo "Downloaded token; headers:"
    cat header.txt
    echo

    echo "Hook downloaded token:"
    cat token.txt.out
    echo
    echo "Expected token:"
    cat reproduction/token.txt
    echo

    # Don't fail if different because we don't want the hook to retry.
    cmp token.txt.out reproduction/token.txt || true

---
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
  name: reproduction-hook
spec:
  backoffLimit: 3
  template:
    spec:
      containers:
        - name: hook
          command:
            - /bin/bash
            - reproduction/hook.sh
          image: nginx:1.25.3  # It has curl too!
          volumeMounts:
            - mountPath: /reproduction
              name: reproduction
      restartPolicy: Never
      volumes:
        - name: reproduction
          configMap:
            name: reproduction

Set up an ArgoCD app that installs this Helm chart in some namespace in some cluster. Wait for it to fully sync and become healthy. Now edit the values.yaml file to say big: false, push the change, and sync the app.

It will spin up one pod in the new ReplicaSet, start terminating the 10 old pods (which will hang for two minutes in their preStop hooks), and then execute the PostSync hook. With one new pod and ten old ones still answering, the hook will most likely (10 out of 11 pods, so about 90.9% of the time) curl one of the old pods. Look at its logs (e.g. via the ArgoCD UI); you should see something like:

Downloaded token; headers:
HTTP/1.1 200 OK
Server: nginx/1.25.3
Date: Wed, 06 Mar 2024 04:58:11 GMT
Content-Type: application/octet-stream
Content-Length: 3
Connection: keep-alive
Content-Type: text/plain
Kubernetes-Pod-Name: reproduction-755549c777-bh8zj
Hook downloaded token:
big
Expected token:
small
token.txt.out reproduction/token.txt differ: char 1, line 1

This shows that the hook spoke to the old pods: the Kubernetes-Pod-Name header names a pod in the old ReplicaSet, and the downloaded token is "big" (served by the old ReplicaSet) rather than "small" (served by the new one).

Expected behavior

When the PostSync hook runs, the headers it logs should never contain a Kubernetes-Pod-Name from the previous ReplicaSet, and it should only ever download the new "small" token.

Version

We don't use the argocd CLI but our server is running v2.10.1+a79e0ea.

Workaround

glasser commented 7 months ago

I updated the description with a full self-contained reproduction.

I think the issue is basically the same as https://github.com/kubernetes/kubernetes/issues/110171: the gitops-engine health check in question is ported from the kubectl rollout status command and reads the same Deployment status fields, and those fields simply carry no information about terminating pods in old ReplicaSets :(
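
You can watch this from the outside during the reproduction's two-minute window (again assuming kubectl access to the app's namespace):

# While the old pods are still sleeping in their preStop hooks:
kubectl get deploy reproduction \
  -o jsonpath='replicas={.status.replicas} updated={.status.updatedReplicas} available={.status.availableReplicas}{"\n"}'
# The counts already look fully rolled out, even though this still lists
# the old pods as Terminating:
kubectl get pods --selector name=reproduction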

I'm not even sure what we could do here as a reliable workaround (other than the unreliable workaround of sleeping longer in the hook). E.g., I don't think there's anything obvious I could put in a custom health Lua script: it only has access to the Deployment resource itself, not to the resources it owns, right?

I suppose the PostSync hook could run a kubectl command like kubectl describe deploy reproduction | perl -lne 'print $1 if /NewReplicaSet:\s*(\S+) /' to figure out the current ReplicaSet, and then loop until the fetched data's header indicates it's from the new set. (Or loop until no pod matching the selector still has a deletionTimestamp; "Terminating" isn't a real status.phase, so kubectl get po --field-selector status.phase=Terminating would never match anything.) Annoying to need kubectl on the image though.
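
Something along these lines, purely as a sketch (it assumes the hook image has kubectl and perl, and that the Job's ServiceAccount can get deployments and list pods in the namespace):

#!/bin/bash
set -e

# "Terminating" is not a real status.phase; it just means the pod has a
# deletionTimestamp, so wait until no matching pod has one.
while [ -n "$(kubectl get pods --selector name=reproduction \
        -o jsonpath='{.items[*].metadata.deletionTimestamp}')" ]; do
  echo "old pods still terminating; waiting..."
  sleep 5
done

# Belt and braces: check that the responding pod belongs to the new ReplicaSet.
rs="$(kubectl describe deploy reproduction | perl -lne 'print $1 if /NewReplicaSet:\s*(\S+) /')"
curl --fail-with-body --silent --show-error http://reproduction/ \
  -o token.txt.out --dump-header header.txt
grep -i "^Kubernetes-Pod-Name: $rs" header.txt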