argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0
18.04k stars 5.51k forks source link

Improve error handling for ComparisonError failures due to duplicate keys #4071

Open lucinvitae opened 4 years ago

lucinvitae commented 4 years ago

Summary

Improve error handling for ComparisonError failures that are due to a change in a map which introduces duplicate keys

Motivation

Having duplicate keys in a mapping (e.g. a Helm chart which templates the env: setting for a Kubernetes container) causes ArgoCD syncs to fail with ComparisonError errors that appear as follows:

The order in patch list … doesn't match $setElementOrder list: …

Here's a full example from a recent deployment issue our team faced:

The order in patch list: [map[name:INVITAE_DATA_ENV value:development-msk] map[name:INVITAE_DATA_AUTH_CERT value:/kafka.crt] map[name:INVITAE_DATA_AUTH_CERT value:/kafka.crt] map[name:INVITAE_DATA_AUTH_KEY value:/kafka.key] map[name:VAULT_PATH valueFrom:map[secretKeyRef:map[key:path name:vault-token]]] map[name:VAULT_TOKEN valueFrom:map[secretKeyRef:map[key:token name:vault-token]]] map[name:MONGODB_PASSWORD valueFrom:map[secretKeyRef:map[key:mongodb-password name:mongodb-persistent]]]] doesn't match $setElementOrder list: [map[name:PYTHONPATH] map[name:KFSERVING_URL] map[name:INVITAE_DATA_ENV] map[name:INVITAE_KAFKA_NOTIFICATIONS] map[name:INVITAE_DATA_AUTH_CERT] map[name:INVITAE_DATA_AUTH_KEY] map[name:CELERY_BROKER] map[name:JOB_TRACKER] map[name:VAULT_ADDR] map[name:VAULT_PATH] map[name:VAULT_TOKEN] map[name:MONGODB_HOST] map[name:MONGODB_USERNAME] map[name:MONGODB_DATABASE] map[name:MONGODB_PASSWORD] map[name:PYTHONPATH] map[name:KFSERVING_URL] map[name:INVITAE_DATA_ENV] map[name:INVITAE_KAFKA_NOTIFICATIONS] map[name:INVITAE_DATA_AUTH_CERT] map[name:INVITAE_DATA_AUTH_KEY]]

The app may be in a healthy status, with “Sync OK” message for each sync, but the section in the ArgoCD UI that shows whether the app is up to date with the repo shows Unknown when this happens.

This error message can be quite misleading, suggesting the order is invalid. In our case, due to many in-flight changes and the inability to identify which change was causing the failure, our team was manually deleting and recreating a deployment for over a week to work around the error before spotting the duplicate keys and resolving the sync issue.

Here is a section of our helm chart (modified for brevity) containing the original bug with duplicate keys, for reference:

      containers:
        - name: {{ .Chart.Name }}
          image: {{ .Values.worker.image.repository }}:{{ .Values.tag }}
          imagePullPolicy: {{ .Values.worker.image.pullPolicy }}
          env:
          {{- toYaml .Values.worker.env | nindent 12 }}
            - name: BROKER_URL
              value: "redis://{{ include "app.fullname" . }}"
          ... many more lines ...
          {{- if .Values.worker.env }}
            {{- toYaml .Values.worker.env | nindent 12 }}
          {{- end }}
          volumeMounts:
            ...

(See how .Values.worker.env is included twice)

See this thread in the ArgoCD Slack channel for more info: https://argoproj.slack.com/archives/CASHNF6MS/p1596812671404600

Proposal

The message could read something more informative, like duplicate keys in mapping is blocking ability to compare/diff configs instead. This would point developers immediately to duplication as the root cause, instead of trying to decipher the order of keys in a long list.

lucinvitae commented 4 years ago

Also, as an alternative or precursor to better error handling, perhaps it would be useful to add some documentation in a troubleshooting guide for sync issues about how duplicate keys can block syncs. (happy to help do this if pointed to the right doc)

hojongs commented 2 years ago

I suffered from this issue and took a whole day to figure out the reason. the description of this issue was very informative and I hope the comparison error message to be more understandable.

CodebashingDevOps commented 2 years ago

This thread saved me a few years of my life.. wasted 2 days trying to figure out why argo won't sync. +1 on this issue

jsoref commented 1 year ago

I'm going to ask someone to at least update the FAQ. Hopefully we can do that this week. Maybe we'll also write a PR to fix the error reporting.

jsoref commented 1 year ago

For the morbidly curious, the error message comes from: https://github.com/kubernetes/apimachinery/blob/b4dea92b265131dae13c1e9ebb0a31821bf5daf8/pkg/util/strategicpatch/patch.go#L1019

I really can't figure out what in ArgoCD triggers that.

jsoref commented 1 year ago

I believe it comes from: https://github.com/argoproj/argo-cd/blob/61f6530e8ab5d6816099edbbb2b555540a9eeda4/util/argo/diff/diff.go#L267