rwong2888 closed this issue 3 years ago.
Please provide more relevant information, e.g., CoreDNS or pod logs, etc.
Are you applying the helm chart directly with helm on command line, or via the argocd interface/api?
@chrisohaver I'm using the app-of-apps pattern with a declarative Application in Argo CD to create CoreDNS via the Helm chart, and it is stuck in a sync loop, deleting the ConfigMap, Service, and Deployment and syncing over and over.
Application YAML after helm template has completed:
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: operations.coredns.staging
  namespace: argocd
  labels:
    metaLabel: coredns
    app: coredns
    team: operations
    provider: gcp
    project: development
    cluster: playground-west
    region: us-west1
    environment: staging
  annotations:
    argocd.argoproj.io/sync-wave: "10"
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    helm:
      releaseName: staging
      valueFiles:
        - values.yaml
    repoURL: https://xxxx.com/Operations-argocd-applications.git
    targetRevision: HEAD
    path: applications/operations/coredns/staging
  destination:
    name: playground-west
    namespace: kube-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    retry:
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
      limit: 5
    syncOptions:
      - PrunePropagationPolicy=foreground
      - ApplyOutOfSyncOnly=true
CoreDNS Chart.yaml (located in applications/operations/coredns/staging):
apiVersion: v2
name: coredns
version: 0.0.1
dependencies:
  - name: coredns
    version: 1.16.0
    repository: https://coredns.github.io/helm
CoreDNS values.yaml (it does not work with blank/default values either):
coredns:
  autoscaler:
    enabled: true
I have a "I skimmed over the brochure" level knowledge of argocd (i.e. never heard of it before I saw this issue). So I doubt I can be of much help. This does seem like an argocd issue though, and not a helm chart issue.
Is it possibly related to this issue? https://github.com/argoproj/argo-cd/issues/4614
I don't think so, @chrisohaver. I have a feeling it is either due to labels, or that some of the fields get reordered or mutated such that the live state no longer matches the manifest after Helm templating, but I am unable to discern what is different myself.
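For reference, a sketch of one way to surface the exact fields that differ, assuming the argocd CLI is installed and logged in and the repo above is checked out locally (app name and paths follow the manifest shown earlier):

# Show the live-vs-git diff as Argo CD sees it.
argocd app diff operations.coredns.staging

# Alternatively, render the wrapper chart locally and diff it against the cluster.
helm dependency update applications/operations/coredns/staging
helm template staging applications/operations/coredns/staging --namespace kube-system | kubectl diff -n kube-system -f -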
Ah. I see you had already asked for help on the argo-cd project: argoproj/argo-cd#6754 ... and that kind of got dropped, so you are seeking help here?
Are any of the following the case?
- There is a bug in the manifest, where it contains extra/unknown fields from the actual K8s spec. These extra fields would get dropped when querying Kubernetes for the live state, resulting in an OutOfSync status indicating a missing field was detected.
- The sync was performed (with pruning disabled), and there are resources which need to be deleted.
- A controller or mutating webhook is altering the object after it was submitted to Kubernetes in a manner which contradicts Git.
- A Helm chart is using a template function such as randAlphaNum, which generates different data every time helm template is invoked.
- For Horizontal Pod Autoscaling (HPA) objects, the HPA controller is known to reorder spec.metrics in a specific order. See kubernetes issue #74099. To work around this, you can order spec.metrics in Git in the same order that the controller prefers.
(from https://argoproj.github.io/argo-cd/user-guide/diffing/#diffing-customization )
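When one of those causes affects only a field you don't actually care about, that same docs page describes telling Argo CD to ignore it during diffing via ignoreDifferences in the Application spec. An illustrative sketch using the canonical /spec/replicas example (not the field at issue here):

spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        # Ignore replica count differences, e.g. when an HPA manages replicas.
        - /spec/replicas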
I think it has to do with the labels in the CoreDNS Helm template vs. Argo CD trying to override those labels, and then the live state and the manifest not lining up, @chrisohaver.
Ok. Which labels are affected?
Well, when I remove all labels from the template, I am left with only the label Argo CD injects (app.kubernetes.io/instance: operations.coredns.staging) and it is stable. However, when I only remove app.kubernetes.io/instance, it is still sync looping. It may be a combination of labels... Any ideas, @chrisohaver?
Hmm. Are there certain labels that a helm chart is not permitted to define if it is to be compatible with Argo CD? These would certainly be documented in Argo CD docs somewhere.
https://argoproj.github.io/argo-cd/user-guide/helm/#helm-release-name
Important notice on overriding the release name: Please note that overriding the Helm release name might cause problems when the chart you are deploying is using the app.kubernetes.io/instance label. Argo CD injects this label with the value of the Application name for tracking purposes. So when overriding the release name, the Application name will stop being equal to the release name. Because Argo CD will overwrite the label with the Application name, it might cause some selectors on the resources to stop working. In order to avoid this we can configure Argo CD to use another label for tracking in the Argo CD configmap argocd-cm.yaml; check the lines describing application.instanceLabelKey.
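For reference, that setting is a key in the argocd-cm ConfigMap; a minimal illustrative snippet (the label name used for tracking here is just an example):

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Use a dedicated label for Argo CD's resource tracking instead of app.kubernetes.io/instance.
  application.instanceLabelKey: argocd.argoproj.io/instance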
I had application.instanceLabelKey commented out in my config. Let me retest.
Still sync looping. There must be some labels interfering still.
@chrisohaver, I've found that the issue is related to the label kubernetes.io/cluster-service: "true" in the ConfigMap, Service, and Deployment. However, oddly enough, it is fine in the ClusterRole and ClusterRoleBinding.
As per https://github.com/kubernetes/kubernetes/issues/72757, the label does not serve any purpose. Can we remove all occurrences of this label from the templates?
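A quick way to confirm which of the deployed objects carry that label (a sketch; resource kinds as listed above):

# Namespaced objects in kube-system that carry the label:
kubectl get configmap,service,deployment -n kube-system -l kubernetes.io/cluster-service=true
# Cluster-scoped objects:
kubectl get clusterrole,clusterrolebinding -l kubernetes.io/cluster-service=true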
Thanks! It's concerning that Argo CD crashes and burns due to the presence of a label that isn't used by anything... especially considering that the label is "sprinkled across over 100 yaml files in the Kubernetes repo". I guess that some process is removing the labels?
Nevertheless, it looks like the label was deprecated five years ago. So it should be removed IMO.
Correction: the Add-on Manager deprecated its use of the label, but it did not necessarily deprecate the label itself. Other processes may still be using it. Specifically, kubectl still uses it in kubectl cluster-info, but only for Services, IIUC. So, at least for the Service, it seems it is still used.
I don't know if the label is still actively used elsewhere.
@chrisohaver, I found a workaround using kustomized-helm.
Reference: https://github.com/argoproj/argo-cd/issues/6958#issuecomment-918469031
argocd-cm
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
data:
  configManagementPlugins: |-
    - name: kustomized-helm
      init:
        command: ["sh", "-c"]
        args: ["helm repo add ${helm_dependency_name} ${helm_dependency_repo} && helm dependency update ${helm_base}"]
      generate:
        command: ["sh", "-c"]
        args: ["helm template ${helm_dependency_name} ${helm_base} ${helm_args} > ${helm_base}/all.yaml && kustomize build"]
application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: coredns
  namespace: argocd
spec:
  source:
    repoURL: {{ .Values.repoURL }}
    targetRevision: main
    path: applications/coredns
    plugin:
      name: kustomized-helm
      env:
        - name: helm_args
          value: -n kube-system -f base/values.yaml
        - name: helm_dependency_name
          value: coredns
        - name: helm_dependency_repo
          value: https://coredns.github.io/helm
        - name: helm_base
          value: base
  destination:
    name: {{ .Values.destinationName }}
    namespace: kube-system
applications/coredns/base/Chart.yaml
apiVersion: v2
name: coredns
version: 0.0.1
dependencies:
  - name: coredns
    version: 1.16.3
    repository: https://coredns.github.io/helm
applications/coredns/overlays/remove-label.yaml
- op: remove
  path: "/metadata/labels/kubernetes.io~1cluster-service"
applications/coredns/kustomization.yaml
kind: Kustomization
bases:
  - base/all.yaml
patchesJson6902:
  - target:
      version: v1
      kind: ConfigMap
      name: coredns-coredns
    path: overlays/remove-label.yaml
  - target:
      version: v1
      kind: Deployment
      name: coredns-coredns
    path: overlays/remove-label.yaml
  - target:
      version: v1
      kind: Service
      name: coredns-coredns
    path: overlays/remove-label.yaml
  - target:
      version: v1
      kind: ConfigMap
      name: coredns-coredns-autoscaler
    path: overlays/remove-label.yaml
  - target:
      version: v1
      kind: Deployment
      name: coredns-coredns-autoscaler
    path: overlays/remove-label.yaml
namespace: kube-system
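To sanity-check the overlay locally, the plugin's init and generate steps can be reproduced by hand from applications/coredns (a sketch, assuming helm and kustomize are installed and the args mirror the plugin env above):

# Mirror the plugin's init step.
helm repo add coredns https://coredns.github.io/helm
helm dependency update base
# Mirror the generate step, then confirm the patched output no longer carries the label.
helm template coredns base -n kube-system -f base/values.yaml > base/all.yaml
kustomize build . | grep 'kubernetes.io/cluster-service' || echo "label removed"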
@rwong2888, Cool. Thanks for following up with the detailed workaround! I appreciate it!
No problem, @chrisohaver. It took a while to figure out. Hopefully no one else runs into this issue.
I tried deploying this Helm chart without modifying any values.yaml and it deletes itself. I am not able to tell what differs between the live state and what's in Git, but I have determined that the ConfigMap, Service, and Deployment each trigger the issue.