coredns / helm

Helm Charts for CoreDNS
Apache License 2.0

does not work with argocd #37

Closed. rwong2888 closed this issue 3 years ago.

rwong2888 commented 3 years ago

I tried deploying this Helm chart without modifying values.yaml at all, and it deletes itself. I am not able to tell what differs between the live state and what is in Git, but I have determined that the ConfigMap, Service, and Deployment each trigger the issue.

zouyee commented 3 years ago

Please provide more relevant information, e.g., CoreDNS or pod logs, etc.

chrisohaver commented 3 years ago

Are you applying the Helm chart directly with helm on the command line, or via the Argo CD interface/API?

rwong2888 commented 3 years ago

@chrisohaver I'm using the app-of-apps pattern with a declarative Application in Argo CD to create CoreDNS via the Helm chart, and it is stuck in a sync loop, deleting the ConfigMap, Service, and Deployment and re-syncing over and over.

Application YAML after helm template has completed:

---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: operations.coredns.staging
  namespace: argocd
  labels:
    metaLabel: coredns
    app: coredns
    team: operations
    provider: gcp
    project: development
    cluster: playground-west
    region: us-west1
    environment: staging
  annotations:
    argocd.argoproj.io/sync-wave: "10"
  finalizers:
  - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    helm:
      releaseName: staging
      valueFiles:
      - values.yaml
    repoURL: https://xxxx.com/Operations-argocd-applications.git
    targetRevision: HEAD
    path: applications/operations/coredns/staging
  destination:
    name: playground-west
    namespace: kube-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    retry:
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
      limit: 5
    syncOptions:
    - PrunePropagationPolicy=foreground
    - ApplyOutOfSyncOnly=true

coredns Chart.yaml (located in applications/operations/coredns/staging):

apiVersion: v2
name: coredns
version: 0.0.1
dependencies:
- name: coredns
  version: 1.16.0
  repository: https://coredns.github.io/helm

coredns values.yaml (it does not work with blank/default values either):

coredns:
  autoscaler:
    enabled: true

chrisohaver commented 3 years ago

I have a "I skimmed over the brochure" level knowledge of argocd (i.e. never heard of it before I saw this issue). So I doubt I can be of much help. This does seem like an argocd issue though, and not a helm chart issue.

Is it possibly related to this issue? https://github.com/argoproj/argo-cd/issues/4614

rwong2888 commented 3 years ago

I don't think so, @chrisohaver. I have a feeling it is either due to labels, or that some of the fields get reordered or mutated so that the live state no longer matches the manifest after Helm templating, but I am unable to discern what is different myself.

chrisohaver commented 3 years ago

Ah. I see you had already asked for help on the argo-cd project: argoproj/argo-cd#6754 ... and that kind of got dropped, so you are seeking help here?

chrisohaver commented 3 years ago

Are any of the following the case?

  • There is a bug in the manifest, where it contains extra/unknown fields from the actual K8s spec. These extra fields would get dropped when querying Kubernetes for the live state, resulting in an OutOfSync status indicating a missing field was detected.
  • The sync was performed (with pruning disabled), and there are resources which need to be deleted.
  • A controller or mutating webhook is altering the object after it was submitted to Kubernetes in a manner which contradicts Git.
  • A Helm chart is using a template function such as randAlphaNum, which generates different data every time helm template is invoked.
  • For Horizontal Pod Autoscaling (HPA) objects, the HPA controller is known to reorder spec.metrics in a specific order. See kubernetes issue #74099. To work around this, you can order spec.metrics in Git in the same order that the controller prefers.

(from https://argoproj.github.io/argo-cd/user-guide/diffing/#diffing-customization )
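
If a specific label or field turns out to be the culprit, the same doc also describes an ignoreDifferences escape hatch. A minimal, untested sketch (the label key here is a hypothetical placeholder):

apiVersion: argoproj.io/v1alpha1
kind: Application
spec:
  ignoreDifferences:
  # Ignore one label on Deployments when computing the diff
  # (jsonPointers syntax per the diffing-customization doc above)
  - group: apps
    kind: Deployment
    jsonPointers:
    - /metadata/labels/some-label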

rwong2888 commented 3 years ago

I think it has to do with the labels in the CoreDNS Helm template versus Argo CD trying to override those labels, and then the live state and the manifest not jibing together, @chrisohaver.

chrisohaver commented 3 years ago

Ok. Which labels are affected?

rwong2888 commented 3 years ago

Well, when I remove all labels from the template, I am left with the one label Argo injects, and it is stable:

app.kubernetes.io/instance: operations.coredns.staging

However, when I only remove app.kubernetes.io/instance, it is still sync looping. It may be a combination of labels... Any ideas @chrisohaver ?

chrisohaver commented 3 years ago

Hmm. Are there certain labels that a Helm chart is not permitted to define if it is to be compatible with Argo CD? If so, they would presumably be documented in the Argo CD docs somewhere.

rwong2888 commented 3 years ago

https://argoproj.github.io/argo-cd/user-guide/helm/#helm-release-name

Important notice on overriding the release name

Please note that overriding the Helm release name might cause problems when the chart you are deploying is using the app.kubernetes.io/instance label. ArgoCD injects this label with the value of the Application name for tracking purposes. So when overriding the release name, the Application name will stop being equal to the release name. Because ArgoCD will overwrite the label with the Application name, it might cause some selectors on the resources to stop working. In order to avoid this we can configure ArgoCD to use another label for tracking in the ArgoCD configmap argocd-cm.yaml - check the lines describing application.instanceLabelKey.

I had application.instanceLabelKey commented out in my config. Let me retest.
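
For anyone following along, that setting lives in the argocd-cm ConfigMap. A minimal sketch of what re-enabling it looks like (the key is from the doc above; the value here is just an example):

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Track applications by a dedicated label instead of
  # app.kubernetes.io/instance (example value)
  application.instanceLabelKey: argocd.argoproj.io/instance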

rwong2888 commented 3 years ago

Still sync looping. There must be some labels interfering still.

rwong2888 commented 3 years ago

@chrisohaver, I've found that the issue is related to the label kubernetes.io/cluster-service: "true" in the ConfigMap, Service, and Deployment.

However, oddly enough, it is fine in the ClusterRole and ClusterRoleBinding.

As per https://github.com/kubernetes/kubernetes/issues/72757, the label does not serve any purpose.

Can we remove all occurrences of this label in the templates?
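
For reference, the label is easy to spot in the rendered chart output, e.g. (helm v3):

# Render the chart locally and search for the label
helm repo add coredns https://coredns.github.io/helm
helm template coredns/coredns | grep -n "kubernetes.io/cluster-service"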

chrisohaver commented 3 years ago

Thanks! It's concerning that Argo CD crashes and burns due to the presence of a label that isn't used by anything ... especially considering that the label is "sprinkled across over 100 yaml files in the Kubernetes repo". I guess some process is removing the labels?

Nevertheless, it looks like the label was deprecated five years ago. So it should be removed IMO.

chrisohaver commented 3 years ago

Nevertheless, it looks like the label was deprecated five years ago. So it should be removed IMO.

Correction - Add-on Manager deprecated its use of the label, but it did not necessarily deprecate the label itself. Other processes may still be using it. Specifically, kubectl still uses it in kubectl cluster-info, but only for Services IIUC.

So, at least for the Service, it seems it is still used.

I don't know if the label is still actively used elsewhere.
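
(For what it's worth, the Services that kubectl cluster-info keys off of can be listed directly by that label:)

# List Services carrying the legacy label
kubectl get svc --all-namespaces -l kubernetes.io/cluster-service=true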

rwong2888 commented 3 years ago

@chrisohaver, I found a workaround with kustomized-helm.

Reference: https://github.com/argoproj/argo-cd/issues/6958#issuecomment-918469031

argocd-cm

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
data:
  configManagementPlugins: |-
    - name: kustomized-helm
      init:
        command: ["sh", "-c"]
        args: ["helm repo add ${helm_dependency_name} ${helm_dependency_repo} && helm dependency update ${helm_base}"]
      generate:
        command: ["sh", "-c"]
        args: ["helm template ${helm_dependency_name} ${helm_base} ${helm_args} > ${helm_base}/all.yaml && kustomize build"]

application.yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: coredns
  namespace: argocd
spec:
  source:
    repoURL: {{ .Values.repoURL }}
    targetRevision: main
    path: applications/coredns
    plugin:
      name: kustomized-helm
      env:
      - name: helm_args
        value: -n kube-system -f base/values.yaml
      - name: helm_dependency_name
        value: coredns
      - name: helm_dependency_repo
        value: https://coredns.github.io/helm
      - name: helm_base
        value: base
  destination:
    name: {{ .Values.destinationName }}
    namespace: kube-system

applications/coredns/base/Chart.yaml

apiVersion: v2
name: coredns
version: 0.0.1
dependencies:
- name: coredns
  version: 1.16.3
  repository: https://coredns.github.io/helm

applications/coredns/overlays/remove-label.yaml

- op: remove
  path: "/metadata/labels/kubernetes.io~1cluster-service"

applications/coredns/kustomization.yaml

kind: Kustomization

bases:
- base/all.yaml

patchesJson6902:
- target:
    version: v1
    kind: ConfigMap
    name: coredns-coredns
  path: overlays/remove-label.yaml
- target:
    version: v1
    kind: Deployment
    name: coredns-coredns
  path: overlays/remove-label.yaml
- target:
    version: v1
    kind: Service
    name: coredns-coredns
  path: overlays/remove-label.yaml
- target:
    version: v1
    kind: ConfigMap
    name: coredns-coredns-autoscaler
  path: overlays/remove-label.yaml
- target:
    version: v1
    kind: Deployment
    name: coredns-coredns-autoscaler
  path: overlays/remove-label.yaml

namespace: kube-system
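
To sanity-check the overlay, render it and grep for the label; once the patches apply this should print nothing:

# Expect no output after the remove-label patches
kustomize build applications/coredns | grep "kubernetes.io/cluster-service"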

chrisohaver commented 3 years ago

@rwong2888, Cool. Thanks for following up with the detailed workaround! I appreciate it!

rwong2888 commented 3 years ago

No problem, @chrisohaver. It took a while to figure out. Hopefully no one else runs into this issue.