GoogleCloudPlatform / k8s-config-connector

GCP Config Connector, a Kubernetes add-on for managing GCP resources
https://cloud.google.com/config-connector/docs/overview
Apache License 2.0

ArgoCD - ConfigConnector conflict over DNSRecordSets? #837

Open peter-tar opened 1 year ago

peter-tar commented 1 year ago

Bug Description

ArgoCD will create, but then fail to sync DNSRecordSet resources. It claims that it can't apply changes due to an immutable field even though there have been no changes to the resource so there should be nothing to apply. I am uncertain whether the issue lies with ArgoCD or Config Connector.

These DNSRecordSets aren't being changed; we never modify them. ArgoCD shows them as synced and green, and there is nothing in the diff. If I force-replace the resource, there are no differences between the old and new resource except for a few computed fields (metadata.creationTimestamp, metadata.resourceVersion, metadata.uid). Yet they keep failing to sync.

Additional Diagnostic Information

I am using ArgoCD (2.7.6) to deploy Config Connector resources on a GKE cluster. I've had problems with the DNSRecordSet resource for a couple of years now through many different GKE/Argo/CC versions.

Based on the error message (see below), I thought it could be because CC and ArgoCD are fighting over the resource, even though I see no difference in the resource manifests, i.e. no sign of any unspecified but added fields (see https://cloud.google.com/config-connector/docs/concepts/ignore-unspecified-fields#resolve_fighting_between_config_management_tools_and). However, setting the state-into-spec annotation to absent doesn't solve the problem; it only changes the error message (see below).

I also raised an issue on the ArgoCD side (https://github.com/argoproj/argo-cd/issues/14426), but I was advised on Slack that there is a good chance it is a Config Connector problem.

Kubernetes Cluster Version

Client Version: v1.27.2
Kustomize Version: v5.0.1
Server Version: v1.25.10-gke.1200

Config Connector Version

1.96.0

Config Connector Mode

namespaced mode (default)

Log Output

error when replacing "/dev/shm/3513087848": admission webhook "deny-immutable-field-updates.cnrm.cloud.google.com" denied the request: annotation cnrm.cloud.google.com/state-into-spec is immutable (retried 5 times).

with state-into-spec set to absent:

one or more objects failed to apply, reason: error when replacing "/dev/shm/952324908": admission webhook "deny-immutable-field-updates.cnrm.cloud.google.com" denied the request: error validating container annotations: cannot make changes to container annotation cnrm.cloud.google.com/project-id

Steps to reproduce the issue

  1. Create a DNSRecordSet resource using the template below (using rrdatasRefs instead of rrdatas makes no difference)
  2. Let Argo apply it
  3. When done, try to manually sync the resource or wait for the next automatic sync
  4. Find your Resource failing to sync with the following error message:
error when replacing "/dev/shm/3513087848": admission webhook "deny-immutable-field-updates.cnrm.cloud.google.com" denied the request: annotation cnrm.cloud.google.com/state-into-spec is immutable (retried 5 times).

Using the cnrm.cloud.google.com/state-into-spec: absent annotation on the resource changes the error message to:

one or more objects failed to apply, reason: error when replacing "/dev/shm/952324908": admission webhook "deny-immutable-field-updates.cnrm.cloud.google.com" denied the request: error validating container annotations: cannot make changes to container annotation cnrm.cloud.google.com/project-id
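
For reference, a minimal sketch of where that annotation sits. Only the annotations block is shown; the rest of the manifest is assumed to match the template under "YAML snippets".

```yaml
metadata:
  annotations:
    # Variant tried in the report; changes the error message but not the outcome.
    cnrm.cloud.google.com/state-into-spec: absent
    argocd.argoproj.io/sync-options: Replace=true
```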

YAML snippets

apiVersion: dns.cnrm.cloud.google.com/v1beta1
kind: DNSRecordSet
metadata:
  annotations:
    argocd.argoproj.io/sync-options: Replace=true # Added as a potential workaround, but it does not help; we had the same problem without it.
  labels:
    app.kubernetes.io/instance: <omitted>
    app.kubernetes.io/name: <omitted>
    app.kubernetes.io/part-of: <omitted>
  name: dns-record-1
  namespace: development
spec:
  managedZoneRef:
    external: <omitted>
  name: <omitted>
  rrdatas:
    - <omitted>
  ttl: 300
  type: A

diviner524 commented 1 year ago

It looks like this is caused by an update to metadata.annotations.

  1. How does ArgoCD treat the difference in metadata.annotations? Note that when a Config Connector resource is created, Config Connector updates the resource's metadata.annotations and adds a few default values. Is ArgoCD trying to revert these changes?

  2. Also, it seems you are relying on a K8s namespace-level scope-defining annotation (https://cloud.google.com/config-connector/docs/how-to/organizing-resources/overview#scope-defining_annotation). Can you try explicitly specifying it in your resource YAML, instead of relying on the default value from the K8s namespace, and see if it helps?

Basically the idea is to try specifying all defaulted annotation values explicitly in the YAML to prove the theory in 1. You can do a kubectl get -oyaml to find all the default annotation values added by Config Connector.
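
As a rough sketch of suggestion 2, the scope-defining annotation could be set on the resource itself rather than inherited from the namespace. This assumes the project-scoped annotation cnrm.cloud.google.com/project-id, the one named in the second error message in the report; <project-id> is a placeholder, and the rest of the manifest is assumed unchanged.

```yaml
metadata:
  annotations:
    # Assumption: project-scoped resource; replace the placeholder with the real project ID.
    cnrm.cloud.google.com/project-id: <project-id>
  name: dns-record-1
  namespace: development
```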

peter-tar commented 1 year ago

> It looks like this is caused by an update to metadata.annotations.
>
> 1. How does ArgoCD treat the difference in `metadata.annotations`? Note that when a Config Connector resource is created, Config Connector updates the resource's `metadata.annotations` and adds a few default values. Is ArgoCD trying to revert these changes?
>
> 2. Also, it seems you are relying on a K8s namespace-level [scope-defining annotation](https://cloud.google.com/config-connector/docs/how-to/organizing-resources/overview#scope-defining_annotation). Can you try explicitly specifying it in your resource YAML, instead of relying on the default value from the K8s namespace, and see if it helps?
>
> Basically the idea is to try specifying all defaulted annotation values explicitly in the YAML to prove the theory in 1. You can do a kubectl get -oyaml to find all the default annotation values added by Config Connector.

Thank you very much for your help! Solution number 2 doesn't seem to help.

Number 1 however seems to have worked. The resource now syncs correctly.

However, I am still baffled. We have quite a number of Config Connector resources deployed by Argo: Pub/Sub topics, schemas, buckets, BigQuery datasets and tables, IAM roles, etc. Some are in the same Application, some in others (but they are all generated via the same ApplicationSet template, so they are basically the same).

After a quick inspection it looks like a good chunk of them set these exact metadata.annotations yet we only ever had a problem with DNSRecordSets. Never with anything else.
I wonder what makes the DNSRecordSet special?

diviner524 commented 1 year ago

There shouldn't be a difference between these CRDs; at least, the behavior on metadata.annotations is similar.

How did you solve the problem? Did you add all the default annotations to your YAML before applying it through ArgoCD, and did it sync after that?

peter-tar commented 1 year ago

I added the following three annotations:

    cnrm.cloud.google.com/management-conflict-prevention-policy: none
    cnrm.cloud.google.com/project-id: <project-id>
    cnrm.cloud.google.com/state-into-spec: merge
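
Combined with the manifest from the original report, the workaround might look roughly like the sketch below. This is an assumption-laden reconstruction, not the exact manifest in use: the labels are left out for brevity, <project-id> and <omitted> remain placeholders, and the Replace=true sync option is assumed to be carried over unchanged from the original template.

```yaml
apiVersion: dns.cnrm.cloud.google.com/v1beta1
kind: DNSRecordSet
metadata:
  annotations:
    # Assumed to be carried over from the original template.
    argocd.argoproj.io/sync-options: Replace=true
    # Values Config Connector otherwise defaults after creation,
    # declared explicitly so a sync does not try to change them.
    cnrm.cloud.google.com/management-conflict-prevention-policy: none
    cnrm.cloud.google.com/project-id: <project-id>
    cnrm.cloud.google.com/state-into-spec: merge
  name: dns-record-1
  namespace: development
spec:
  managedZoneRef:
    external: <omitted>
  name: <omitted>
  rrdatas:
    - <omitted>
  ttl: 300
  type: A
```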

I still think there must be something odd going on with this resource type, since the other Config Connector resources don't have these default annotations specified and they sync just fine.

Also, on a successful ArgoCD sync other resources are marked as unchanged:

pubsubtopic.pubsub.cnrm.cloud.google.com/topic-name unchanged

While these DNSRecordSets are marked as replaced:

dnsrecordset.dns.cnrm.cloud.google.com/record-name replaced

But the DNSRecordSet resource in the cluster is in fact untouched, so I am happy with this solution/workaround.