
No-diff Sync Fails #16070

Open · antwacky opened 1 year ago

antwacky commented 1 year ago

Describe the bug

I restored a PVC, which added dataSource and dataSourceRef fields to its spec (referring to the source VolumeSnapshot).

This rightly caused ArgoCD to report the application as out of sync, since the desired manifests in Git don't contain these dataSource fields.
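
For context, a restored PVC ends up with a spec along these lines (a sketch: the snapshot name, storage class, and size are taken from the diff in the logs below; the access mode is assumed):

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: jenkins
    namespace: jenkins
  spec:
    accessModes:
      - ReadWriteOnce        # assumed
    storageClassName: vsphere-csi
    resources:
      requests:
        storage: 150Gi
    # added by the restore; absent from the manifests in Git
    dataSource:
      apiGroup: snapshot.storage.k8s.io
      kind: VolumeSnapshot
      name: jenkins-20-10-2023-15-19-48
    dataSourceRef:
      apiGroup: snapshot.storage.k8s.io
      kind: VolumeSnapshot
      name: jenkins-20-10-2023-15-19-48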

To work around this, I added the below configuration to the ArgoCD configmap:

  resource.customizations.ignoreDifferences.all: |
    jsonPointers:
      - /spec/dataSource
      - /spec/dataSourceRef
      - /spec/storageClassName
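
In full ConfigMap form, that looks roughly like this (a sketch assuming a standard install, where the ConfigMap is named argocd-cm and lives in the argocd namespace):

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: argocd-cm        # standard install; your name/namespace may differ
    namespace: argocd
  data:
    resource.customizations.ignoreDifferences.all: |
      jsonPointers:
        - /spec/dataSource
        - /spec/dataSourceRef
        - /spec/storageClassName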

This worked, and the PVC and application now show as 'Synced'.

However, even though ArgoCD now shows no diff and reports the application as in sync, clicking sync results in a sync failure.

Expected behavior

I expect the application to be in sync, and syncing to be successful.

Version

argocd: v2.8.4+c279299
  BuildDate: 2023-09-13T19:12:09Z
  GitCommit: c27929928104dc37b937764baf65f38b78930e59
  GitTreeState: clean
  GoVersion: go1.20.6
  Compiler: gc
  Platform: linux/amd64

Logs

one or more objects failed to apply, reason: PersistentVolumeClaim "jenkins" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests for bound claims
  core.PersistentVolumeClaimSpec{
    ... // 2 identical fields
    Resources:        {Requests: {s"storage": {i: {...}, s: "150Gi", Format: "BinarySI"}}},
    VolumeName:       "pvc-6cc09d41-7fd1-40b5-b79f-bfb624aa2f5d",
-   StorageClassName: &"vsphere-csi",
+   StorageClassName: nil,
    VolumeMode:       &"Filesystem",
    DataSource:       &{APIGroup: &"snapshot.storage.k8s.io", Kind: "VolumeSnapshot", Name: "jenkins-20-10-2023-15-19-48"},
    DataSourceRef:    &{APIGroup: &"snapshot.storage.k8s.io", Kind: "VolumeSnapshot", Name: "jenkins-20-10-2023-15-19-48"},
  }

rumstead commented 1 year ago

Have you enabled the RespectIgnoreDifferences sync option?
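
(For reference, that's the entry under syncPolicy.syncOptions on the Application:)

  syncPolicy:
    syncOptions:
      - RespectIgnoreDifferences=true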

antwacky commented 1 year ago

I have indeed. The application has application-level ignoreDifferences entries, which are working.

This one is a system-level ignoreDifferences.

The app shows as synced, but clicking sync reports a failure even though the app remains in sync. I've tried a hard refresh and deleting/recreating the application (non-cascade delete).

rumstead commented 1 year ago

Can you share the application spec?

antwacky commented 1 year ago

Sure, here you go:

project: default
source:
  repoURL: 'https://gitlab.com/argocd.git'
  path: ./apps/jenkins
  targetRevision: master
destination:
  server: 'https://kubernetes.default.svc'
  namespace: jenkins
syncPolicy:
  automated:
    prune: true
    selfHeal: true
  syncOptions:
    - CreateNamespace=true
    - RespectIgnoreDifferences=true
ignoreDifferences:
  - kind: Secret
    jsonPointers:
      - /data/jenkins
rumstead commented 1 year ago

https://github.com/argoproj/argo-cd/issues/12569 looks to be the same issue.

Nasty, but I wonder if you can ignore the entire spec.

antwacky commented 1 year ago

I've tried that and it's still failing to sync...

It's slightly strange because, as I've said, ArgoCD reports that there are no differences and the app is in sync.

So theoretically, clicking sync shouldn't actually do anything.

project: default
source:
  repoURL: 'https://gitlab.com/argocd.git'
  path: ./apps/jenkins
  targetRevision: master
destination:
  server: 'https://kubernetes.default.svc'
  namespace: jenkins
syncPolicy:
  automated:
    prune: true
    selfHeal: true
  syncOptions:
    - CreateNamespace=true
    - RespectIgnoreDifferences=true
ignoreDifferences:
  - kind: Secret
    jsonPointers:
      - /data/jenkins
  - kind: PersistentVolumeClaim
    jsonPointers:
      - /spec
antwacky commented 1 year ago

Looking further at the error, it actually appears that Argo is complaining about the storageClassName field:

    ... // 2 identical fields
    Resources:        {Requests: {s"storage": {i: {...}, s: "150Gi", Format: "BinarySI"}}},
    VolumeName:       "pvc-6cc09d41-7fd1-40b5-b79f-bfb624aa2f5d",
-   StorageClassName: &"vsphere-csi",
+   StorageClassName: nil

I've tried ignoring this too, to no avail.

antwacky commented 1 year ago

I've managed to resolve this by adding the storageClassName explicitly within the helm chart values, so ArgoCD now syncs successfully.
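
For anyone in the same spot, a minimal sketch of that fix (the values key here is hypothetical and depends on how your chart templates the PVC; the storage class name comes from the error above):

  # values.yaml -- hypothetical key; match it to your chart's PVC template
  persistence:
    storageClassName: vsphere-csi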

I'm going to leave this open though, as ignoring the storageClassName did not work.

  - kind: PersistentVolumeClaim
    jsonPointers:
      - /spec/storageClassName

Neither did ignoring the entire PVC spec.

And as mentioned previously, ArgoCD reports that the individual PVC resource is synced, so syncing should not fail as there is no work to do.

jMarkP commented 8 months ago

We have a similar issue, but with the volumeName field instead. This causes problems any time there's a change to the manifest (e.g. a new Helm chart version label), as K8s refuses to update the volumeName field.

After some digging and comparing an Application that we can sync successfully to one we can't (both using the same underlying Helm chart and very similar PVC definitions), I think I might have found the issue, and (spoilers) I think it's the same as https://github.com/argoproj/argo-cd/pull/13965#issuecomment-1585312292

In our case ArgoCD is adopting resources previously created with client-side kubectl apply, and I can see in the failing Application that the argocd-controller has claimed ownership of the volumeName field:

  managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:kubectl.kubernetes.io/last-applied-configuration': {}
            'f:pv.kubernetes.io/bind-completed': {}
            'f:pv.kubernetes.io/bound-by-controller': {}
            'f:volume.beta.kubernetes.io/storage-provisioner': {}
          'f:finalizers':
            .: {}
            'v:"kubernetes.io/pvc-protection"': {}
          'f:labels':
            .: {}
            'f:app': {}
            'f:argocd.argoproj.io/instance': {}
            'f:chartVersion': {}
            'f:instance': {}
        'f:spec':
          'f:accessModes': {}
          'f:resources':
            'f:requests':
              .: {}
              'f:storage': {}
          'f:storageClassName': {}
          'f:volumeMode': {}
          'f:volumeName': {}
      manager: argocd-controller
      operation: Apply
      time: '2024-03-12T10:06:07Z'
  ... other managers ...

Whereas in the working Application, that field is still managed by the kube-controller-manager:

  managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:labels':
            'f:app': {}
            'f:argocd.argoproj.io/instance': {}
            'f:chartVersion': {}
            'f:instance': {}
        'f:spec':
          'f:accessModes': {}
          'f:resources':
            'f:requests':
              'f:storage': {}
          'f:storageClassName': {}
      manager: argocd-controller
      operation: Apply
      time: '2024-03-12T09:16:03Z'
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:pv.kubernetes.io/bind-completed': {}
            'f:pv.kubernetes.io/bound-by-controller': {}
          'f:finalizers':
            .: {}
            'v:"kubernetes.io/pvc-protection"': {}
        'f:spec':
          'f:volumeName': {}
        'f:status':
          'f:accessModes': {}
          'f:capacity': {}
          'f:phase': {}
      manager: kube-controller-manager
      operation: Update
      time: '2021-05-10T11:16:02Z'
  ... other managers ...
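
(If you want to check this on your own resources: managedFields are hidden by default in recent kubectl, but can be printed with a flag. The resource names here are the placeholders from the error below.)

  # show which manager owns which fields on the PVC
  kubectl -n my-namespace get pvc pvc-name -o yaml --show-managed-fields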

So presumably, when we issued a server-side apply from ArgoCD, K8s added all those fields to Argo's ownership; now, when Argo patches the resource, K8s treats a missing managed field as if it were set to its default value, the empty string.

See https://kubernetes.io/docs/reference/using-api/server-side-apply/#field-management

If you remove a field from a manifest and apply that manifest, Server-Side Apply checks if there are any other field managers that also own the field. If the field is not owned by any other field managers, it is either deleted from the live object or reset to its default value, if it has one. The same rule applies to associative list or map items.

So, because our Helm chart doesn't set that volumeName field, but Argo manages it, K8s tries to patch it with empty string, and the controller rejects it.

That tallies with the sync error I see:

error when patching "/dev/shm/1766200245": PersistentVolumeClaim "pvc-name" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests for bound claims
  core.PersistentVolumeClaimSpec{
    AccessModes:      {"ReadWriteOnce"},
    Selector:         nil,
    Resources:        {Requests: {s"storage": {i: {...}, s: "100Gi", Format: "BinarySI"}}},
-   VolumeName:       "pvc-...[GUID]...",
+   VolumeName:       "",
    StorageClassName: &"cinder",
    VolumeMode:       &"Filesystem",
    ... // 2 identical fields
  }
  syncId=00067-pHyze task="Sync/0 resource /PersistentVolumeClaim:my-namespace/pvc-name obj->obj (,,)"

We're going to try to resolve this by patching out f:volumeName from that resource's managedFields.
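
A sketch of that approach, based on the upstream server-side-apply docs rather than anything battle-tested here: overwriting metadata.managedFields with an empty list resets field ownership wholesale, which is coarser than removing just f:volumeName but is the documented way to clear it. The next apply then re-claims only the fields it actually sends.

  # clears ALL managedFields entries on the PVC (names from the error above);
  # ownership is rebuilt from whatever the next apply sends
  kubectl -n my-namespace patch pvc pvc-name --type=merge \
    -p '{"metadata":{"managedFields":[{}]}}'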

Based on https://github.com/kubernetes/kubectl/issues/1337 it sounds like this is expected behaviour on the K8s side, so maybe the ArgoCD docs should warn about adopting old client-side-applied resources into server-side-applied Applications?


Edited to add: we're running ArgoCD v2.10.1, deploying to a K8s cluster on v1.24.

jMarkP commented 8 months ago

Ah, some new information: this happens for us even without server-side apply getting involved. We believe it's because, before we used ArgoCD to manage this deployment, the previous team injected the existing volumeName of the PVC into their kubectl apply:

# capture the PVC's current volumeName (empty if the PVC doesn't exist yet)
volumeName=$(kubectl -n ${instance.namespace} get pvc ${instance.name} -o json 2>/dev/null | jq -r .spec.volumeName || echo '')
helm template ... --set volumeName="${volumeName}" ... | kubectl apply -f -

And that means volumeName appears in the last-applied-configuration annotation, which explains why a subsequent sync fails.
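
(You can confirm this by printing the annotation, e.g. for the jenkins PVC from earlier in the thread:)

  kubectl -n jenkins get pvc jenkins \
    -o jsonpath='{.metadata.annotations.kubectl\.kubernetes\.io/last-applied-configuration}'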

clementnuss commented 3 months ago

thanks a lot @jMarkP! removing the last-applied-configuration annotation made ArgoCD happy đŸ™ƒ
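
For anyone else landing here, removing that annotation is a one-liner (again using the jenkins PVC as a stand-in; the trailing - deletes the annotation):

  kubectl -n jenkins annotate pvc jenkins kubectl.kubernetes.io/last-applied-configuration-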

maybe this should be mentioned in the docs?