argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Server-Side Diff shows OutOfSync despite ignoreDifferences enabled and slow reconciliation performance #18344

Closed Ezzahhh closed 2 weeks ago

Ezzahhh commented 4 months ago

Describe the bug

I ran into an issue similar to https://github.com/argoproj/argo-cd/issues/11136, so I enabled ServerSide Diff through the annotation argocd.argoproj.io/compare-options: ServerSideDiff=true.
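
For reference, a minimal sketch of where that annotation sits on an Application manifest. The app name, project, repo URL, path, and namespaces are placeholders and not taken from the reporter's setup:

```yaml
# Hypothetical Application; every value except the annotation is a placeholder.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app
  namespace: argocd
  annotations:
    # Opts this single Application into Server-Side Diff.
    argocd.argoproj.io/compare-options: ServerSideDiff=true
spec:
  project: default
  source:
    repoURL: https://example.com/org/repo.git
    targetRevision: HEAD
    path: manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: example
```

Because the annotation is set per Application, the setting can be trialled on a few Apps before being rolled out to all of them.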

Afterwards, the Apps were no longer marked as "Unknown" and could sync again. However, they were now marked as OutOfSync. Specifically, differences in KEDA's ScaledObject /status/health field or in lastTransitionTime trigger the OutOfSync state. I also observed that the reconciliation time per Application went from an average of 0.5 seconds to over 5 seconds, and there seems to be a permanent queue of pending kubectl exec calls for applies (the argocd_kubectl_exec_pending metric). I am guessing that each change in an App forces a reconciliation, and therefore a server-side apply dry-run?

My suspicion is that the high reconciliation time is affecting the comparison, as I can see Apps flip between OutOfSync and Synced on each reconciliation, depending on performance. Is it expected that Argo CD needs more CPU with ServerSide Diff than with the legacy diff with ServerSide Apply?

Perhaps setting ignoreResourceStatusField to all would solve the OutOfSync issue? I don't know whether enabling this has any side effects.
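
As a sketch of what that setting would look like, assuming the default ConfigMap name and an argocd namespace: the option lives under resource.compareoptions in argocd-cm. Whether it resolves the OutOfSync results seen with ServerSide Diff is not established in this thread.

```yaml
# Sketch only; the namespace is an assumption about the installation.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  resource.compareoptions: |
    # Accepted values per the Argo CD diffing docs: crd, all, none.
    # 'all' excludes the status field from diffing for every resource kind.
    ignoreResourceStatusField: all
```

The main side effect to weigh is that, with all, drift in any resource's status field is no longer reported as a difference.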

Here are my ignoreDifferences. They were configured to avoid reconciliation churn by enabling ignoreDifferencesOnResourceUpdates: true and resource.ignoreResourceUpdatesEnabled: 'true':

      ignoreDifferences:
        - kind: ScaledObject
          jsonPointers:
            - /status
            - /status/health
        - group: discovery.k8s.io
          kind: EndpointSlice
          jsonPointers:
            - /endpoints
            - /metadata/ownerReferences
            - /metadata/annotations/endpoints.kubernetes.io~1last-change-trigger-time
            - /metadata/resourceVersion
        - kind: HorizontalPodAutoscaler
          jsonPointers:
            - /status
            - /metadata/ownerReferences
            - /metadata/annotations/autoscaling.alpha.kubernetes.io~1behavior
            - /metadata/annotations/autoscaling.alpha.kubernetes.io~1conditions
            - /metadata/annotations/autoscaling.alpha.kubernetes.io~1current-metrics
            - /metadata/annotations/autoscaling.alpha.kubernetes.io~1metrics
            - /spec
        - kind: Application
          jsonPointers:
            - /metadata/ownerReferences
            - /status/reconciledAt
          jqPathExpressions:
            - .status.conditions[].lastTransitionTime
        - group: apps
          kind: ReplicaSet
          jsonPointers:
            - /metadata/ownerReferences
            - /metadata/annotations/deployment.kubernetes.io~1desired-replicas
            - /metadata/annotations/deployment.kubernetes.io~1max-replicas
            - /metadata/annotations/deployment.kubernetes.io~1revision
            - /status
            - /spec/replicas
        - kind: Deployment
          jsonPointers:
            - /spec/replicas
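
The two settings mentioned before the list are configured in argocd-cm rather than on the Application. A minimal sketch, assuming the default ConfigMap name and an argocd namespace:

```yaml
# Sketch only; the namespace is an assumption about the installation.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Enables the "ignore resource updates" feature for reconciliation triggers.
  resource.ignoreResourceUpdatesEnabled: 'true'
  resource.compareoptions: |
    # Reuses the ignoreDifferences rules when deciding whether a resource
    # update should trigger a reconciliation.
    ignoreDifferencesOnResourceUpdates: true
```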

To Reproduce

Enable ServerSide Diff on 87 Apps that have reconciliation activity due to autoscaling and other events. Specify the ignoreDifferences above for the Apps. Observe high reconciliation time for the Applications and diffs showing on the Apps. Apps that are often in the Progressing state show OutOfSync on autoscaling-related resources (KEDA ScaledObject). In some cases, despite ignoreDifferences being set on Deployment replicas, the App shows OutOfSync on those replicas.

Expected behavior

Enabling ServerSide Diff should not cause OutOfSync in Apps compared to the normal diff with ServerSide Apply enabled. Reconciliation performance should not deteriorate drastically when the feature is enabled, and there should not be a constant queue of pending kubectl runs.

Version

{
    "Version": "v2.10.9+c071af8",
    "BuildDate": "2024-04-30T15:53:28Z",
    "GitCommit": "c071af808170bfc39cbdf6b9be4d0212dd66db0c",
    "GitTreeState": "clean",
    "GoVersion": "go1.21.3",
    "Compiler": "gc",
    "Platform": "linux/amd64",
    "KustomizeVersion": "v5.2.1 2023-10-19T20:13:51Z",
    "HelmVersion": "v3.14.3+gf03cc04",
    "KubectlVersion": "v0.26.11",
    "JsonnetVersion": "v0.20.0"
}

agaudreault commented 3 months ago

Hey @Ezzahhh, ignoreDifferencesOnResourceUpdates: true and resource.ignoreResourceUpdatesEnabled: 'true' have no impact on the ignoreDifferences feature. Your issue mixes two problems: slow performance, and OutOfSync with server-side. It is not expected that using server-side would cause a severe decrease in performance. Do you mind opening 2 separate issues, one for the server-side apply problem and one for the performance of your argocd instance? This will also help to have clear, reproducible steps for investigating the issue.

A few resources on performance: