argoproj / gitops-engine

Democratizing GitOps
https://pkg.go.dev/github.com/argoproj/gitops-engine?tab=subdirectories
Apache License 2.0
1.67k stars 251 forks source link

fix(clusterCache): don't miss finding live obj if obj is cluster-scoped and namespacedResources is in transition #597

Closed ncdc closed 1 month ago

ncdc commented 2 months ago
  1. ~namespacedResources is accessed by multiple goroutines simultaneously and therefore must be guarded by a mutex. Because this field is read significantly more often than it's written, use an RWMutex.~ Removed this from this PR, as it might not be adequate in its current form to avoid deadlocks.

  2. When Reconcile performs its logic to compare the desired state (target objects) against the actual state (live objects), it looks up each live object based on a key comprised of data from the target object: API group, API kind, namespace, and name. While group, kind, and name will always be accurate, there is a chance that the value for namespace is not. If a cluster-scoped target object has a namespace (because it incorrectly has a namespace from its source) or the namespace parameter passed into the Reconcile method has a non-empty value (indicating a default value to use on namespace-scoped objects that don't have it set in the source), AND the resInfo ResourceInfoProvider has incomplete or missing API discovery data, the call to IsNamespacedOrUnknown will return true when the information is unknown. This leads to the key being incorrect - it will have a value for namespace when it shouldn't. As a result, indexing into liveObjByKey will fail. This failure results in the reconciliation containing incorrect data: there will be a nil entry appended to targetObjs when there shouldn't be.

Most likely addresses https://github.com/argoproj/argo-cd/issues/17440.

Most likely addresses at least 1 user who had cluster-scoped resources get wiped out in https://github.com/argoproj/argo-cd/issues/15058

rumstead commented 2 months ago

@ncdc awesome work! have you posted this on the Argo CD slack or plan to represent it at one of the community/contributor meetings?

rumstead commented 2 months ago

For https://github.com/argoproj/argo-cd/issues/15058, we have namespace on the Argo CD application because we deploy namespace-scoped resources as well so we definitely hit the condition.

ncdc commented 2 months ago

@rumstead yes, posted in

Not sure about my ability to attend a community meeting; will see.

ncdc commented 2 months ago

/cc @ash2k

rumstead commented 1 month ago

Not an approver but lgtm for what it is worth

crenshaw-dev commented 1 month ago

@ncdc before I merge, would you mind opening a PR on argo-cd pointing at the latest commit of this branch? That'll let us make sure all the tests still look good on the Argo CD side.

ncdc commented 1 month ago

It'll take me a little while (meetings) but I'll do it ASAP

ncdc commented 1 month ago

https://github.com/argoproj/argo-cd/pull/19040