crossplane / crossplane-runtime

A set of libraries for writing Crossplane controllers.
https://crossplane.io
Apache License 2.0
154 stars 100 forks source link

pkg/resource: applicator breaks optimistic concurrency and overwrites values #483

Open sttts opened 1 year ago

sttts commented 1 year ago

What happened?

Looking at the claim controller, updating the XR (cp in the code), the following race with other actors in the system (controllers, user with kubectl) exists:

  1. reconciler gets latest XR state cp
  2. reconciler updates in the cp whatever it wants to update
  3. reconciler calls Apply with cp

Assuming between 1 and 3a some client updates the XR out of band, e.g. some other controller, or kubectl. 3c will wipe out that change, won't it?

Above is just one example for that issue. The apply func is widely used in Crossplane.

How can we reproduce it?

This will likely show up as subtle bugs as this is some kind of time travel, or controllers fighting for their desired state of the world and racing with their informers.

What environment did it happen in?

Crossplane version: main

negz commented 1 year ago

Thanks for spotting this @sttts. At first glance I agreed with your assessment, but after digging a little deeper I think we're okay. Or more specifically, I don't think we're breaking optimistic concurrency anywhere that it matters much.

As described in https://github.com/crossplane/crossplane/issues/4047 we have two applicator implementations - APIUpdatingApplicator and APIPatchingApplicator. I just did a search to refresh my memory on all the places we use these.

APIUpdatingApplicator

https://github.com/search?q=org%3Acrossplane%20NewAPIUpdatingApplicator&type=code

The APIUpdatingApplicator definitely does break optimistic concurrency. This is dangerous, but I believe we only use it in places today where that's inconsequential. We use it to intentionally clobber other state, in cases where a resource Crossplane owns should always be synced from the state of another resource and where we don't expect other writers. In all cases we don't use a get-mutate-apply pattern - we only use it to build desired state (a "sparse object") and apply it.

I found the following uses of APIUpdatingApplicator:

In these cases I wouldn't expect other clients or controllers to be writing to the resources Crossplane applies, and if they did I think it's reasonable that Crossplane would undo their changes.

APIPatchingApplicator

https://github.com/search?q=org%3Acrossplane+NewAPIPatchingApplicator&type=code

The example you point to in the claim controller uses the APIPatchingApplicator. I think this applicator does actually respect the resource version of the applied resource though, per https://github.com/crossplane/crossplane-runtime/pull/258. We send a "merge patch" which is really just our entire serialized desired state as the HTTP PATCH request body to the API server. So if we did a get-mutate-apply, I believe our HTTP request would contain the metadata.resourceVersion that we read and be rejected by the API server if it was stale. So, I think we're safe from data-loss in this much more widely used applicator.

That said, it will lead to a lot of retries and noise (per https://github.com/crossplane/crossplane/issues/2472), and can't delete fields that we no longer want (per https://github.com/crossplane/crossplane/issues/4047). I'd still like to replace it with something better.

I found the following uses of APIPatchingApplicator:

github-actions[bot] commented 1 month ago

Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 14 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.