crossplane / crossplane-runtime

A set of libraries for writing Crossplane controllers.
https://crossplane.io
Apache License 2.0
158 stars 101 forks source link

Deprecate the 'Synced' status condition #198

Closed negz closed 4 days ago

negz commented 4 years ago

What problem are you facing?

Most Crossplane resources (managed, claims, composites, etc) typically expose two types of status condition:

Sometimes Ready goes by different names - e.g. Established for XRDs.

Originally Crossplane only had a Ready condition, but it was conceptually conflated with "synced" and attempted to reflect both whether the underlying resource was ready and whether Crossplane was successfully able to reconcile the underlying resource. The Synced condition has served us well for some time, but I think it should be removed because:

The apiextensions 'definition' and 'offered' controllers are examples of the latter issue. Both controllers collaborate to reconcile a CompositeResourceDefinition. The definition controller manages the lifecycle of the defined composite resource and its controller, while the offered controller manages the lifecycle of the offered resource claim (if any) and its controller. Currently both reconcilers attempt to set the Synced condition of the XRD; if one controller was reconciling successfully and the other was encountering an error they would get stuck in a tight reconcile loop.

How could Crossplane help solve your problem?

I suggest we deprecate and stop using the Synced condition in any reconciler that has been updated to emit events and logs.

muvaf commented 4 years ago

I believe conditions are most useful in machine to machine communication, which is not really what events/logs are for. However, Synced doesn't really mean much for other controllers or systems, at least I haven't seen us writing code to check Synced condition, it's mostly the Ready one. So, in the places we're able to communicate that there is a misconfiguration problem to the user via another mechanism like event, I agree that we don't need Synced condition.

muvaf commented 4 years ago

Hmm, actually UI systems are usually built using YAML output of the resources, which do not have the events. So that's a machine usage of the Synced condition; to show user that there is a misconfiguration through the UI. I believe Upbound Cloud is built this way. I went to look at Pod to see what conditions it has:

  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-09-03T15:12:15Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-09-10T07:54:42Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-09-10T07:54:42Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-09-03T15:12:15Z"
    status: "True"
    type: PodScheduled

From what I can tell, there is an abundance of conditions which also have corresponding events published. So, it comes down to whether we'd like the YAML, i.e. etcd record of the resource, to fully represent everything we know about it at the time of printing or not. I generally lean towards abundance of information as long as it's easy to keep it all in sync. Maybe merge errors that we report under Synced into Ready condition? i.e. look at Ready condition for all kinds of errors we are able to report?

negz commented 4 years ago

Maybe merge errors that we report under Synced into Ready condition? i.e. look at Ready condition for all kinds of errors we are able to report?

This sounds like how Crossplane originally used conditions, which we found to be overloaded and confusing. A controller encountering an error during reconciliation doesn't necessarily mean that the external resource is not ready; they're two different things.

I believe conditions are most useful in machine to machine communication, which is not really what events/logs are for.

I believe machine to machine communication is in scope for events. They're actually a resource like any other, e.g.:

# $ kubectl get event -o yaml compositepostgresqlinstances.database.example.org.16338ed5a523dfe0
apiVersion: v1
count: 2487
eventTime: null
firstTimestamp: "2020-09-10T23:09:00Z"
involvedObject:
  apiVersion: apiextensions.crossplane.io/v1alpha1
  kind: CompositeResourceDefinition
  name: compositepostgresqlinstances.database.example.org
  resourceVersion: "3241413"
  uid: e29f6357-fe42-4a4b-9f9b-781e0efc1d96
kind: Event
lastTimestamp: "2020-09-11T01:14:02Z"
message: Rendered composite resource claim CustomResourceDefinition
metadata:
  creationTimestamp: "2020-09-10T23:09:00Z"
  name: compositepostgresqlinstances.database.example.org.16338ed5a523dfe0
  namespace: default
  resourceVersion: "3371564"
  selfLink: /api/v1/namespaces/default/events/compositepostgresqlinstances.database.example.org.16338ed5a523dfe0
  uid: 2f76e628-d381-41a3-a0e8-9132504fd16b
reason: RenderCRD
reportingComponent: ""
reportingInstance: ""
source:
  component: offered/compositeresourcedefinition.apiextensions.crossplane.io
type: Normal

The events API supports filtering using field selectors on the involvedObject API, and the event API object is like a more detailed status condition. To me this means the functionality is basically a superset of conditions, because you can also see historical transitions rather than just the current state.

negz commented 4 years ago

From @hasheddan on https://github.com/crossplane/crossplane/pull/1721:

I wonder if we could just confine synced to resources that talk to APIs outside of the cluster?

I think it's appropriate to remove Synced everywhere; even for external resources. I think they're still redundant as long as we emit an event whenever we would have set the Synced condition.

displague commented 3 years ago

I'm curious where this stands. I encountered debate on status Conditions (https://github.com/kubernetes/community/pull/4521) which is ultimately reflected in https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties.

I was spelunking relevant Crossplane issues and found this issue (198) and a 2019 comment https://github.com/crossplane/crossplane/issues/495#issuecomment-495319702 where Synced was first proposed and defended.

I think they're still redundant as long as we emit an event whenever we would have set the Synced condition.

Does the 2019 comment in support of Synced offer any other guidance (if only for reasons in support of removal)?

negz commented 3 years ago

@displague I'd still like to deprecate it but I haven't yet had time to find out whether folks are onboard - I could imagine it might be a bit controversial. In my mind Ready is fairly idiomatic - it represents the state/'condition' of the thing under management. Synced really just means "did we or did we not hit and error on the most recent reconcile". To me it seems like events (and logs) are the more idiomatic place for that kind of issue.

MisterMX commented 2 years ago

I think those conditions are quite useful. Especially in combination with additionalPrinterColumns because they allow you to quickly distinguish between reconcile errors and external resources that are not ready yet without having to kubectl describe into the resource.

If setting these conditions does not cause any issues I would prefer to keep them. In addition to that I would propose to add the synced condition/column to XRCs and XRs as well: https://github.com/crossplane/crossplane/issues/2850

negz commented 2 years ago

https://kpt.dev/reference/schema/crd-status-convention/

More prior art - looks like the kpt folks recommend something similar.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

bobh66 commented 3 months ago

My .02 - I like having SYNCED because it tells me whether I need to fix something to get the resource to reconcile properly and it's easy to see in kubectl get output. If all I have is READY=False and I have to pull logs or events to find out why it would be a lot more effort. The combination of SYNCED and READY makes it easy to know where to focus my attention when something isn't working properly.

github-actions[bot] commented 2 weeks ago

Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 14 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.