embano1 opened 1 year ago
@jaypipes @A-Hilaly putting this out for discussion
Guidelines for writing Kubernetes controllers: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/controllers.md
@embano1 note that the above doc does not refer to `ObservedGeneration` in a `Condition` but rather in the `Status` struct itself.
I still have concerns about an `ObservedGeneration` field in our `Condition` structs, for the same reasons I gave in the comment on #1133. Adding it would mean we could never move to a multi-writer architecture and could only ever use a single-writer architecture (i.e. the leader-elected architecture that upstream Kubernetes controllers all have).
Also, all Conditions on all ACK resources are entirely cleared for every resource at the beginning of reconciliation. The reason for this is that a `Condition` represents a state transition during that particular reconcile, so all `Condition`s are guaranteed to always be the freshest representation of a resource's latest state transition. I guess I could see some use in debugging crashing controllers by adding an `ObservedGeneration` to a `Condition`, but frankly, that generation is output in every log line in the controller's logs anyway, along with a timestamp, so it's relatively straightforward to diagnose state transition issues simply by looking at the logs.
That said, if this is something a bunch of folks want (@A-Hilaly @RedbackThomson @jljaco) then I will back off and relent :) It should not be difficult to add support for this into our standard Condition handling.
> @embano1 note that the above doc does not refer to `ObservedGeneration` in a `Condition` but rather in the `Status` struct itself.
The reason I mentioned this doc (in addition to the more specific links in the original post) was to point out that setting `ObservedGeneration` is a recommended pattern (one that has evolved over time; see the other links).
> Adding it would mean we could never move to a multi-writer architecture and could only ever use a single-writer architecture (i.e. the leader-elected architecture that upstream Kubernetes controllers all have).
I don't think so. See what Daniel Smith responded (quoting from above):

> Unfortunately we don't have a standard way for multiple controllers to work together on status other than conditions (Daniel Smith, SIG API Machinery)
IMHO setting status conditions does not prevent controller authors from supporting multi-writer scenarios. Leader election is a special case and not truly multi-writer due to the fencing mechanisms at play (only one writer at any time), unless the controller violates the leader-election primitives (i.e. a bug).
In fact, I'd argue that status conditions actually provide the means to have multiple controllers cooperate (in addition to the advancements with server-side apply).
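As a concrete illustration of that convention: the upstream standardized condition type (`metav1.Condition`, used across core Kubernetes APIs) already carries a per-condition `observedGeneration`, and the apimachinery helpers maintain the list for you. A minimal sketch (the `ACK.ResourceSynced` type string comes from this thread; the `Reason`/`Message` values are made up):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	var conditions []metav1.Condition
	var generation int64 = 3 // in a controller this would be obj.GetGeneration()

	// SetStatusCondition only bumps LastTransitionTime when Status actually
	// changes, so ObservedGeneration can advance on every reconcile without
	// resetting the transition timestamp.
	meta.SetStatusCondition(&conditions, metav1.Condition{
		Type:               "ACK.ResourceSynced",
		Status:             metav1.ConditionTrue,
		ObservedGeneration: generation,
		Reason:             "Synced",
		Message:            "resource matches the latest desired state",
	})

	fmt.Printf("%+v\n", conditions[0])
}
```

Because each condition `Type` has a single logical owner, multiple controllers can cooperate by each managing their own entry in the same list.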
> Also, all Conditions on all ACK resources are entirely cleared for every resource at the beginning of reconciliation. The reason for this is that a `Condition` represents a state transition during that particular reconcile, so all `Condition`s are guaranteed to always be the freshest representation of a resource's latest state transition. I guess I could see some use in debugging crashing controllers by adding an `ObservedGeneration` to a `Condition`, but frankly, that generation is output in every log line in the controller's logs anyway, along with a timestamp, so it's relatively straightforward to diagnose state transition issues simply by looking at the logs.
I don't see any issue with the existing "clearing" logic: that is exactly how status is designed to work and is in line with the API conventions, i.e. status being stateless (though we've seen users bend the rules here).
Setting `ObservedGeneration` is also a "stateless" operation and does not violate this principle (otherwise all controllers would be broken).
Regarding debugging: instead of looking at the various controller logs, an operator/developer can clearly see whether the controller is actively reconciling by inspecting the Kubernetes resource, e.g. with `kubectl get ...`. Note that if central logging is configured (which is very common), this becomes even more useful: object state transitions and controller logs can be correlated in a central logging system (e.g. Elasticsearch or CloudWatch), using timestamps to reason about issues.
I just checked the code, as I'm happy to implement this. But it's pretty deep in the runtime, and there are likely edge cases I'm not thinking of among all the code paths that would need to be touched to add `ObservedGeneration` to all status fields for all Synced/Terminal cases (not sure if controller authors can even influence/override this behavior on a case-by-case basis...).
> Regarding debugging: instead of looking at the various controller logs, an operator/developer can clearly see whether the controller is actively reconciling by inspecting the Kubernetes resource
I would echo this point. In my case I have a different controller provisioning an ACK `Bucket`. It needs to first create or update the `Bucket` and, once that is complete, do further processing. Without some way to know that syncing is underway or complete, it is forced to try to record when the last sync was and check that the bucket has a `ResourceSynced` transition time that is newer than that. This gets particularly painful in scenarios where my controller needs to loop through multiple reconciles, or when it is not clear whether there is an actual diff on the update for the `Bucket`, so it doesn't know if it needs to wait.
I don't know if `ObservedGeneration` is the right solution, but some way to know that a sync is underway would be extremely useful.
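To make the consumer-side use case concrete, here is a hedged sketch of the check another controller could perform if the `ResourceSynced` condition carried an `observedGeneration` (it does not today; the field and helper usage below assume the standardized `metav1.Condition` shape):

```go
package example

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// resourceUpToDate reports whether the controller has synced the *current*
// spec, assuming a standardized condition list with ObservedGeneration.
// Without the generation comparison, a stale True condition from before our
// last update is indistinguishable from a fresh one -- the race described above.
func resourceUpToDate(generation int64, conditions []metav1.Condition) bool {
	cond := meta.FindStatusCondition(conditions, "ACK.ResourceSynced")
	if cond == nil || cond.Status != metav1.ConditionTrue {
		return false // not synced yet, or the controller hasn't reported at all
	}
	return cond.ObservedGeneration == generation // generation from metadata.generation
}
```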
> Also, all Conditions on all ACK resources are entirely cleared for every resource at the beginning of reconciliation
Unless I am missing something, this does not appear to be true. If I update a `Bucket`, there is no change until it has synced and there is a new transition time for `ResourceSynced`. There is no update where the conditions are cleared. (Using `kubectl get bucket --watch -o yaml`.)
Like I mentioned, if this is something we all want, I'm happy to walk a contributor through the process of adding this functionality to the ACK runtime. :)
/remove-lifecycle stale
@jaypipes
> Adding it would mean we could never move to a multi-writer architecture and could only ever use a single-writer architecture (i.e. the leader-elected architecture that upstream Kubernetes controllers all have).
Are you referring to a future architecture where the whole status object would be reset and rewritten by multiple writers, or where various fields are 'owned' by various controllers? In the former scenario, if the whole status is rewritten a single observedGeneration on the status would suffice since you wouldn't be able to partially read the previous status. In the latter scenario, this is where having a different condition per controller would come into play, with each of them having their own observedGeneration field.
We're looking for this functionality as well, to be able to guarantee the resource is indeed synced when we are looking at it from another, non-ACK controller, and to avoid race conditions where we update the resource and read it again before the controller has had time to observe and react to the change.
> > Regarding debugging: instead of looking at the various controller logs, an operator/developer can clearly see whether the controller is actively reconciling by inspecting the Kubernetes resource

> I would echo this point. In my case I have a different controller provisioning an ACK `Bucket`. It needs to first create or update the `Bucket` and, once that is complete, do further processing. Without some way to know that syncing is underway or complete, it is forced to try to record when the last sync was and check that the bucket has a `ResourceSynced` transition time that is newer than that. This gets particularly painful in scenarios where my controller needs to loop through multiple reconciles, or when it is not clear whether there is an actual diff on the update for the `Bucket`, so it doesn't know if it needs to wait.
Hi @AlistairBurrowes, sorry for the late response. The way to determine if an ACK resource is completely synced is to check whether there is a Condition of type `ACK.ResourceSynced` with a ConditionStatus of `True`. What this means is that the last time the reconciler ran for that resource, the controller was able to determine that the desired state of the resource matched the latest observed state of the resource.

Whenever there is a Condition of type `ACK.ResourceSynced` with a ConditionStatus of `False`, that means the last time the reconciler ran for that resource, the controller was able to determine that the desired state of the resource did NOT match the latest observed state of the resource.

If there is a Condition of type `ACK.Terminal` with a ConditionStatus of `True` on such a resource, that means the resource will never have a Condition of type `ACK.ResourceSynced` with a ConditionStatus of `True` unless the user modifies something about the resource's `Spec`.
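For readers following along, these rules can be summarized in a small helper. This is a sketch: the `Condition` shape below only loosely mirrors the ACK runtime's actual condition type (see aws-controllers-k8s/runtime for the real definitions); the type strings are the ones quoted above.

```go
package example

// Condition loosely mirrors the shape of an ACK resource condition; see the
// aws-controllers-k8s/runtime repo for the real type.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

type SyncState int

const (
	StateUnknown  SyncState = iota // no ResourceSynced condition reported yet
	StateSyncing                   // ACK.ResourceSynced == False: desired != observed
	StateSynced                    // ACK.ResourceSynced == True: desired == observed
	StateTerminal                  // ACK.Terminal == True: will never sync without a Spec change
)

// classify encodes the rules described above: a Terminal condition trumps
// everything, otherwise the ResourceSynced status decides.
func classify(conditions []Condition) SyncState {
	for _, c := range conditions {
		if c.Type == "ACK.Terminal" && c.Status == "True" {
			return StateTerminal
		}
	}
	for _, c := range conditions {
		if c.Type == "ACK.ResourceSynced" {
			if c.Status == "True" {
				return StateSynced
			}
			return StateSyncing
		}
	}
	return StateUnknown
}
```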
> I don't know if `ObservedGeneration` is the right solution, but some way to know that a sync is underway would be extremely useful.

> > Also, all Conditions on all ACK resources are entirely cleared for every resource at the beginning of reconciliation

> Unless I am missing something, this does not appear to be true. If I update a `Bucket`, there is no change until it has synced and there is a new transition time for `ResourceSynced`. There is no update where the conditions are cleared. (Using `kubectl get bucket --watch -o yaml`.)
Sorry, this is referring to the ACK runtime's behaviour of deleting all Condition objects in a CR's `Status` at the start of each reconciliation loop. We don't actually Patch that condition-resetting change back to the Kubernetes API server; rather, the CR's Conditions are recalculated from scratch during each reconcile loop.
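In other words, something like the following happens in memory on every loop. A minimal sketch, not the actual ACK runtime code, using the upstream `metav1.Condition` type for brevity:

```go
package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// recalcConditions sketches the behaviour described above: the condition list
// is rebuilt from scratch on an in-memory copy every reconcile, and only the
// final, freshly computed set is patched back -- watchers never observe an
// intermediate "conditions cleared" state.
func recalcConditions(syncErr error) []metav1.Condition {
	conditions := []metav1.Condition{} // previous conditions discarded in memory only

	synced := metav1.Condition{
		Type:               "ACK.ResourceSynced",
		Status:             metav1.ConditionTrue,
		Reason:             "Synced",
		LastTransitionTime: metav1.Now(),
	}
	if syncErr != nil {
		synced.Status = metav1.ConditionFalse
		synced.Reason = "SyncError"
		synced.Message = syncErr.Error()
	}
	return append(conditions, synced)
}
```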
**Is your feature request related to a problem?**

Identifying whether a controller is working correctly is currently not easy (nor 100% accurate) when looking at the `ResourceSynced` status field:

1. Patterns: For people coming in with some Kubernetes operational knowledge, not having `ObservedGeneration` could impact troubleshooting and the documented procedures/best practices gained from using other controllers/operators.
2. Failures: If the controller is not running or keeps crashing after a successful initial `ResourceSynced`, the operator cannot easily see whether the resource actually has synced based on the `Generation` set in the `metadata`, because `ResourceSynced` might be stale.

In https://github.com/aws-controllers-k8s/community/issues/1133#issuecomment-1020218494 reasons were given why `ObservedGeneration` might have issues, e.g. when dealing with multiple controllers. IMHO this is not accurate and goes against the Kubernetes convention of using `ObservedGeneration` in Kubernetes resources.

Also, currently I don't see any multi-writer implementation in ACK or related projects using ACK (to my knowledge). But even if that becomes a scenario (very likely in the future), Server-Side Apply (SSA) and related techniques in the API server are meant to address multi-writer scenarios, including updating individual conditions.

**Describe the solution you'd like**
Follow the Kubernetes standardized conditions convention.

**Describe alternatives you've considered**
Keep as is.
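For reference, the standardized condition type this refers to, abridged from `k8s.io/apimachinery/pkg/apis/meta/v1` (json tags and validation markers omitted), already includes the per-condition `observedGeneration` requested here:

```go
// Abridged from k8s.io/apimachinery/pkg/apis/meta/v1 (types.go).
type Condition struct {
	// Type of condition in CamelCase or in foo.example.com/CamelCase.
	Type string
	// Status of the condition: True, False, or Unknown.
	Status ConditionStatus
	// ObservedGeneration is the .metadata.generation the condition was set
	// based upon; if it is older than .metadata.generation, the condition is
	// out of date with respect to the current state of the instance.
	ObservedGeneration int64
	// LastTransitionTime is the last time the condition transitioned from
	// one status to another.
	LastTransitionTime Time
	// Reason is a programmatic identifier indicating the reason for the
	// condition's last transition.
	Reason string
	// Message is a human-readable message indicating details about the
	// transition.
	Message string
}
```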