Open krancour opened 1 week ago
The way I have dealt with this in the past is using conditions, which are kind of the successors of "phases" in Kubernetes API designs based on:
Some resources in the v1 API contain fields called phase, and associated message, reason, and other status fields. The pattern of using phase is deprecated. — https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md
A Reconciling
condition would then be added when the Kubernetes object is "in motion", and removed once it reaches a stable state.
This makes me wonder if it would not be more practical for Kargo to move to such a pattern, as it allows you to describe one (or more) indication(s) about the resource state.
@hiddeco I'd be open to that if we do it across the board for consistency. If we were to do so, how do you propose we distill the conditions down to something suitably phase-like for easy human consumption?
When I do kargo get <resource type>
, I want to be easily able to easily discern the current state from the information in the returned columns.
Taking Warehouse
as an example, this could look something like the following using the metav1.Condition
type. Which can easily be disabled as columns using e.g.
// +kubebuilder:printcolumn:name="Healthy",type="string",JSONPath=".status.conditions[?(@.type==\"Healthy\")].status",description=""
// +kubebuilder:printcolumn:name="Ready",type="string",JSONPath=".status.conditions[?(@.type==\"Ready\")].status",description=""
status:
conditions:
- type: Reconciling
status: True
reason: ScheduledDiscovery
message: "Currently discovering artifacts for 3 sources."
lastTransitionTime: "2024-07-01T12:00:00Z"
- type: Ready
status: False
reason: ScheduledDiscovery
message: "Currently discovering artifacts for 3 sources."
lastTransitionTime: "2024-07-01T12:00:00Z"
- type: Healthy
status: True
reason: AllSourcesAccessible
message: "All 3 sources are accessible."
lastTransitionTime: "2024-07-01T12:00:00Z"
status:
conditions:
- type: Ready
status: True
reason: ArtifactsDiscovered
message: "Discovered 86 artifacts from 3 sources."
lastTransitionTime: "2024-07-01T12:00:00Z"
- type: Healthy
status: True
reason: AllSourcesAccessible
message: "All 3 sources are accessible."
lastTransitionTime: "2024-07-01T12:00:00Z"
status:
conditions:
- type: Reconciling
status: True
reason: ScheduledDiscovery
message: "Currently discovering artifacts for 3 sources."
lastTransitionTime: "2024-07-01T12:00:00Z"
- type: Ready
status: False
reason: SourceInaccessible
message: "One or more sources are inaccessible."
lastTransitionTime: "2024-07-01T12:00:00Z"
- type: Healthy
status: False
reason: SourceInaccessible
message: "One or more sources are inaccessible."
lastTransitionTime: "2024-07-01T12:00:00Z"
For other Kargo API kinds, we would have to translate the existing phases to condition types like the above. The goal here would be to share certain condition types as much as we can across kinds (e.g. Ready, Healthy, and Reconciling) to describe the "high level" state, which eases consumption over having to learn the specific condition types for each kind.
However, specific ("lower level") condition types would still be allowed to exist, and can often be summarized, composed and/or merged into these "higher level" types.
WarehouseStatus
has aMessage
field that describes any errors encountered in the course of a reconciliation, but this alone might not be adequate for surfacing what's currently going on with a Warehouse. I have to dig a little too deep to get at this information.When I run
kubectl get warehouses
orkargo get warehouses
I'd like to see new columns that show with a glance:Did the last reconciliation succeed or fail?
Is it currently discovering artifacts?
Ever since we move to discovering n recent artifacts instead of only selecting the latest applicable one, a single reconciliation loop can take a bit longer than it used to -- possibly even a couple minutes depending on what's going on in the rest of the system since we rate limit connections on a per-registry basis.
I want to know if my Warehouse is idling or currently doing something.
I propose two possible health states -- Healthy and Unhealthy. If Unhealthy, the message field will contain an error. (That part already works.)
I propose two possible phases -- Discovering and Idling.
This would require very little effort and I believe it would improve the UX significantly for CLI / kubectl users.
@jessesuen @hiddeco any thoughts on this?