chrisguidry closed this 2 months ago
Tested by deploying per the readme and watching the conditions on each CRD:
$ kubectl get prefectserver prefect-sqlite -o jsonpath='{.status.conditions}' | jq
[
  {
    "lastTransitionTime": "2024-08-20T21:42:12Z",
    "message": "PersistentVolumeClaim was created",
    "reason": "PersistentVolumeClaimCreated",
    "status": "True",
    "type": "PersistentVolumeClaimReconciled"
  },
  {
    "lastTransitionTime": "2024-08-20T21:42:12Z",
    "message": "MigrationJob is not required",
    "reason": "MigrationJobNotRequired",
    "status": "True",
    "type": "MigrationJobReconciled"
  },
  {
    "lastTransitionTime": "2024-08-20T21:42:12Z",
    "message": "Deployment was created",
    "reason": "DeploymentCreated",
    "status": "True",
    "type": "DeploymentReconciled"
  },
  {
    "lastTransitionTime": "2024-08-20T21:42:12Z",
    "message": "Service was created",
    "reason": "ServiceCreated",
    "status": "True",
    "type": "ServiceReconciled"
  }
]
Able to access the server (by port-forwarding the Service):
These Pods are failing to come up:
httpcore.ConnectError: [Errno -2] Name or service not known
However, the status condition on the CR makes it appear healthy:
$ kubectl get prefectworkpool sample-pool -o jsonpath='{.status.conditions}' | jq
[
  {
    "lastTransitionTime": "2024-08-20T21:42:12Z",
    "message": "Deployment is in the correct state",
    "reason": "DeploymentUpdated",
    "status": "True",
    "type": "DeploymentReconciled"
  }
]
$ kubectl get prefectworkpool sample-pool
NAME          TYPE      VERSION     DESIRED WORKERS   READY WORKERS
sample-pool   process   3.0.0rc18   3                 0
We may eventually add a status condition reporting that the desired number of workers hasn't been satisfied yet, but that'll be out of scope for this PR.
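For illustration, such a condition might look something like this (purely a hypothetical sketch for a future PR; the WorkersReady type, the reason strings, and passing the desired/ready counts as plain parameters are all made up here, not part of the operator's API; metav1 is k8s.io/apimachinery/pkg/apis/meta/v1):

// workersReadyCondition reports whether the work pool has reached its
// desired worker count. Names here are illustrative only.
func workersReadyCondition(desired, ready int32) metav1.Condition {
	condition := metav1.Condition{
		Type:    "WorkersReady",
		Status:  metav1.ConditionTrue,
		Reason:  "AllWorkersReady",
		Message: fmt.Sprintf("%d of %d desired workers are ready", ready, desired),
	}
	if ready < desired {
		condition.Status = metav1.ConditionFalse
		condition.Reason = "WorkersPending"
	}
	return condition
}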
Yeah, it's still way more verbose than I'd like. Using SetStatusCondition takes care of a lot of it, but it's still just a verbose thing (with type, status, reason, and message every time). One thing that might make it cleaner is to use a function like you saw in cert-manager, but maybe one for each status (true or false):
func observeCondition(ctx context.Context, workspace *cloudv1.Workspace, conditionType, reason, message string) error {
    ...
}

func observeAbsence(ctx context.Context, workspace *cloudv1.Workspace, conditionType, reason, message string) error {
    ...
}
so then when you observeCondition, the status is "True", and when you observeAbsence, it's "False"? That might help with some of the eyebleed.
Also, maybe if we get errors setting the status, we just log those failures instead of returning them back and stopping the process? That makes them a little more "fire and forget". I don't know what kinds of errors would even happen there besides a total failure to talk to the API
I'll take a stab at that in a future PR
One thing that might make it cleaner is to use a function like you saw in cert-manager, but maybe one for each status (true or false):
Sounds like a good improvement so they're ~1-liners instead 👍🏼
Also, maybe if we get errors setting the status, we just log those failures instead of returning them back and stopping the process? That makes them a little more "fire and forget". I don't know what kinds of errors would even happen there besides a total failure to talk to the API
Fair question. Looking through some examples, I see a pattern of defining some updateStatus function and then defer-ing it (example from Cert Manager). So the status and condition variables are defined early with empty or default values, the values are updated throughout the method based on activities, and then defer ensures that the status is updated before that method exits. Might be a good way to reduce the API calls, repetition, and if/else instances 🤔
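Something like this shape (a loose sketch, not this PR's actual reconciler: the WorkspaceReconciler receiver, cloudv1.Workspace type, and condition names are stand-ins; ctrl and client are the usual sigs.k8s.io/controller-runtime packages, and meta/metav1 come from k8s.io/apimachinery):

func (r *WorkspaceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (result ctrl.Result, err error) {
	var workspace cloudv1.Workspace
	if err := r.Get(ctx, req.NamespacedName, &workspace); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Start with a default condition and refine it as reconciliation proceeds.
	condition := metav1.Condition{
		Type:   "DeploymentReconciled",
		Status: metav1.ConditionUnknown,
		Reason: "Reconciling",
	}

	// The deferred closure runs on every return path, so the status gets
	// written exactly once per reconcile instead of in each if/else branch.
	defer func() {
		meta.SetStatusCondition(&workspace.Status.Conditions, condition)
		if updateErr := r.Status().Update(ctx, &workspace); updateErr != nil && err == nil {
			err = updateErr
		}
	}()

	// ... reconcile the Deployment, mutating `condition` as outcomes become known ...
	condition.Status = metav1.ConditionTrue
	condition.Reason = "DeploymentUpdated"
	condition.Message = "Deployment is in the correct state"

	return ctrl.Result{}, nil
}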
Oh, that's amazing. I forgot about defer in Go; it's one of the raddest things they've done.
Before, all of the final else blocks of reconciling were updating the condition to say effectively "still all good here!", which was overwriting whatever the prior iterations had updated even though there was no materially new information.
In this change, I'm putting guards around those final updates so that we only write them when there's actually something new to record.
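For example, one way to express that guard (a sketch, not the literal diff from this change; it assumes the conditions are standard metav1.Condition values and uses meta.IsStatusConditionTrue from k8s.io/apimachinery/pkg/api/meta):

// Only write the final "all good" condition if it isn't already recorded as
// True; otherwise every reconcile would overwrite the earlier, more specific
// reason/message (e.g. "DeploymentCreated") with no new information.
if !meta.IsStatusConditionTrue(workspace.Status.Conditions, "DeploymentReconciled") {
	meta.SetStatusCondition(&workspace.Status.Conditions, metav1.Condition{
		Type:    "DeploymentReconciled",
		Status:  metav1.ConditionTrue,
		Reason:  "DeploymentUpdated",
		Message: "Deployment is in the correct state",
	})
	if err := r.Status().Update(ctx, workspace); err != nil {
		return ctrl.Result{}, err
	}
}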