azimuth-cloud / azimuth-caas-operator

K8s operator to create ansible based clusters using K8s CRDs
Apache License 2.0

Allow for a retry on all delete jobs #65

Closed JohnGarbutt closed 1 year ago

JohnGarbutt commented 1 year ago

On delete, there is no easy way for the user to retry; an admin has to go and delete the job after seeing a failed-job alert. There is little cost in retrying the delete, in case the failure was a transient network error that might be fixed on a second try.

JohnGarbutt commented 1 year ago

It looks like this is broken when it comes to finding the error:

    error: Failed to delete platform. Please contact Azimuth operators.
    phase: Failed
    updatedTimestamp: "2023-08-11T15:56:35Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

JohnGarbutt commented 1 year ago

Ok, great, that fixed the problem with not finding the error.

mkjpryor commented 1 year ago

@JohnGarbutt

My personal opinion is that this should retry forever, with a backoff. As you said before, there is no way for a user to "re-trigger" the delete as Kubernetes already considers the resource as "marked for deletion", and IMHO requiring admin intervention in this case is rubbish.

mkjpryor commented 1 year ago

However, what you have implemented here is easy and works for now, so LGTM.
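
For reference, the "retry forever, with a backoff" idea suggested above could look roughly like the sketch below. This is not the operator's actual code and not what was merged for this issue. The names `backoff_delays` and `retry_forever`, and the default base, factor and cap values, are all hypothetical and chosen only for illustration.

```python
import random
import time


def backoff_delays(base=1.0, factor=2.0, cap=300.0):
    """Yield an endless sequence of delays: base, base*factor, ... never above cap.

    Hypothetical helper; the defaults are illustrative assumptions.
    """
    delay = base
    while True:
        yield min(delay, cap)
        # Clamp as we go so the value never grows without bound.
        delay = min(delay * factor, cap)


def retry_forever(action, delays=None, sleep=time.sleep, jitter=0.0):
    """Call action() until it succeeds and return its result.

    After each failure, wait for the next delay (plus optional random
    jitter) before trying again. `sleep` is injectable so the loop can
    be exercised without really waiting.
    """
    if delays is None:
        delays = backoff_delays()
    for attempt, delay in enumerate(delays, start=1):
        try:
            return action()
        except Exception as exc:  # assumption: any failure is worth retrying
            print(f"delete attempt {attempt} failed: {exc}; retrying in {delay:.0f}s")
            sleep(delay + random.uniform(0, jitter))
```

With something like this, a delete that fails on a transient network error would simply be attempted again after 1s, 2s, 4s and so on, up to the cap, without an admin having to remove the failed job by hand. Whether every exception should be treated as retryable is a design choice the sketch leaves open.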