argoproj-labs / argo-rollouts-manager

Kubernetes Operator for Argo Rollouts controller.
https://argo-rollouts-manager.readthedocs.io/en/latest/
Apache License 2.0
100 stars 329 forks

Better handling of resource reconciliation errors #2

Closed jaideepr97 closed 10 months ago

jaideepr97 commented 1 year ago

At present, the resource reconciliation logic is fairly straightforward in terms of its error handling: if we encounter any error whatsoever, we return it immediately. However, this also breaks the reconciliation cycle and blocks reconciliation of any resources that come later in the order. Not all resources are equally important, and not all are worth blocking the reconciliation cycle for. We should consider which resources are critical for the functioning of the controller, and return errors only when reconciling those resources fails.

For instance, errors reconciling the metrics service should not block reconciliation of other resources.

Further logic could be developed to fine-tune how specific errors for specific resources should be handled (if they are not simply returned or ignored).

jgwest commented 10 months ago

Closing as this appears not to have any actionable items at this time. We should keep this in mind once the operator reconciliation logic becomes more complex!