fluxcd / flagger

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)
https://docs.flagger.app
Apache License 2.0

Deleting the canary resource with revertOnDeletion set to true still has some downtime #1562

Open aravindhbw opened 10 months ago

aravindhbw commented 10 months ago

Describe the bug

Re-opening an existing issue (https://github.com/fluxcd/flagger/issues/1550) where deleting the canary resource with revertOnDeletion set to true results in downtime. There has been some improvement, but we still observe some downtime because the wait time is not sufficient for the pods to reach the READY state.

To Reproduce

Same as before

Expected behavior

Flagger should wait for the old deployment pods to be up and running in READY state before the canary deployment is deleted.
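
To make the expected behavior concrete, here is a minimal sketch of a readiness wait, written in Python for illustration only. It is not Flagger's implementation. The `get_replicas` callback, the timeout, and the polling interval are all assumptions introduced for this example; the callback is expected to return the desired and ready replica counts of the reverted deployment.

```python
import time
from typing import Callable, Tuple


def wait_until_ready(
    get_replicas: Callable[[], Tuple[int, int]],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until ready replicas reach the desired count or the timeout expires.

    get_replicas is a hypothetical callback returning (desired, ready).
    Returns True when ready >= desired > 0, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        desired, ready = get_replicas()
        if desired > 0 and ready >= desired:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Only once such a check returns True would the traffic and the primary deployment be safe to tear down; a False result would argue for retrying instead of proceeding.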

Additional context

aravindhbw commented 9 months ago

FYI, we worked around this by manually scaling up the canary pods, setting the number of replicas equal to the current number of primary pods, before deleting the canary resource.
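
A rough sketch of that workaround as a script, for anyone who wants to automate it. It assumes the primary deployment is named `<target>-primary`, that the Canary resource has the same name as the target deployment, and that `kubectl` is on the PATH with access to the cluster. The function names and the 300s timeout are made up for this example.

```python
import subprocess
from typing import List


def primary_replicas_cmd(namespace: str, target: str) -> List[str]:
    # Assumption: the primary Deployment is named "<target>-primary".
    return ["kubectl", "-n", namespace, "get", "deployment", f"{target}-primary",
            "-o", "jsonpath={.spec.replicas}"]


def plan_prescale(namespace: str, target: str, replicas: int,
                  timeout: str = "300s") -> List[List[str]]:
    """Build the kubectl commands: scale up, wait for rollout, delete the Canary."""
    return [
        # Scale the target (canary) Deployment to match the primary.
        ["kubectl", "-n", namespace, "scale", f"deployment/{target}",
         f"--replicas={replicas}"],
        # Block until the scaled-up pods are rolled out and ready.
        ["kubectl", "-n", namespace, "rollout", "status", f"deployment/{target}",
         f"--timeout={timeout}"],
        # Assumption: the Canary resource is named like the target Deployment.
        ["kubectl", "-n", namespace, "delete", "canaries.flagger.app", target],
    ]


def run_prescale(namespace: str, target: str) -> None:
    """Read the primary's replica count, then run the planned commands in order."""
    out = subprocess.run(primary_replicas_cmd(namespace, target),
                         check=True, capture_output=True, text=True)
    replicas = int(out.stdout.strip())
    for cmd in plan_prescale(namespace, target, replicas):
        subprocess.run(cmd, check=True)
```

For example, `run_prescale("uat-product", "someapp-foo")` would scale `someapp-foo` to the replica count of `someapp-foo-primary`, wait for the rollout, and only then delete the Canary. Because every command runs with `check=True`, a rollout timeout stops the script before the delete.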

rye-sw commented 6 months ago

Hey, did you try upgrading to 1.35? This change should fix the issue, and it was released in v1.35.

filip-zyzniewski commented 5 months ago

We are experiencing this with flagger v1.37.

The log looks like this:

{"level":"info","ts":"2024-04-19T15:45:32.452Z","caller":"controller/events.go:33","msg":"Terminating canary someapp-foo.uat-product","canary":"someapp-foo.uat-product"}
{"level":"info","ts":"2024-04-19T15:45:32.477Z","caller":"controller/finalizer.go:73","msg":"someapp-foo.uat-product kind Deployment reverted"}
{"level":"info","ts":"2024-04-19T15:45:32.477Z","caller":"controller/finalizer.go:76","msg":"Checking if canary is ready someapp-foo.uat-product"}
{"level":"info","ts":"2024-04-19T15:45:32.480Z","caller":"controller/events.go:33","msg":"Terminating canary someapp-foo.uat-product","canary":"someapp-foo.uat-product"}
{"level":"error","ts":"2024-04-19T15:45:32.518Z","caller":"controller/controller.go:271","msg":"Unable to finalize canary: failed to revert target: scale failed: scaling someapp-foo.uat-product to 2 failed: Operation cannot be fulfilled on deployments.apps \"someapp-foo\": the object has been modified; please apply your changes to the latest version and try again","canary":"someapp-foo.uat-product","stacktrace":"github.com/fluxcd/flagger/pkg/controller.(*Controller).syncHandler\n\t/workspace/pkg/controller/controller.go:271\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).processNextWorkItem.func1\n\t/workspace/pkg/controller/controller.go:227\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).processNextWorkItem\n\t/workspace/pkg/controller/controller.go:234\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).Run.func1\n\t/workspace/pkg/controller/controller.go:190\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:226\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:227\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:204\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:161"}
{"level":"error","ts":"2024-04-19T15:45:32.518Z","caller":"controller/controller.go:237","msg":"error syncing 'uat-product/someapp-foo': unable to finalize to canary someapp-foo.uat-product error: failed to revert target: scale failed: scaling someapp-foo.uat-product to 2 failed: Operation cannot be fulfilled on deployments.apps \"someapp-foo\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/fluxcd/flagger/pkg/controller.(*Controller).processNextWorkItem\n\t/workspace/pkg/controller/controller.go:237\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).Run.func1\n\t/workspace/pkg/controller/controller.go:190\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:226\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:227\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:204\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:161"}
{"level":"info","ts":"2024-04-19T15:45:32.553Z","caller":"controller/finalizer.go:73","msg":"someapp-foo.uat-product kind Deployment reverted"}
{"level":"info","ts":"2024-04-19T15:45:32.553Z","caller":"controller/finalizer.go:76","msg":"Checking if canary is ready someapp-foo.uat-product"}
{"level":"info","ts":"2024-04-19T15:45:41.512Z","caller":"router/kubernetes_default.go:233","msg":"Service someapp-foo updated","canary":"someapp-foo.uat-product"}
{"level":"info","ts":"2024-04-19T15:45:41.512Z","caller":"controller/finalizer.go:106","msg":"someapp-foo.uat-product router reverted"}
{"level":"info","ts":"2024-04-19T15:45:41.512Z","caller":"controller/finalizer.go:130","msg":"someapp-foo.uat-product mesh provider gatewayapi:v1beta1 reverted"}
{"level":"info","ts":"2024-04-19T15:45:41.512Z","caller":"controller/finalizer.go:113","msg":"Finalization complete for someapp-foo.uat-product"}
{"level":"info","ts":"2024-04-19T15:45:41.564Z","caller":"controller/controller.go:172","msg":"Deleting someapp-foo.uat-product from cache"}
{"level":"info","ts":"2024-04-19T15:45:41.565Z","caller":"controller/controller.go:172","msg":"Deleting someapp-foo.uat-product from cache"}
{"level":"info","ts":"2024-04-19T15:45:41.566Z","caller":"controller/events.go:33","msg":"Terminated canary someapp-foo.uat-product","canary":"someapp-foo.uat-product"}
{"level":"info","ts":"2024-04-19T15:45:41.566Z","caller":"controller/controller.go:286","msg":"Canary someapp-foo.uat-product has been successfully processed and marked for deletion"}
{"level":"error","ts":"2024-04-19T15:45:41.566Z","caller":"controller/controller.go:252","msg":"uat-product/someapp-foo in work queue no longer exists","stacktrace":"github.com/fluxcd/flagger/pkg/controller.(*Controller).syncHandler\n\t/workspace/pkg/controller/controller.go:252\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).processNextWorkItem.func1\n\t/workspace/pkg/controller/controller.go:227\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).processNextWorkItem\n\t/workspace/pkg/controller/controller.go:234\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).Run.func1\n\t/workspace/pkg/controller/controller.go:190\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:226\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:227\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:204\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:161"}
{"level":"info","ts":"2024-04-19T15:45:41.598Z","caller":"controller/finalizer.go:106","msg":"someapp-foo.uat-product router reverted"}
{"level":"info","ts":"2024-04-19T15:45:41.598Z","caller":"controller/finalizer.go:130","msg":"someapp-foo.uat-product mesh provider gatewayapi:v1beta1 reverted"}
{"level":"info","ts":"2024-04-19T15:45:41.598Z","caller":"controller/finalizer.go:113","msg":"Finalization complete for someapp-foo.uat-product"}
{"level":"error","ts":"2024-04-19T15:45:41.619Z","caller":"controller/controller.go:279","msg":"Unable to remove finalizer for canary someapp-foo.uat-product error: failed after retries: canary someapp-foo.uat-product get query failed: canaries.flagger.app \"someapp-foo\" not found","canary":"someapp-foo.uat-product","stacktrace":"github.com/fluxcd/flagger/pkg/controller.(*Controller).syncHandler\n\t/workspace/pkg/controller/controller.go:279\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).processNextWorkItem.func1\n\t/workspace/pkg/controller/controller.go:227\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).processNextWorkItem\n\t/workspace/pkg/controller/controller.go:234\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).Run.func1\n\t/workspace/pkg/controller/controller.go:190\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:226\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:227\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:204\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:161"}
{"level":"error","ts":"2024-04-19T15:45:41.619Z","caller":"controller/controller.go:237","msg":"error syncing 'uat-product/someapp-foo': unable to remove finalizer for canary someapp-foo.uat-product: failed after retries: canary someapp-foo.uat-product get query failed: canaries.flagger.app \"someapp-foo\" not found","stacktrace":"github.com/fluxcd/flagger/pkg/controller.(*Controller).processNextWorkItem\n\t/workspace/pkg/controller/controller.go:237\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).Run.func1\n\t/workspace/pkg/controller/controller.go:190\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:226\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:227\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:204\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:161"}
{"level":"error","ts":"2024-04-19T15:45:41.619Z","caller":"controller/controller.go:252","msg":"uat-product/someapp-foo in work queue no longer exists","stacktrace":"github.com/fluxcd/flagger/pkg/controller.(*Controller).syncHandler\n\t/workspace/pkg/controller/controller.go:252\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).processNextWorkItem.func1\n\t/workspace/pkg/controller/controller.go:227\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).processNextWorkItem\n\t/workspace/pkg/controller/controller.go:234\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).Run.func1\n\t/workspace/pkg/controller/controller.go:190\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:226\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:227\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:204\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.27.11/pkg/util/wait/backoff.go:161"}

Before deleting this Canary, Flagger was able to perform both a successful and a failing (due to analysis) deployment. There is no other actor manipulating these resources (even the Flux kustomization is suspended).

Our Kubernetes version is v1.29.1-eks-b9c9ed7.
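
For context on the "the object has been modified" error in the log above: that is the message the Kubernetes API server returns when an update carries a stale resourceVersion (an optimistic-concurrency conflict). The usual remedy is to re-read the object and retry the write, which is what client-go's `retry.RetryOnConflict` helper does in Go. Below is a generic sketch of the pattern in Python. `ConflictError` and the `read`/`mutate`/`write` callbacks are hypothetical stand-ins, not Flagger or Kubernetes client code, and whether this conflict actually contributes to the downtime is not established here.

```python
import time
from typing import Any, Callable


class ConflictError(Exception):
    """Hypothetical stand-in for an HTTP 409 Conflict from the API server."""


def retry_on_conflict(
    read: Callable[[], Any],
    mutate: Callable[[Any], None],
    write: Callable[[Any], Any],
    attempts: int = 5,
    backoff_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Re-read, re-apply the change, and write again whenever the write conflicts."""
    for attempt in range(attempts):
        obj = read()       # always start from the latest version
        mutate(obj)        # apply the desired change (e.g. set replicas)
        try:
            return write(obj)
        except ConflictError:
            if attempt == attempts - 1:
                raise      # give up after the last attempt
            sleep(backoff_s * (2 ** attempt))  # exponential backoff
```

The key point is that each retry starts from a fresh read, so the write is never attempted against the stale copy that caused the conflict.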