fluxcd / flagger

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)
https://docs.flagger.app
Apache License 2.0
4.89k stars, 730 forks

Analysis isn't run on initial deploy resulting in insufficient information in canary status reason #676

Open cweidinger opened 4 years ago

cweidinger commented 4 years ago

Can you make the canary analysis run on the first deploy so that we can inspect the canary status to see whether the deployment is fit for traffic?

Flagger Helm Chart Version: 1.0.0

On first deploy the canary status says

      message: 'New Deployment detected, starting initialization.'
      reason: Initializing
      status: Unknown
      type: Promoted

and then

      message: Deployment initialization completed.
      reason: Initialized
      status: 'True'
      type: Promoted

This is true, but it gives the developer/operator no information about whether that version is fit to serve traffic.

Our canary analysis runs a pre-rollout system test suite for the app, followed by a rollout metric analysis that checks the standard request-success-rate metric. If it fails, we notify the developer and don't deploy the change to other environments. The problem is that this canary analysis isn't run on the first deployment.

If I manually restart the canary deployment with the same configuration/code (without changing any Kubernetes objects), then the canary analysis runs and may fail. In that case, the canary status would say

      message: 'New revision detected, progressing canary analysis.'
      reason: Progressing
      status: Unknown
      type: Promoted

and then

      message: 'Canary analysis failed, Deployment scaled to zero.'
      reason: Failed
      status: 'False'
      type: Promoted

which is great. That's what we look at to inform the developer of a botched deploy and the system to stop propagating the change to other regions and levels.
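To make the gap concrete, here is a minimal sketch of how automation might interpret the `Promoted` condition shown in the snippets above. The function name and the verdict labels are hypothetical, and only the four reasons quoted in this issue are mapped; any other reason is reported as unknown rather than guessed at.

```python
from typing import Mapping

# Reasons copied from the status snippets in this issue. The verdict labels
# on the right are hypothetical names chosen for this sketch.
VERDICT_BY_REASON = {
    "Initializing": "pending",
    "Initialized": "not-analysed",  # status is 'True', yet no analysis ran
    "Progressing": "pending",
    "Failed": "failed",
}


def promoted_verdict(condition: Mapping[str, str]) -> str:
    """Map a canary status condition to a verdict for downstream automation."""
    if condition.get("type") != "Promoted":
        return "unknown"
    return VERDICT_BY_REASON.get(condition.get("reason", ""), "unknown")


if __name__ == "__main__":
    first_deploy = {
        "message": "Deployment initialization completed.",
        "reason": "Initialized",
        "status": "True",
        "type": "Promoted",
    }
    print(promoted_verdict(first_deploy))
```

The point of the sketch is that `status: 'True'` with `reason: Initialized` cannot be treated as a passed analysis, so a first deploy never yields a pass or fail verdict.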

Ideally there would be a flag (like "runAnalysisOnInitialDeployment: true") we could set on the analysis to also run the analysis during the initial deployment so that we can inspect the canary analysis even on the first deploy.

There are several other really awful workarounds that we thought about (like having our gitops code talk to the Kubernetes API to restart the canary deployment once the initial deployment has finished), but I really think this functionality could benefit all Flagger users and should be part of how it works. The idea of adding a flag is just for backwards compatibility. I would be okay if there was no flag and Flagger just always ran the canary analysis on the initial deploy.

Thanks for your work with flagger.

cweidinger commented 4 years ago

@stefanprodan Can you add this flag or change the default behavior to run the analysis on the initial deployment? The only workaround we could think of is to write an operator that looks for canaries with the reason "Initialized" and restarts the deployment to kick off a canary analysis, but that feels heavy-handed and wouldn't play nicely with our other automation that looks at flagger_canary_status in Prometheus.
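For reference, the decision logic of the workaround described in this comment could be sketched roughly as below. This is only an illustration under stated assumptions: it assumes each Canary object exposes the condition shown earlier under `status.conditions` and names its workload in `spec.targetRef.name`, and it assumes the restart is done the way `kubectl rollout restart` does it, by stamping a `kubectl.kubernetes.io/restartedAt` annotation on the pod template. The function names are hypothetical, and the actual API calls to list canaries and patch deployments are left out.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List


def targets_needing_restart(canaries: Iterable[Dict[str, Any]]) -> List[str]:
    """Return target deployment names for canaries that only reached 'Initialized'."""
    targets = []
    for canary in canaries:
        # Assumption: the Promoted condition lives in status.conditions.
        conditions = canary.get("status", {}).get("conditions", [])
        for cond in conditions:
            if cond.get("type") == "Promoted" and cond.get("reason") == "Initialized":
                # Assumption: the workload is named in spec.targetRef.name.
                targets.append(canary["spec"]["targetRef"]["name"])
                break
    return targets


def restart_patch(now: datetime) -> Dict[str, Any]:
    """Build a patch body that changes the pod template via a restart annotation."""
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        "kubectl.kubernetes.io/restartedAt": now.isoformat()
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    canaries = [
        {
            "spec": {"targetRef": {"name": "podinfo"}},  # hypothetical app name
            "status": {"conditions": [{"type": "Promoted", "reason": "Initialized"}]},
        }
    ]
    for name in targets_needing_restart(canaries):
        print(name, restart_patch(datetime.now(timezone.utc)))
```

A real operator would also need to remember which canaries it has already restarted, which is part of why this approach feels heavy-handed compared with a built-in option.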

danmassie commented 4 years ago

Being able to run the pre-rollout analysis on the initial deployment would be very useful. I'd like to be able to run a set of smoke tests on the canary using the helm test capability in the pre-rollout phase. Those should be run on the initial deploy and every subsequent one.

cweidinger commented 3 years ago

@stefanprodan Is this something you can do? If not, would you accept a Pull Request to this effect if we can find the time to add this feature?

joey-jonko-paypal commented 3 months ago

This is still an open issue and concern. Why does flagger skip the analysis on first install/initialization?