fluxcd / flagger

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)
https://docs.flagger.app
Apache License 2.0
4.8k stars 720 forks source link

[Question] restart analysis behaviour during pre-rollout #550

Open radiansatria opened 4 years ago

radiansatria commented 4 years ago

Hello friends, i want to ask something regarding pre-rollout webhook. So i tried to restart the analysis process (by deploying a new version) during B/G deployment when it's on pre-rollout stage, since the pre-rollout webhook is blocking (as per this ) the analyisis will only be restarted after the pre-rollout stage is finished. Problem is i am using pre-rollout webhook to run some test that targets the canary version, if a new version is deployed when pre-rollout is still running this will caused problem since the test is supposed to target the previous canary version not the new deployed canary version. Is this behaviour intended? if not, is there a workaround for this scenario?

Flagger version: 0.20.4 Flagger apiversion: flagger.app/v1alpha3 Service Mesh: Istio (1.4.0) K8s Version: v1.13.12 Logs: no error. Just weird behaviour

radiansatria commented 4 years ago

Hi @stefanprodan , have you got the time to check this? i also tried using the rollout webhook but it's also blocking and flagger is waiting until my test is finished to restart the B/G deployment. Here is the analysis spec, just in case you want to repro this:

spec:
  canaryAnalysis:
    interval: 1m
    iterations: 1
    threshold: 1
    webhooks:
    - name: "Functional Test"
      type: pre-rollout
      url: http://flagger-loadtester.test/
      timeout: 15m
      metadata:
        type: bash
        cmd: "my-test.sh "

you can actually replace the my-test.sh command with sleep 5m or anything that simulates a test. With that spec you will notice that in the middle of pre-rollout stage if someone or something deploy a new version, it wont restart the analysis right away. Instead it will wait until the pre-rollout stage is finished. Don't you think this will cause erroneous analysis? is there a way to force restart the analysis as in canary deployment scenario?

stefanprodan commented 4 years ago

If you change the type to cmd it will run async but then the tester service will not report the exit code of your script. To make this work, a bash async runner could be implemented similar to:

radiansatria commented 4 years ago

@stefanprodan I see. So i need to create new taskType in loadtester. Thanks for this. But looking at this code here concord is a blocking task. So i am still not sure, how would Flagger handle new deployment in the middle of concord test? and by looking at concord task metadata, it's targeting the canary version thus new deployment will render the test invalid right?