argoproj / argo-rollouts

Progressive Delivery for Kubernetes
https://argo-rollouts.readthedocs.io/
Apache License 2.0

Add setCanaryScale feature in Simple Canary Deployment #2865

Open rajeshetty87 opened 1 year ago

rajeshetty87 commented 1 year ago

Summary

Provide support for the setCanaryScale feature in Simple Canary Deployments. Currently it is only available in Canary with Traffic Routing. This is required for JVM-based services that need an initial cache creation step to be performed by a single pod in the ReplicaSet.

Use Cases

Currently the Blue Green deployment strategy allows us to create a single pod in the preview ReplicaSet using the parameter previewReplicaCount. This lets the new revision behind previewService start with that one pod, without receiving any traffic, before it scales to spec.replicas. Our JVM-based services need this feature to bring up one instance of the new revision that can perform tasks like initial cache creation. The remaining pods in the same ReplicaSet use this cache, which helps with a quick startup and lowers the load on the database.
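
For reference, a minimal sketch of the Blue Green setup described above. The service names are placeholders, not our real ones, and `autoPromotionEnabled: false` is only an assumption about how promotion is gated:

```yaml
strategy:
  blueGreen:
    activeService: my-app-active      # placeholder name; serves production traffic
    previewService: my-app-preview    # placeholder name; receives no production traffic
    previewReplicaCount: 1            # single preview pod performs the initial cache creation
    autoPromotionEnabled: false       # assumption: promote manually once the cache is ready
```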

When moving from Blue Green to Basic Canary, we tried to replicate this behavior by using setWeight with an initial weight of 1%, for example:

  strategy:
    canary:
      maxSurge: 25%
      steps: 
        - setWeight: 1
        - pause:
            duration: 3m
        - setWeight: 10
        - pause:
            duration: 1m
        - setWeight: 25
        - pause:
            duration: 1m
        - setWeight: 50
        - pause:
            duration: 1m
        - setWeight: 75
        - pause:
            duration: 1m

This works really well when you have 100 pods or fewer, which is not the case for our Production workloads. Above 100 replicas, a 1% weight resolves to more than one canary pod, so we end up with more than one pod trying to perform the expensive cache creation, which overloads the database.
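
To make the request concrete, here is a sketch of what we would like to write. It reuses the setCanaryScale step as it exists today for Canary with Traffic Routing. Using it in a Basic Canary is hypothetical and is exactly what this issue asks for, so the semantics shown in the comments are assumptions:

```yaml
strategy:
  canary:
    maxSurge: 25%
    steps:
      # Hypothetical in a Basic Canary: today this step requires traffic routing.
      - setCanaryScale:
          replicas: 1               # exactly one canary pod, regardless of spec.replicas
      - pause:
          duration: 3m              # window for the single pod to build the cache
      # With traffic routing, this returns canary scaling to following the weight;
      # assumed to behave the same way here.
      - setCanaryScale:
          matchTrafficWeight: true
      - setWeight: 10
      - pause:
          duration: 1m
```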

Questions.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 60 days with no activity.

kostis-codefresh commented 1 month ago

How do you solve this problem right now (before adopting Argo Rollouts)?