argoproj / applicationset

The ApplicationSet controller manages multiple Argo CD Applications as a single ApplicationSet unit, supporting deployments to large numbers of clusters, deployments of large monorepos, and enabling secure Application self-service.
https://argocd-applicationset.readthedocs.io/
Apache License 2.0

Proposal: ApplicationSet Progressive Rollout #61

Open maruina opened 3 years ago

maruina commented 3 years ago

Hi all, I'm opening this issue because I'd like to discuss with the community a possible solution to what I think is a common issue when doing GitOps.

Consider a scenario where you have multiple production clusters across multiple regions. When you use an ApplicationSet, all the Applications are updated at the same time, and with the automated SyncPolicy you are effectively doing a global rollout.

What I would like is to introduce the concept of a progressive rollout, where you can decide how to roll out your application across those production clusters.

I can see two possible implementations of this idea:

  1. Extend the ApplicationSet specification to support the progressive rollout. The ApplicationSet controller would then be responsible for updating the desired Applications in the desired order. For example, you might want to update one region first, or update 10% of all your clusters.
  2. Create a new controller with a new CRD dealing with the progressive rollout. The controller watches for ApplicationSets and does something like "argocd app sync <MYAPP>" and "argocd app wait <MYAPP>".

I think the second approach is much cleaner and doesn't overload the ApplicationSet controller, but I'd like to hear the community's thoughts on this.
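To make the second approach concrete, here is a minimal Python sketch of the sync-then-wait loop such a controller might run. It is only an illustration, not the proposed implementation: the function names `run_cli` and `progressive_sync` are hypothetical, and the only CLI calls used are the two mentioned above ("argocd app sync" and "argocd app wait"). The command runner is injectable so the ordering logic can be exercised without a real Argo CD installation.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence

# A runner takes a command (list of arguments) and raises on failure.
Runner = Callable[[Sequence[str]], None]


def run_cli(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(list(cmd), check=True)


def progressive_sync(apps: Iterable[str], run: Runner = run_cli) -> List[str]:
    """Sync Applications one at a time, in the given order.

    Each Application is synced and then waited on before the next one is
    touched. Any exception raised by the runner propagates immediately, so
    later Applications are left untouched when an earlier one fails.
    Returns the names of the Applications that completed.
    """
    done: List[str] = []
    for app in apps:
        run(["argocd", "app", "sync", app])
        run(["argocd", "app", "wait", app])
        done.append(app)
    return done
```

A real controller would watch ApplicationSet objects and talk to the Argo CD API rather than shelling out, but the ordering guarantee (finish one Application before starting the next) is the same idea.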

RichiCoder1 commented 3 years ago

I'm pretty interested in this, and there was actually some discussion about it in Slack: https://argoproj.slack.com/archives/C014ZPM32LU/p1600274721058300

It sounded like the conclusion was a new CRD, but one that would possibly be a core part of Argo CD: https://github.com/argoproj/argo-cd/issues/1283

maruina commented 3 years ago

Thanks, I had a look at the Slack conversation and at the ApplicationSync idea. I think it's a good one, but I'm not sure how it would interact with the ApplicationSet.

I still think that a new controller makes sense, because it will allow you to iterate and move fast in the same way we're doing for the AppSet controller.

We also have quite different requirements in controlling the application rollout. For example:

This is why I think a separate controller might make more sense. The CRD could be something like:

kind: ProgressiveRollout
metadata:
  name: myrollout
  namespace: argocd
spec:
  # A Kubernetes object representing the ApplicationSet
  # A change in this object will trigger a new progressive rollout
  applicationSetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    name: myappset
  rollout:
    # the type of the rollout strategy
    strategy: canary
    canary:
      # The ordered list of regions to use as canary
      regions:
        - eu-central-1
        - ap-southeast-1
      # (optional) The number of regions that can be updated in parallel. Default to 1
      parallelRegions: 1
      # (optional) The number of zones that can be updated in parallel. Default to 1
      parallelZones: 1
      # (optional) The number of clusters that can be updated in parallel. Default to 1
      parallelClusters: 1
      # (optional) The maximum number of cluster used as canary, per region. Default to 1.
      maxClusters: 1
      # (optional) The time to wait after a region is completed. Default to 0.
      bakeTimeRegion: 1h
      # (optional) The time to wait after a zone is completed. Default to 0.
      bakeTimeZone: 30m
      # (optional) The time to wait after a cluster is completed. Default to 0.
      bakeTimeCluster: 10m
    primary:
      regions:
        - ap-northeast-1
        - eu-west-1
        - eu-central-1
        - ap-southeast-1
      parallelRegions: 2
      parallelZones: 3
      parallelClusters: 1
      bakeTimeRegion: 2h
      bakeTimeZone: 1m
      bakeTimeCluster: 10m
  # (optional)
  retries:
    # (optional) specifies the number of retries per cluster before marking the ProgressiveRollout failed. Default to 1
    # A value of -1 is infinite retries
    attempts: 3
    # (optional) retry interval. Default to 10m
    interval: 30m
  # (optional)
  metrics:
    - name: cluster-drained
      type: pre-deployment-cluster
      thresholdRange:
        min: 1
      interval: 5m
    # minimum req success rate (non 5xx responses)
    - name: region-request-success-rate
      type: post-deployment-region
      # percentage (0-100)
      thresholdRange:
        min: 99
    - name: myapp-custom-check
      type: post-bake-time-region
      templateRef:
        # Name of the MetricTemplate
        name: myapp-custom-check
        # (optional) The namespace where the metric check lives. Default to the operator namespace.
        namespace: mynamespace
      # accepted values
      thresholdRange:
        min: 10
        max: 1000
      # metric query time window
      interval: 5m
  # (optional)
  webhooks:
    - name: "regional load test"
      type: post-bake-time-region
      url: http://load-test-service.example.com
      timeout: 15s
      retries: 3
      metadata:
        cmd: "hey -z 1m -q 5 -c 2 http://myapp.example.com"
  # (optional)
  alerts:
    - name: "on-call Slack"
      severity: error
      providerRef:
        name: on-call-slack
        namespace: rollout-system
    - name: "info Slack"
      severity: info
      providerRef:
        name: info-slack

Note that this heavily mimics Flagger, but tries to extend it.
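As a rough illustration of how the canary fields in the spec above could translate into an update order, the Python sketch below turns a cluster inventory into ordered batches. Everything here is an assumption for the sake of the example: the `Cluster` type and the `canary_batches` function are hypothetical, and only `regions`, `maxClusters` and `parallelClusters` are modelled (zones, `parallelRegions` and bake times are ignored).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Cluster:
    """Hypothetical inventory entry: a cluster name and the region it lives in."""
    name: str
    region: str


def canary_batches(
    clusters: Sequence[Cluster],
    regions: Sequence[str],
    max_clusters: int = 1,
    parallel_clusters: int = 1,
) -> List[List[str]]:
    """Return ordered batches of cluster names for the canary stage.

    Regions are visited in the order listed. In each region at most
    `max_clusters` clusters are picked, and they are grouped into batches of
    up to `parallel_clusters` clusters that may be updated together.
    """
    if max_clusters < 1 or parallel_clusters < 1:
        raise ValueError("max_clusters and parallel_clusters must be >= 1")
    batches: List[List[str]] = []
    for region in regions:
        picked = [c.name for c in clusters if c.region == region][:max_clusters]
        for i in range(0, len(picked), parallel_clusters):
            batches.append(picked[i:i + parallel_clusters])
    return batches


inventory = [
    Cluster("eu-1", "eu-central-1"),
    Cluster("eu-2", "eu-central-1"),
    Cluster("ap-1", "ap-southeast-1"),
    Cluster("us-1", "us-east-1"),
]
# With the defaults from the spec (maxClusters: 1, parallelClusters: 1),
# one cluster per canary region is updated, one batch at a time.
print(canary_batches(inventory, ["eu-central-1", "ap-southeast-1"]))
# prints [['eu-1'], ['ap-1']]
```

A controller would process each batch, wait for the bake time, and evaluate the configured metrics before moving on to the next batch.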

OmerKahani commented 3 years ago

@maruina currently ApplicationSet only handles creating the Application resources. Is the solution you are looking for meant for the installation and first sync stage, or for the day-to-day sync?

maruina commented 3 years ago

Day-to-day sync. Every time the AppSet creates/updates the Application(s), something should take care of their synchronization.

maruina commented 3 years ago

Hi all, I created a PoC for a progressive rollout controller using ApplicationSet. You can find it here: https://github.com/maruina/argocd-progressive-rollout-controller

Any feedback is very much appreciated :)

ghostsquad commented 2 years ago

Has any progress been made on this front? I believe this to be a major blocker for adopting ApplicationSet.