Experiment CRD

jessesuen commented 5 years ago

To drive canary analysis, we are proposing to introduce a mechanism to launch an ephemeral run of one or more ReplicaSets, typically finishing after some specified time duration. To power this, we would introduce an "Experiment" CRD, which might look something like:

apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: guestbook-experiment
spec:
  durationSeconds: 600
  templates:
  - name: canary
    replicas: 1
    spec:
      containers:
      - name: guestbook
        image: guestbook:v2
  - name: baseline
    replicas: 1
    spec:
      containers:
      - name: guestbook
        image: guestbook:v1

This CRD would launch two ReplicaSets, each with its respective pod template, for the specified time duration.

Once the ReplicaSets have run for that duration, an analysis would follow and return a score.

spec:
  durationSeconds: 600
  templates:
  - name: canary
...
  analysis:
    # syntax TBD????
    intervalSeconds: 60
    realtime: false
    prometheus:
      server: prometheus.default:9000
      query: grpc_server_handled_total{job="argocd-server-metrics",grpc_service="application.ApplicationService",grpc_code="Error"} > 0

Note that we would also need the ability to analyze the Experiment in real time, e.g. if the experiment is going horribly wrong and the score drops below some threshold, the Experiment should be stopped prematurely and the rollout should fail.

spec:
  durationSeconds: 600
  templates:
  - name: canary
...
  analysis:
    failureThreshold: 50
    failFast: true
status:
  score: 80
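
To make the fail-fast behavior concrete, here is a minimal Python sketch of how a controller might turn a score into a verdict. It is only an illustration of the idea above, not a proposed implementation: the names `AnalysisSpec`, `Verdict`, and `evaluate` are hypothetical, and it assumes a score exactly equal to `failureThreshold` counts as passing, which the proposal does not specify.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


@dataclass
class AnalysisSpec:
    # Hypothetical mirror of spec.analysis in the YAML above.
    failure_threshold: int  # spec.analysis.failureThreshold
    fail_fast: bool         # spec.analysis.failFast


def evaluate(spec, score, elapsed_seconds, duration_seconds):
    """Decide the experiment's state from its current score.

    Assumption: a score strictly below the threshold is a failure.
    """
    below_threshold = score < spec.failure_threshold

    # The experiment has run its full duration: the score is final.
    if elapsed_seconds >= duration_seconds:
        return Verdict.FAILED if below_threshold else Verdict.SUCCEEDED

    # Still running: only abort early when failFast is enabled.
    if spec.fail_fast and below_threshold:
        return Verdict.FAILED

    return Verdict.RUNNING
```

With `failureThreshold: 50` and `failFast: true`, a score of 80 two minutes in keeps the experiment running, while a score of 40 at the same point stops it immediately. With `failFast: false`, the same low score would only fail the experiment once `durationSeconds` has elapsed.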

This integrates with a Rollout by introducing a new step type into the canary strategy. The step initiates a run of the experiment, and the rollout proceeds to promotion only if the experiment was successful.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: guestbook
spec:
...
  strategy:
    canary:
      steps:
      - experiment: # syntax TBD
          templates:
          - name: baseline
            specFrom: stable
          - name: canary
            specFrom: canary
      - setWeight: 50
      - pause: {}
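
As a rough sketch of how such a step could gate the rest of the canary, the Python below walks the steps in order and stops as soon as an experiment fails. Everything here is hypothetical: `run_canary_steps` and the `run_experiment` callback are illustrative names, the phase strings are made up, and treating the end of the step list as a promotion to 100% weight is an assumption rather than part of the proposal.

```python
# The steps from the Rollout above, expressed as plain dictionaries.
STEPS = [
    {"experiment": {"templates": [
        {"name": "baseline", "specFrom": "stable"},
        {"name": "canary", "specFrom": "canary"},
    ]}},
    {"setWeight": 50},
    {"pause": {}},
]


def run_canary_steps(steps, run_experiment):
    """Process canary steps in order.

    run_experiment is a caller-supplied function that takes the
    experiment step's body and returns True on success.
    """
    weight = 0
    for index, step in enumerate(steps):
        if "experiment" in step:
            # A failed experiment fails the rollout before any later step runs.
            if not run_experiment(step["experiment"]):
                return {"phase": "Failed", "weight": weight, "step": index}
        elif "setWeight" in step:
            weight = step["setWeight"]
        elif "pause" in step:
            # An empty pause waits for a manual resume, so stop here.
            return {"phase": "Paused", "weight": weight, "step": index}
    # Assumption: finishing every step means full promotion.
    return {"phase": "Promoted", "weight": 100, "step": len(steps)}
```

With a passing experiment, the walk reaches `setWeight: 50` and then stops at the pause. With a failing one, it stops at step 0 and the canary weight is never raised.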

edlee2121 commented 5 years ago

We should give some thought to how this would integrate with various frontend services such as LaunchDarkly and backend services such as Kayenta. The most general form would be to create a class of CRDs to interface with these services. Less general, but perhaps simpler to use, would be to add (ideally generic) options for integrating with these services.

A well-thought-out integration strategy would allow us to better engage the community and allow them to participate more easily.

mousavian commented 5 years ago

Great stuff, I'm interested to have this. Do you think it'll be in 0.5.0? :D

dthomson25 commented 5 years ago

@mousavian, the Experiments CRD will be in 0.5.0, but the analysis part of the experiments will be in a separate release. We haven't figured out more details on how the analysis will work yet, and it requires deeper exploration.

jessesuen commented 5 years ago

Analysis has been split out into a new CRD and is tracked here https://github.com/argoproj/argo-rollouts/issues/130.

A new CRD is needed because analysis can be invoked directly from a Rollout to drive progressive delivery as a separate use-case from experimentation.

argoproj / argo-rollouts

Experiment CRD #110