argoproj-labs / rollouts-plugin-trafficrouter-openshift

Apache License 2.0
1 stars 4 forks source link

Route created with non-zero canary weight does not have canary weight set back to 0 on creation of Rollout #11

Closed jgwest closed 2 months ago

jgwest commented 5 months ago

When a new Route is created with non-zero canary, and then a new Rollout is created targeting that Route, I would expect that the Route would set the canary weight to 0 on the Route (under .spec.alternateBackends).

Instead, it appears that when the Rollout is initially created, it does not touch the Route's canary weight: after the Rollout is created, the Route still has non-zero canary weight.

These instructions are roughly based on this.

Steps to reproduce:

1) Login in to an OpenShift cluster, that has the latest Argo Rollouts operator running on that cluster in cluster-scoped mode.

kubectl create ns argo-rollouts

2) Create the following resources:

apiVersion: argoproj.io/v1beta1
kind: RolloutManager
metadata:
  name: argo-rollout
  namespace:  argo-rollouts
  labels:
    example: basic
spec: {}    
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: rollouts-demo-route
spec:
  port:
    targetPort: http 
  tls: 
    insecureEdgeTerminationPolicy: Redirect
    termination: edge
  to:
    kind: Service
    name: argo-rollouts-stable-service 
    weight: 50

  alternateBackends:
    - kind: Service
      name: argo-rollouts-canary-service 
      weight: 50

NOTE: In the Route above, notice that the weight is 50/50 between canary/stable.

apiVersion: v1
kind: Service
metadata:
  name: argo-rollouts-canary-service
spec:
  ports: 
  - port: 80
    targetPort: http
    protocol: TCP
    name: http

  selector: 
    app: rollouts-demo
apiVersion: v1
kind: Service
metadata:
  name: argo-rollouts-stable-service
spec:
  ports: 
  - port: 80
    targetPort: http
    protocol: TCP
    name: http

  selector: 
    app: rollouts-demo
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  template: 
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        resources:
          requests:
            memory: 32Mi
            cpu: 5m

  revisionHistoryLimit: 2
  replicas: 5
  strategy:
    canary:
      canaryService: argo-rollouts-canary-service 
      stableService: argo-rollouts-stable-service 
      trafficRouting:
        plugins:
          argoproj-labs/openshift:
            routes:
              - rollouts-demo-route  
      steps: 
      - setWeight: 30
      - pause: {}
      - setWeight: 60
      - pause: {}
  selector: 
    matchLabels:
      app: rollouts-demo

After you have created all of the above resources, I would expect that when the Rollout is reconciled by Argo Rollouts, Rollouts would call our plugin, and our plugin would set the Route canary weight to 0.

Instead, if you look at the Route, you will see it still has a canary weight of 50 under alternateBackends.

But this is an assumption on my part that needs to be verified.

Issue criteria:

chetan-rns commented 2 months ago

Hi @jgwest, I was not able to reproduce the issue. The canary weight in the Route was overwritten to zero once the Rollout was created. I followed the above example on OCP 4.16 with GitOps Operator v1.13.1. Looking at the logs, I could confirm that the Plugin API was called for the first time with the zero desired weight.

Logs from the plugin process:

2024-08-20T09:59:48.118Z [DEBUG] plugin.openshift: time=2024-08-20T09:59:48.118Z level=INFO msg="successfully updated route" plugin=trafficrouter vendor=openshift name=rollouts-demo-route weight=0

Logs from the rollouts controller:

time="2024-08-20T09:59:48Z" level=info msg="Previous weights: nil" namespace=argo-rollouts rollout=rollouts-demo
time="2024-08-20T09:59:48Z" level=info msg="New weights: &TrafficWeights{Canary:WeightDestination{Weight:0,ServiceName:argo-rollouts-canary-service,PodTemplateHash:687d76d795,},Stable:WeightDestination{Weight:100,ServiceName:argo-rollouts-stable-service,PodTemplateHash:,},Additional:[]WeightDestination{},Verified:nil,}" namespace=argo-rollouts rollout=rollouts-demo
time="2024-08-20T09:59:48Z" level=info msg="Traffic weight updated to 0" event_reason=TrafficWeightUpdated namespace=argo-rollouts rollout=rollouts-demo
time="2024-08-20T09:59:48Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"argo-rollouts\", Name:\"rollouts-demo\", UID:\"4beb21be-b62e-4e84-89e3-3c0dfd018b9d\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"104462\", FieldPath:\"\"}): type: 'Normal' reason: 'TrafficWeightUpdated' Traffic weight updated to 0"

The above logs were captured right after creating the Rollout resource and we can observe that the weight was updated to 0. Here's the code from the rollouts controller that calls the SetWeight() API.

I also tried the K8s Gateway plugin with Argo Rollouts and observed a similar behavior. I couldn't install the Gateway on OCP. But I tried the quick start guide on a kind cluster. I followed these steps:

  1. Create the argo-rollouts namespace and install the CRDs.
  2. Create a configmap to install the Gateway plugin https://rollouts-plugin-trafficrouter-gatewayapi.readthedocs.io/en/latest/installation/#installing-the-plugin
  3. Run the rollouts controller and verify that the plugin was downloaded successfully.
    go run ./cmd/rollouts-controller/main.go
  4. Follow the quick start guide till Step 3 https://rollouts-plugin-trafficrouter-gatewayapi.readthedocs.io/en/latest/quick-start/
  5. In Step 4, modify the HTTPRoute resource to assign weights to canary and stable backendRefs.
    backendRefs:
    - name: argo-rollouts-stable-service
      kind: Service
      port: 80
      weight: 50
    - name: argo-rollouts-canary-service
      kind: Service
      port: 80
      weight: 50
  6. Apply the HTTPRoute and verify that the weights are not overwritten.
  7. Create the Rollout and verify that the stable weight is initialized to 100 and the canary weight to 0.
chetan-rns commented 2 months ago

Included a simple e2e test to verify this behaviour: https://github.com/argoproj-labs/rollouts-plugin-trafficrouter-openshift/pull/13

jgwest commented 2 months ago

Thanks @chetan-rns for the detailed investigation, for the walkthrough of the behaviour with Gateway API, and for the E2E test which will verify we don't regress this behaviour!