flagger drops virtualhost object from contour's HTTPProxy definition with A/B canary rollout

rootik commented 1 year ago

Describe the bug

We are trying to use flagger for progressive delivery, specifically in an A/B testing scenario. Our ingress provider is Contour and our mesh provider is linkerd. Weighted canary deployments with linkerd worked, but A/B testing with Contour failed: when the canary iteration starts, flagger modifies the HTTPProxy resource and drops the virtualhost object from it, which causes the HTTPProxy to become orphaned. Note that we are not using nested HTTPProxy definitions, so each deployment defines its own root HTTPProxy.

Another thing: flagger doesn't roll back the HTTPProxy CRD to its original state after an unsuccessful rollout, so the application stops serving requests.

To Reproduce

Canary definition

---
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: devops-test-app
  namespace: devops
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: devops-test-app
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: devops-test-app
  progressDeadlineSeconds: 60
  service:
    port: 5000
    targetPort: 5000
  provider: contour
  analysis:
    interval: 1m
    threshold: 5
    iterations: 10
    match:
    - headers:
        contoso-test:
          exact: "integration"
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 2m
    - name: request-duration
      thresholdRange:
        max: 500
      interval: 2m

Original HTTPProxy CRD

---
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: devops-test-app
  namespace: devops
  labels:
    app.kubernetes.io/name: devops-test-app
    app.kubernetes.io/version: "0.1.0"
spec:
  virtualhost:
    fqdn: devops-test-app.contoso.com
    tls:
      secretName: star-cert
  routes:
    - services:
      - name: devops-test-app
        port: 5000
      requestHeadersPolicy:
        set:
        - name: l5d-dst-override
          value: "%CONTOUR_SERVICE_NAME%.%CONTOUR_NAMESPACE%.svc.cluster.local:%CONTOUR_SERVICE_PORT%"
      responseHeadersPolicy:
        remove:
        - l5d-client-id
      timeoutPolicy:
        response: 30s

HTTPProxy CRD after flagger starts canary advancement

---
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: devops-test-app
  namespace: devops
spec:
  routes:
  - conditions:
    - header:
        exact: integration
        name: contoso-test
      prefix: /
    services:
    - name: devops-test-app-primary
      port: 5000
      requestHeadersPolicy:
        set:
        - name: l5d-dst-override
          value: devops-test-app-primary.devops.svc.cluster.local:5000
    - name: devops-test-app-canary
      port: 5000
      requestHeadersPolicy:
        set:
        - name: l5d-dst-override
          value: devops-test-app-canary.devops.svc.cluster.local:5000
      weight: 100
  - conditions:
    - prefix: /
    services:
    - name: devops-test-app-primary
      port: 5000
      requestHeadersPolicy:
        set:
        - name: l5d-dst-override
          value: devops-test-app-primary.devops.svc.cluster.local:5000
      weight: 100
    - name: devops-test-app-canary
      port: 5000
      requestHeadersPolicy:
        set:
        - name: l5d-dst-override
          value: devops-test-app-canary.devops.svc.cluster.local:5000
status:
  conditions:
  - errors:
    - message: this HTTPProxy is not part of a delegation chain from a root HTTPProxy
      reason: Orphaned
      status: "True"
      type: Orphaned
    lastTransitionTime: ""
    message: At least one error present, see Errors for details
    observedGeneration: 3
    reason: ErrorPresent
    status: "False"
    type: Valid
  currentStatus: orphaned
  description: this HTTPProxy is not part of a delegation chain from a root HTTPProxy
  loadBalancer:
    ingress:
    - ip: 1.1.1.1

Expected behavior

flagger keeps the virtualhost object in the HTTPProxy during canary advancement in an A/B testing scenario.
flagger rolls back the HTTPProxy CRD to its original state if the rollout is unsuccessful.
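
To illustrate the first expectation, the HTTPProxy during analysis would keep the original virtualhost block alongside the routes flagger writes. The sketch below is hypothetical: it combines the virtualhost from the original object with the default route from the modified object above, and it is not output produced by flagger.

```yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: devops-test-app
  namespace: devops
spec:
  # Hypothetical: virtualhost preserved from the original object.
  virtualhost:
    fqdn: devops-test-app.contoso.com
    tls:
      secretName: star-cert
  routes:
  # Default route as written by flagger; the A/B header route and
  # the per-service header policies are omitted here for brevity.
  - conditions:
    - prefix: /
    services:
    - name: devops-test-app-primary
      port: 5000
      weight: 100
    - name: devops-test-app-canary
      port: 5000
```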

Additional context

Flagger version: 1.31.0
Kubernetes version: 1.26.3
Service Mesh provider: linkerd stable-2.13.5
Ingress provider: contour 1.24.2

aryan9600 commented 1 year ago

The Contour integration is built so that it expects to create the HTTPProxy object itself from the values in the Canary definition, so it ends up overwriting any existing HTTPProxy objects. Users are expected to create another HTTPProxy with the desired .spec.virtualhost, which includes the one generated by Flagger (podinfo and test in the example are the names used in the tutorial; replace them with your own):

apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: devops-test-app-ingress
  namespace: devops
spec:
  virtualhost:
    fqdn: devops-test-app.contoso.com
    tls:
      secretName: star-cert
  includes:
    - name: podinfo
      namespace: test
      conditions:
        - prefix: /

I recommend you go through the tutorial once: https://fluxcd.io/flagger/tutorials/contour-progressive-delivery/
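
Adapted to the resources in this issue, that layout might look like the sketch below. It assumes Flagger names its generated HTTPProxy after the Canary (devops-test-app in the devops namespace, matching the modified object shown in the report), and it reuses the devops-test-app-ingress name from the example above for the user-managed root proxy so that the two objects do not collide.

```yaml
# Root HTTPProxy owned by the user; Flagger does not manage this object.
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: devops-test-app-ingress
  namespace: devops
spec:
  virtualhost:
    fqdn: devops-test-app.contoso.com
    tls:
      secretName: star-cert
  includes:
    # Assumed: the HTTPProxy generated by Flagger, named after the Canary.
    - name: devops-test-app
      namespace: devops
      conditions:
        - prefix: /
```

With this split, the deployment would stop shipping its own HTTPProxy named devops-test-app. The route-level settings of the original object (response header removal and the timeout policy) are not covered by this sketch.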

rootik commented 1 year ago

Thanks for your reply. Why would flagger then touch existing HTTPProxy objects bound to a certain deployment?
