Describe the bug
When configuring a Canary object to use session affinity with a Kubernetes Gateway API gateway, as described in Session Affinity, I ran a k6 test to verify that users stayed assigned to a version and weren't shifted after a successful deploy.
I noticed that within 1 second, all the users were assigned to the next version.
I believe this is happening because the HTTPRoute that gets created doesn't pin users to the primary version.
HTTPRoute
```yaml
spec:
  hostnames:
  - charmander.example.com
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: default-gateway
    namespace: istio-ingress
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: charmander-primary
      port: 9898
      weight: 0
    - group: ""
      kind: Service
      name: charmander-canary
      port: 9898
      weight: 100
    matches:
    - headers:
      - name: Cookie
        type: RegularExpression
        value: .*flagger-cookie.*nROEvCteRd.*
      path:
        type: PathPrefix
        value: /
  - backendRefs:
    - group: ""
      kind: Service
      name: charmander-primary
      port: 9898
      weight: 95
    - filters:
      - responseHeaderModifier:
          add:
          - name: Set-Cookie
            value: flagger-cookie=nROEvCteRd; Max-Age=3600
        type: ResponseHeaderModifier
      group: ""
      kind: Service
      name: charmander-canary
      port: 9898
      weight: 5
    matches:
    - path:
        type: PathPrefix
        value: /
```
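To illustrate the suspected failure mode, here is a small Python simulation (my own sketch, not Flagger or gateway code) of the two rules above: cookie holders always hit the canary, everyone else is split 95/5, and only canary responses set the affinity cookie. Because primary users are never pinned, every request re-rolls the 5% dice, so users steadily drift onto the canary:

```python
import random

CANARY_WEIGHT = 0.05  # matches the 95/5 split in the HTTPRoute above

def route(has_cookie, rng):
    """Model the two HTTPRoute rules: cookie -> canary, else weighted split."""
    if has_cookie:
        return "canary", True   # rule 1: cookie match, weight 100 to canary
    if rng.random() < CANARY_WEIGHT:
        return "canary", True   # rule 2: canary backend sets Set-Cookie
    return "primary", False     # rule 2: primary backend, no cookie set

rng = random.Random(42)
users = [False] * 1000          # affinity-cookie state per simulated user

for _ in range(10):             # each user makes 10 requests
    for i, has_cookie in enumerate(users):
        _, got_cookie = route(has_cookie, rng)
        users[i] = has_cookie or got_cookie

pinned = sum(users)
# With no primary pin, the expected pinned fraction after k requests is
# 1 - 0.95**k, i.e. roughly 40% for k = 10, far above the configured 5%.
print(f"{pinned} of {len(users)} users pinned to the canary")
```

With a fast k6 loop issuing many requests per second, this drift happens almost immediately, which matches the "within 1 second" observation above.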
Note: `charmander` is a deployment of `ghcr.io/stefanprodan/podinfo`.
To Reproduce
K8s YAML and k6 script
```yaml
---
# Source: charmander/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: charmander
  namespace: charmander
  labels:
    app.kubernetes.io/name: charmander
    app.kubernetes.io/component: "web"
spec:
  minReadySeconds: 5
  replicas: 3
  revisionHistoryLimit: 5
  progressDeadlineSeconds: 60
  strategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  selector:
    matchLabels:
      app.kubernetes.io/name: charmander
      app.kubernetes.io/component: "web"
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9797"
        unique-title: 'greetings from deploy v1'
      labels:
        app.kubernetes.io/name: charmander
        app.kubernetes.io/component: "web"
    spec:
      containers:
      - name: podinfod
        image: ghcr.io/stefanprodan/podinfo:6.5.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 9898
          protocol: TCP
        - name: http-metrics
          containerPort: 9797
          protocol: TCP
        - name: grpc
          containerPort: 9999
          protocol: TCP
        command:
        - ./podinfo
        - --port=9898
        - --port-metrics=9797
        - --grpc-port=9999
        - --grpc-service-name=podinfo
        - --level=info
        - --random-delay=false
        - --random-error=true
        env:
        - name: PODINFO_UI_COLOR
          value: "#34577c"
        - name: PODINFO_UI_MESSAGE
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['unique-title']
        startupProbe:
          exec:
            command:
            - podcli
            - check
            - http
            - localhost:9898/healthz
          initialDelaySeconds: 30
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 2000m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 64Mi
---
# Source: charmander/templates/canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: charmander-canary
  namespace: charmander
spec:
  # when set to true, the deploy will auto succeed; only use during an emergency
  skipAnalysis: false
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: charmander
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 120
  service:
    gatewayRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: default-gateway
      namespace: istio-ingress
    hosts:
    - 'charmander.example.com'
    port: 9898
    targetPort: 9898
  analysis:
    interval: 1m
    maxWeight: 50
    metrics: []
    sessionAffinity:
      cookieName: flagger-cookie
      maxAge: 3600
    stepWeight: 10
    threshold: 5
```

And running the k6 script

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const URL = "https://charmander.example.com/"

export const options = {
  // A number specifying the number of VUs to run concurrently.
  vus: 6,
  // A string specifying the total duration of the test run.
  duration: '600s',
  // Disable clearing cookies between iterations
  noCookiesReset: true
};

function parseRevision(resp) {
  try {
    return resp.json().message;
  } catch (e) {
    return null
  }
}

export function setup() {
  return { revision: null, changeCount: 0 };
}

export default function (data) {
  var resp = http.get(URL);
  var revision = parseRevision(resp);
  if (data.revision == null) {
    console.log(`VU initial version ${revision}`)
    data.revision = revision;
  }
  if (revision && revision !== data.revision) {
    data.changeCount++;
    console.log(data.revision + " : " + revision)
    data.revision = revision;
  }
  check(resp, { 'changeCount < 2': () => data.changeCount < 2 });
}

export function teardown(data) {
  console.log(data);
}
```
The output looks like
Expected behavior
While the test runs, users should remain pinned to their initially assigned version, with roughly the configured percentage of users on each version.
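For comparison, a route shape that would keep users pinned to the primary could mirror the canary affinity rule with a primary one. This is a hypothetical sketch of a single extra rule, not something the current HTTPRoute contains; the `flagger-primary-cookie` name and its match value are made up for illustration:

```yaml
# Hypothetical extra rule: users who already received a primary response
# carry a primary affinity cookie and are routed back to the primary.
- backendRefs:
  - group: ""
    kind: Service
    name: charmander-primary
    port: 9898
    weight: 100
  matches:
  - headers:
    - name: Cookie
      type: RegularExpression
      value: .*flagger-primary-cookie.*   # made-up cookie name
    path:
      type: PathPrefix
      value: /
```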
Additional context