fluxcd / flagger

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)
https://docs.flagger.app
Apache License 2.0

Canary service never gets any traffic #1109

Open bmbferreira opened 2 years ago

bmbferreira commented 2 years ago

Currently, the Canary CRD manages just a single VirtualService. The problem is that I'm using a single VirtualService to route to multiple services. I do this because the cross-resource evaluation order is undefined, and I want to avoid non-deterministic behavior when multiple VirtualServices map the same hostnames (as described here).

To overcome this limitation, I was trying the following:

  ┌──────────────────────┐
  │public-ingress-gateway│                 ┌───────────────────┐             ┌─────────────────────────────┐
  └──────────▲───────────┘                 │web-virtual-service├───────▲─────►  web primary/canary services│
             │                             └────▲──────────────┘       │     └┬────────────────────────────┤
         references                             │      references      │      └────────────────────────────┘
             │                                  │         │            │
┌────────────┴────────────┐                     │       ┌─▼──────────┐ │               ┌─────────┐
│web-proxy-virtual-service├─────forwards────────┤       │mesh-gateway│ ├────updates────┤ flagger │
└─────────────────────────┘                     │       └─▲──────────┘ │               └─────────┘
                                                │         │            │
                                                │     references       │
                                                │         │            │
                                        ┌───────▼─────────┴──────┐     │  ┌─────────────────────────────────┐
                                        │websites-virtual-service├─────▼──► websites primary/canary services│
                                        └────────────────────────┘        └┬────────────────────────────────┤
                                                                           └────────────────────────────────┘

with the following resources:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  annotations:
  name: web-proxy
  namespace: test
spec:
  gateways:
  - public-ingress/public-ingress
  - mesh
  hosts:
  - www.myapp.com
  - myapp.com
  - dev.myapp.com
  http:
  - match:
    - uri:
        regex: ^\/happy-holidays/?$
    - uri:
        regex: ^\/case-study-tara/?$
    - uri:
        regex: ^\/lead-generation/?$
    name: websites
    route:
    - destination:
        host: websites.test.svc.cluster.local
  - match:
    - uri:
        regex: ^\/debug/?$
    - uri:
        regex: ^\/widget-preview/?$
    - uri:
        regex: ^\/hidden-features/?$
    - uri:
        regex: ^\/login/?$
    - uri:
        regex: ^\/signup/?$
    - uri:
        regex: ^\/api/
    name: web
    route:
    - destination:
        host: web.test.svc.cluster.local
  - match:
    - uri:
        prefix: /
    name: fallback
    route:
    - destination:
        host: websites.test.svc.cluster.local
---
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: web
  namespace: test
spec:
  analysis:
    interval: 1m
    maxWeight: 100
    stepWeight: 20
    threshold: 5
  progressDeadlineSeconds: 60
  provider: istio
  service:
    hosts:
    - web
    port: 80
    portDiscovery: false
    targetPort: 3001
    trafficPolicy:
      tls:
        mode: ISTIO_MUTUAL
  skipAnalysis: false
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
status:
  canaryWeight: 0
---
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: websites
  namespace: test
spec:
  analysis:
    interval: 1m
    maxWeight: 100
    metrics:
    - interval: 10s
      name: request-success-rate
      threshold: 0
      thresholdRange:
        min: 99
    stepWeight: 20
    threshold: 5
  progressDeadlineSeconds: 60
  provider: istio
  service:
    gateways:
    - mesh
    hosts:
    - websites
    port: 80
    portDiscovery: false
    targetPort: 3001
    trafficPolicy:
      tls:
        mode: ISTIO_MUTUAL
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: websites
status:
  canaryWeight: 0

The problem is that when I trigger a deploy of websites, even though the traffic of the VirtualService managed by the Canary changes to 80/20 (primary/canary), the canary never gets any traffic and it ends up failing.
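
For illustration, the VirtualService that Flagger generates for the `websites` Canary would look roughly like this at the 80/20 step. This is a sketch of the expected shape based on the Canary spec above, not output captured from the cluster. The `websites-primary` and `websites-canary` host names are assumed from Flagger's usual naming.

```yaml
# Sketch only: approximate shape of the Flagger-managed VirtualService mid-rollout.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: websites
  namespace: test
spec:
  gateways:
  - mesh                 # taken from spec.service.gateways of the Canary
  hosts:
  - websites
  http:
  - route:
    - destination:
        host: websites-primary   # assumed name of the primary service
      weight: 80
    - destination:
        host: websites-canary    # assumed name of the canary service
      weight: 20
```

A likely explanation is that this VirtualService is bound only to the `mesh` gateway, so its weights apply to sidecar traffic addressed to `websites`. Requests that enter through `public-ingress` are routed by `web-proxy` straight to `websites.test.svc.cluster.local`, and Istio does not chain a second VirtualService after that routing decision.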

Is there any solution that allows me to keep a single VirtualService, or is the only solution to split it in two and attach each one to the public-ingress gateway?
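
To make the split option concrete, here is a rough sketch of the `websites` Canary with the public gateway, hostnames, and path matches moved into `spec.service`, so that the Flagger-generated VirtualService handles ingress traffic itself. The values are copied from the resources above. Whether this behaves well next to a second VirtualService for `web` on the same hosts is exactly the ordering concern described earlier, so treat it as an assumption to verify.

```yaml
# Sketch only: Canary whose generated VirtualService is bound to the public gateway.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: websites
  namespace: test
spec:
  provider: istio
  progressDeadlineSeconds: 60
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: websites
  service:
    port: 80
    targetPort: 3001
    portDiscovery: false
    gateways:
    - public-ingress/public-ingress
    - mesh
    hosts:
    - www.myapp.com
    - myapp.com
    - dev.myapp.com
    match:                        # only the routes that belong to websites
    - uri:
        regex: ^\/happy-holidays/?$
    - uri:
        regex: ^\/case-study-tara/?$
    - uri:
        regex: ^\/lead-generation/?$
    trafficPolicy:
      tls:
        mode: ISTIO_MUTUAL
  analysis:
    interval: 1m
    maxWeight: 100
    stepWeight: 20
    threshold: 5
```

The `prefix: /` fallback route is left out here on purpose, because it would overlap with every route of the `web` VirtualService.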

bmbferreira commented 2 years ago

I noticed that this is not an issue with Argo Rollouts, since Argo Rollouts does not create or manage a VirtualService and instead references an existing one: https://argoproj.github.io/argo-rollouts/features/traffic-management/istio/#rollout-ownership-over-the-virtual-service

Is this something that is planned for Flagger? It would avoid this issue, since I could have a single VirtualService with all the routes and just reference it in the Flagger Canary CRD.

stefanprodan commented 2 years ago

Flagger does support VirtualService delegation, maybe that solves your issue?

bmbferreira commented 2 years ago

Hi @stefanprodan, thanks for answering! Are you referring to this delegate? I tried that but it doesn't work with regex matches. I get this error if I try to have a virtual service and then delegate the regexes to other virtual services:

delegate url match does not support regex match for delegating istio

more info here: https://github.com/istio/istio/issues/29845 https://github.com/istio/istio/blob/master/pkg/config/validation/virtualservice.go#L159
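
For context, a delegation setup restricted to prefix matches could look roughly like the sketch below, since regex matches are what trigger the validation error. It assumes `service.delegation: true` on each Canary and that the Flagger-generated VirtualServices are named `web` and `websites` in the `test` namespace. Neither assumption is confirmed in this thread. Prefix matches are also broader than the original regexes (`/login` would match `/login-foo` too), so this is not a drop-in replacement.

```yaml
# Sketch only: root VirtualService owns hosts and gateways and delegates by prefix.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-proxy
  namespace: test
spec:
  gateways:
  - public-ingress/public-ingress
  - mesh
  hosts:
  - www.myapp.com
  - myapp.com
  - dev.myapp.com
  http:
  - name: web
    match:
    - uri:
        prefix: /login
    - uri:
        prefix: /signup
    - uri:
        prefix: /api/
    delegate:
      name: web            # assumed name of the Flagger-generated VirtualService
      namespace: test
  - name: fallback
    match:
    - uri:
        prefix: /
    delegate:
      name: websites       # assumed name of the Flagger-generated VirtualService
      namespace: test
---
# Canary fragment: with delegation enabled, hosts and gateways are left unset
# so the generated VirtualService can act as a delegate.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: websites
  namespace: test
spec:
  provider: istio
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: websites
  service:
    delegation: true
    port: 80
    targetPort: 3001
    portDiscovery: false
  analysis:
    interval: 1m
    maxWeight: 100
    stepWeight: 20
    threshold: 5
```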

easayliu commented 2 years ago

I get the same error. Did you find a solution?

bmbferreira commented 2 years ago

@easayliu for this specific issue, I ended up deploying an internal proxy inside the cluster (I used HAProxy, but it could be NGINX, Envoy, etc.) that forwards traffic based on the URL path.

The other option is to use Argo Rollouts instead of Flagger.