argoproj / argo-rollouts

Progressive Delivery for Kubernetes
https://argo-rollouts.readthedocs.io/
Apache License 2.0

traefikservices.traefik.containo.us not found error when using traefik.io/v1alpha1 TraefikService #3615

Closed: evega-ws closed this issue 1 month ago.

evega-ws commented 3 months ago


Describe the bug

I am trying to use Traefik 3.0.0 with Argo Rollouts. Traefik 3.0.0 no longer includes the legacy traefik.containo.us CRD API group used by Traefik v2.

Using a TraefikService object with apiVersion: traefik.io/v1alpha1 fails with the following error:

traefikservices.traefik.containo.us "my-service" not found
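
The error names the API group the controller actually queried. Kubernetes API errors render the resource as resource.group followed by the quoted object name, so traefikservices.traefik.containo.us "my-service" means the controller looked for a TraefikService named my-service in the legacy group rather than in traefik.io. A minimal, standard-library-only Go sketch of that formatting; groupResource and notFoundMessage are hypothetical stand-ins for apimachinery's schema.GroupResource and errors.NewNotFound:

```go
package main

import "fmt"

// groupResource is a hypothetical mirror of k8s.io/apimachinery's
// schema.GroupResource, redefined so this sketch needs only the stdlib.
type groupResource struct {
	Group    string
	Resource string
}

// String renders "resource.group", or just "resource" for the core API group.
func (gr groupResource) String() string {
	if gr.Group == "" {
		return gr.Resource
	}
	return gr.Resource + "." + gr.Group
}

// notFoundMessage reproduces the `<resource>.<group> "<name>" not found` wording.
func notFoundMessage(gr groupResource, name string) string {
	return fmt.Sprintf("%s %q not found", gr.String(), name)
}

func main() {
	legacy := groupResource{Group: "traefik.containo.us", Resource: "traefikservices"}
	current := groupResource{Group: "traefik.io", Resource: "traefikservices"}
	fmt.Println(notFoundMessage(legacy, "my-service"))  // what the controller queried
	fmt.Println(notFoundMessage(current, "my-service")) // what Traefik v3 actually serves
}
```

Reading the group out of the message this way is a quick check of which API group the controller is configured for.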

Additional logs

{"event_reason":"TrafficRoutingError","level":"warning","msg":"traefikservices.traefik.containo.us \"my-service\" not found","namespace":"default","rollout":"example-rollout-canary","time":"2024-06-04T21:34:52Z"}
{"generation":8,"level":"error","msg":"roCtx.reconcile err traefikservices.traefik.containo.us \"my-service\" not found","namespace":"default","resourceVersion":"559279197","rollout":"example-rollout-canary","time":"2024-06-04T21:34:52Z"}
{"generation":8,"level":"info","msg":"Reconciliation completed","namespace":"default","resourceVersion":"559279197","rollout":"example-rollout-canary","time":"2024-06-04T21:34:52Z","time_ms":4.942671}
{"level":"error","msg":"rollout syncHandler error: traefikservices.traefik.containo.us \"canary-service\" not found","namespace":"default","rollout":"example-rollout-canary","time":"2024-06-04T21:34:52Z"}
{"level":"info","msg":"rollout syncHandler queue retries: 132 : key \"default/example-rollout-canary\"","namespace":"default","rollout":"example-rollout-canary","time":"2024-06-04T21:34:52Z"}

To Reproduce

A canary Rollout (deployed through Argo CD) with trafficRouting configured to use Traefik:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-rollout-canary
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-rollout-canary
  template:
    metadata:
      labels:
        app: example-rollout-canary
    spec:
      containers:
      - name: example-rollout-canary
        image: argoproj/rollouts-demo:blue
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
  strategy:
    canary:
      canaryService: canary-preview
      stableService: canary-endpoint
      maxUnavailable: 1
      steps:
      - setWeight: 20 
      - pause: {duration: 5m}
      - setWeight: 40
      - pause: {duration: 5m}
      trafficRouting:
        traefik:
          weightedTraefikServiceName: my-service

TraefikService using the traefik.io/v1alpha1 API (introduced in Traefik v2.10 and the only CRD API group in Traefik v3):

apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: my-service
spec:
  weighted:
    services:
    - name: canary-endpoint
      port: 80
    - name: canary-preview
      port: 80
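
With weightedTraefikServiceName set, Argo Rollouts shifts traffic by editing the weights in this TraefikService's weighted.services list, so it must first be able to fetch the object. A minimal sketch of the idea, assuming the canary entry receives the current setWeight value and the stable entry receives the remainder; weightedService and applyCanaryWeight are hypothetical names, not the controller's code:

```go
package main

import "fmt"

// weightedService is a hypothetical, simplified stand-in for one entry
// under spec.weighted.services in a TraefikService.
type weightedService struct {
	Name   string
	Port   int
	Weight int
}

// applyCanaryWeight gives desiredWeight percent of traffic to the canary
// Service and the remainder to the stable Service. Entries matching neither
// name are left untouched. It errors if either Service is not listed.
func applyCanaryWeight(services []weightedService, canary, stable string, desiredWeight int) error {
	if desiredWeight < 0 || desiredWeight > 100 {
		return fmt.Errorf("weight %d out of range [0, 100]", desiredWeight)
	}
	foundCanary, foundStable := false, false
	for i := range services {
		switch services[i].Name {
		case canary:
			services[i].Weight = desiredWeight
			foundCanary = true
		case stable:
			services[i].Weight = 100 - desiredWeight
			foundStable = true
		}
	}
	if !foundCanary || !foundStable {
		return fmt.Errorf("TraefikService must list both %q and %q", canary, stable)
	}
	return nil
}

func main() {
	// Mirrors the manifest above: stable = canary-endpoint, canary = canary-preview.
	services := []weightedService{
		{Name: "canary-endpoint", Port: 80},
		{Name: "canary-preview", Port: 80},
	}
	for _, w := range []int{20, 40} { // the setWeight steps from the Rollout
		if err := applyCanaryWeight(services, "canary-preview", "canary-endpoint", w); err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("setWeight %d -> %+v\n", w, services)
	}
}
```

Because both Service names are looked up inside the TraefikService, the canaryService and stableService in the Rollout must match the names listed under weighted.services.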

Canary Service objects for Traefik to route to:

apiVersion: v1
kind: Service
metadata:
  name: canary-endpoint
spec:
  selector:
    app: example-rollout-canary
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http
--- 
apiVersion: v1
kind: Service
metadata:
  name: canary-preview
spec:
  selector:
    app: example-rollout-canary
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http

Expected behavior

The Argo Rollouts controller should be able to look up and reference TraefikService resources using the newest API version, traefik.io/v1alpha1.


Version: v1.6.6

zachaller commented 2 months ago

Did you try setting these flags?

command.Flags().StringVar(&traefikAPIGroup, "traefik-api-group", defaults.DefaultTraefikAPIGroup, "Set the default Traefik apiGroup that controller uses.")
command.Flags().StringVar(&traefikVersion, "traefik-api-version", defaults.DefaultTraefikVersion, "Set the default Traefik apiVersion that controller uses.")

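Both flags are plain string flags with defaults, so the controller keeps using the default group unless the flags are passed explicitly. The controller registers them with cobra, as quoted above; this sketch uses Go's standard flag package instead to show the same behavior. The default values here are an assumption, inferred from the traefik.containo.us error the reporter saw without the flags:

```go
package main

import (
	"flag"
	"fmt"
)

// Assumed defaults for illustration; in Argo Rollouts the real values are
// defaults.DefaultTraefikAPIGroup and defaults.DefaultTraefikVersion.
const (
	defaultTraefikAPIGroup = "traefik.containo.us"
	defaultTraefikVersion  = "traefik.containo.us/v1alpha1"
)

// parseTraefikFlags registers the two flag names used by the controller on a
// fresh FlagSet, parses args, and returns the resulting group and version.
func parseTraefikFlags(args []string) (group, version string, err error) {
	fs := flag.NewFlagSet("controller", flag.ContinueOnError)
	fs.StringVar(&group, "traefik-api-group", defaultTraefikAPIGroup, "Traefik apiGroup the controller uses")
	fs.StringVar(&version, "traefik-api-version", defaultTraefikVersion, "Traefik apiVersion the controller uses")
	err = fs.Parse(args)
	return group, version, err
}

func main() {
	// No args: the (assumed) legacy defaults stay in effect.
	g, v, err := parseTraefikFlags(nil)
	fmt.Println(g, v, err)

	// With both args, as later passed via Helm extraArgs or Deployment args.
	g, v, err = parseTraefikFlags([]string{
		"--traefik-api-group=traefik.io",
		"--traefik-api-version=traefik.io/v1alpha1",
	})
	fmt.Println(g, v, err)
}
```

Go's flag package accepts both -name and --name, so the double-dash form used in the chart values parses the same way here.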
smutoni2022 commented 2 months ago

@zachaller I am having the same issue. Did setting the flags work for you? And if so, how do we set the flags in the argo-rollouts deployment?

smutoni2022 commented 2 months ago

@evega-ws have you made any progress on this bug? I am getting the same error you described above.

evega-ws commented 2 months ago

@smutoni2022 Unfortunately, I have not been able to fix this. Since I am using Helm, I've set the flags as follows:

controller:
  extraArgs:
  - "--traefik-api-group=traefik.io"
  - "--traefik-api-version=traefik.io/v1alpha1"

I used --traefik-api-version=traefik.io/v1alpha1 (the full group/version form), as seen in the test file https://github.com/argoproj/argo-rollouts/blob/master/utils/defaults/defaults_test.go#L406. The code suggests this is the correct syntax:

    group := defaults.GetTraefikAPIGroup()
    parts := strings.Split(defaults.GetTraefikVersion(), "/")
...
    SetTraefikAPIGroup("traefik.containo.us")
    assert.Equal(t, "traefik.containo.us", GetTraefikAPIGroup())
    SetTraefikAPIGroup(DefaultTraefikAPIGroup)
    assert.Equal(t, DefaultTraefikAPIGroup, GetTraefikAPIGroup())

    SetTraefikVersion("traefik.containo.us/v1alpha1")
    assert.Equal(t, "traefik.containo.us/v1alpha1", GetTraefikVersion())
    SetTraefikVersion(DefaultTraefikVersion)
    assert.Equal(t, DefaultTraefikVersion, GetTraefikVersion())
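The first two lines above split the version value on "/", which is why --traefik-api-version takes the full group/version string. A hypothetical sketch of turning the two flag values into a group/version/resource for the lookup; the gvr type and traefikServiceGVR helper are illustrative, and the group consistency check is an extra safeguard that Argo Rollouts itself may not perform. Note that the resource is always the plural type traefikservices, while my-service is only the object name:

```go
package main

import (
	"fmt"
	"strings"
)

// gvr is a hypothetical stand-in for k8s.io/apimachinery's schema.GroupVersionResource.
type gvr struct {
	Group, Version, Resource string
}

// traefikServiceGVR builds the group/version/resource used to look up
// TraefikService objects. apiVersion must be in "group/version" form and its
// group half must match apiGroup. The object name (e.g. "my-service") is not
// part of the GVR; it is passed separately when fetching the object.
func traefikServiceGVR(apiGroup, apiVersion string) (gvr, error) {
	parts := strings.Split(apiVersion, "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return gvr{}, fmt.Errorf("api version %q is not in group/version form", apiVersion)
	}
	if parts[0] != apiGroup {
		return gvr{}, fmt.Errorf("api group %q does not match api version %q", apiGroup, apiVersion)
	}
	return gvr{Group: apiGroup, Version: parts[1], Resource: "traefikservices"}, nil
}

func main() {
	for _, v := range []string{"traefik.io/v1alpha1", "traefik.containo.us/v1alpha1", "v1alpha1"} {
		r, err := traefikServiceGVR("traefik.io", v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("group=%s version=%s resource=%s\n", r.Group, r.Version, r.Resource)
	}
}
```

Setting only one of the two flags, or mixing the old and new groups, is the kind of mismatch this check would surface early.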

The flags appear to be applied correctly; however, the controller is still unable to pick up my TraefikService:

{"event_reason":"TrafficRoutingError","level":"warning","msg":"my-service.traefik.io is forbidden: User \"system:serviceaccount:argocd:argo-rollouts\" cannot list resource \"my-service\" in API group \"traefik.io\" in the namespace \"templates\"","namespace":"templates","rollout":"example-rollout-canary","time":"2024-07-05T23:21:28Z"}

The error itself makes sense, because the controller is trying to list a resource type called my-service.traefik.io, which should not exist. It should instead look up a resource of type traefikservices.traefik.io named my-service. The previous error shows the correct form:

traefikservices.traefik.containo.us "my-service" not found
NOT
my-service.traefik.containo.us

Adding the list verb to the ClusterRole rule for apiGroups: [traefik.io], in case the role was simply missing the list permission, makes no difference.
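This is consistent with the forbidden error above: it says the request was for resource "my-service", and RBAC rules match on resource type, not object name, so a rule limited to traefikservices in traefik.io cannot authorize that request no matter which verbs it grants. A simplified, hypothetical Go model of RBAC rule matching (it ignores resourceNames, subresources, and non-resource URLs):

```go
package main

import "fmt"

// policyRule is a simplified, hypothetical mirror of an RBAC PolicyRule.
type policyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether want is in list, treating "*" as a wildcard.
func contains(list []string, want string) bool {
	for _, v := range list {
		if v == "*" || v == want {
			return true
		}
	}
	return false
}

// allows reports whether any rule grants verb on resource in apiGroup.
// Matching is on the resource type (e.g. "traefikservices"), so a request
// whose resource field holds an object name is denied by type-scoped rules.
func allows(rules []policyRule, apiGroup, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, apiGroup) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	rules := []policyRule{{
		APIGroups: []string{"traefik.io"},
		Resources: []string{"traefikservices"},
		Verbs:     []string{"get", "list", "watch", "update", "patch"},
	}}
	fmt.Println(allows(rules, "traefik.io", "traefikservices", "list")) // a well-formed request
	fmt.Println(allows(rules, "traefik.io", "my-service", "list"))      // the request in the error above
}
```

In other words, the fix has to come from the controller building the right resource type, not from widening the ClusterRole.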

I'm unsure how to proceed: since we are on Traefik v3.0 or later, we cannot fall back to the deprecated traefik.containo.us API group.

smutoni2022 commented 1 month ago

@evega-ws I have tried the same arguments in my Helm chart and got the same error. Is it possible to reopen this issue to get more visibility from others?

evega-ws commented 1 month ago

@smutoni2022 I am unable to reopen the issue; perhaps @zachaller could reopen it if warranted. It doesn't look like the flags work in this case.

smutoni2022 commented 3 weeks ago

@zachaller @BrunoTarijon This fix is not working. I tested it by upgrading to the latest argo-rollouts Helm chart and switching the Traefik API to traefik.io. I still get the same error about the service not being found. Can you explain how to apply this fix beyond what I did?

BrunoTarijon commented 3 weeks ago

Hey, I don't think my changes are in the latest release (1.7.1); I built the image from the master branch. They may be in the 1.7.2 release. The 1.7.1 release is from June 24.

smutoni2022 commented 5 days ago

@BrunoTarijon @zachaller I have tested this again with the latest release, 1.7.2. No luck: it still shows the service-not-found error mentioned before. I am not sure whether I need any extra configuration in the chart beyond updating the chart version.

BrunoTarijon commented 4 days ago

@smutoni2022, I just installed Argo Rollouts (1.7.2 release) in a new local cluster and added these args to the controller deployment:

      args:
        - --traefik-api-group=traefik.io
        - --traefik-api-version=traefik.io/v1alpha1

Everything seems to work with the following manifests:

apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: traefik-service
spec:
  weighted:
    services:
      - name: nginx-canary
        port: 80
      - name: nginx-stable
        port: 80
---

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      containers:
      - name: guestbook
        image: argoproj/rollouts-demo:blue
  replicas: 5
  strategy:
    canary:
      canaryService: nginx-canary
      stableService: nginx-stable
      trafficRouting:
        traefik:
          weightedTraefikServiceName: traefik-service 
      steps:
      - setWeight: 40
      - pause: {duration: 10}
      - setWeight: 60
      - pause: {duration: 10}
      - setWeight: 80
      - pause: {duration: 10}

---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: nginx
  name: nginx-stable
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    run: nginx
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: nginx
  name: nginx-canary
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    run: nginx

If you share more details, maybe I can help.

smutoni2022 commented 1 day ago

@BrunoTarijon I was missing the arguments. I added the extra args to the values file and it works fine now. Thank you.
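
Since the root cause here turned out to be args that never reached the controller, one quick check is to take the container args from the rendered or live Deployment and confirm both flags are present. A hypothetical Go helper for that check; the name missingFlags and the approach are illustrative, not part of Argo Rollouts:

```go
package main

import (
	"fmt"
	"strings"
)

// missingFlags reports which of the required flag names do not appear in
// args. A flag counts as present in either "--name=value" or "--name value"
// form; bare values (no leading dash) are skipped.
func missingFlags(args []string, required ...string) []string {
	present := make(map[string]bool)
	for _, a := range args {
		name := strings.TrimLeft(a, "-")
		if name == a { // not a flag, e.g. the value following "--name"
			continue
		}
		if i := strings.Index(name, "="); i >= 0 {
			name = name[:i]
		}
		present[name] = true
	}
	var missing []string
	for _, r := range required {
		if !present[r] {
			missing = append(missing, r)
		}
	}
	return missing
}

func main() {
	args := []string{"--traefik-api-group=traefik.io", "--traefik-api-version=traefik.io/v1alpha1"}
	fmt.Println(missingFlags(args, "traefik-api-group", "traefik-api-version"))
	fmt.Println(missingFlags(nil, "traefik-api-group", "traefik-api-version"))
}
```

An empty result means both flags made it into the container spec; their values still need to name the traefik.io group.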