fluxcd / flagger

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)
https://docs.flagger.app
Apache License 2.0

Issue with Canary Deployment: Metric Not Reporting #1716

Open infrawizard opened 1 day ago

infrawizard commented 1 day ago

I'm implementing a canary deployment with Flagger, using the Traefik provider. The goal is to monitor the success rate of HTTP requests to a health endpoint (/ping). However, although I configured the request-success-rate metric, Flagger does not appear to send any requests to the endpoint or report any metrics.

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: test-service
  namespace: test
spec:
  provider: traefik
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-service
  progressDeadlineSeconds: 300
  service:
    port: 3000
    targetPort: 3000
  analysis:
    interval: 10s
    threshold: 10
    maxWeight: 50
    stepWeight: 5
    metrics:
      - name: request-success-rate
        interval: 30s
        thresholdRange:
          min: 99
        failureThreshold: 5
        query: "http://test-service:3000/ping"
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 10s
        metadata:
          type: bash
          cmd: "curl -X GET http://test-service:3000/ping"
      - name: load-test
        type: rollout
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          type: cmd
          cmd: "hey -z 10s -q 10 -c 2 http://test-service:3000/ping"
          logCmdOutput: "true"

I tested the curl and hey commands from inside the load tester pod and they work fine. But when I check my canary, it goes into a failed status after initialization:

Events:
  Type     Reason  Age                    From     Message
  ----     ------  ----                   ----     -------
  Warning  Synced  4m19s                  flagger  test-service-primary.test not ready: waiting for rollout to finish: observed deployment generation less than desired generation
  Warning  Synced  3m29s (x5 over 4m9s)   flagger  test-service-primary.test not ready: waiting for rollout to finish: 0 of 1 (readyThreshold 100%) updated replicas are available
  Normal   Synced  3m19s (x7 over 4m19s)  flagger  all the metrics providers are available!
  Normal   Synced  3m19s                  flagger  Initialization done! test-service.test
  Normal   Synced  2m49s                  flagger  New revision detected! Scaling up test-service.test
  Warning  Synced  119s (x5 over 2m39s)   flagger  canary deployment test-service.test not ready: waiting for rollout to finish: 0 of 1 (readyThreshold 100%) updated replicas are available
  Normal   Synced  109s                   flagger  Starting canary analysis for test-service.test
  Normal   Synced  109s                   flagger  Pre-rollout check acceptance-test passed
  Normal   Synced  109s                   flagger  Advance test-service.test canary weight 5
  Warning  Synced  89s (x2 over 99s)      flagger  Halt advancement no values found for traefik metric request-success-rate probably test-service.test is not receiving traffic: running query failed: no values found

I am not sure if I am missing something.

aryan9600 commented 8 hours ago

Could you check whether the required metrics are showing up in your Prometheus server?
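
A minimal sketch of one way to run that check, using the Prometheus HTTP API instant-query endpoint (/api/v1/query). The Prometheus address below is hypothetical, and the metric name traefik_service_requests_total with a service label matching test-service is an assumption about what Traefik exports in this cluster. It is not necessarily the query Flagger runs internally, so adjust both to your setup.

```python
import json
import urllib.parse
import urllib.request

# Assumption: hypothetical Prometheus address; replace with your own.
PROMETHEUS_URL = "http://prometheus.example:9090"

# Assumption: metric and label names depend on the Traefik version and
# configuration; this is only a probe for "does Traefik report anything
# for this service", not Flagger's built-in query.
PROMQL = 'traefik_service_requests_total{service=~".*test-service.*"}'


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def has_samples(payload: dict) -> bool:
    """Return True if the query response contains at least one series."""
    if payload.get("status") != "success":
        return False
    return len(payload.get("data", {}).get("result", [])) > 0


def check(base_url: str = PROMETHEUS_URL, promql: str = PROMQL) -> bool:
    """Run the query against Prometheus and report whether any series exist."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return has_samples(json.load(resp))
```

Calling check() while the load test is running should return True if Prometheus is scraping Traefik and Traefik sees traffic for the service. If it returns False, the "no values found" message in the events is consistent with the metrics never reaching Prometheus.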