Open infrawizard opened 3 weeks ago
I am having the same issue with Istio.
I can see that Flagger is hitting Prometheus, and I can see the query, but for some reason unknown to me the new pod is just not getting any traffic. The canary deployment shows a value of 0 or 1 when I run this query. Traffic to the old pod works and shows up in Prometheus.
@stefanprodan I would really appreciate your input here.
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: request-duration
  namespace: flagger
spec:
  provider:
    type: prometheus
    address: http://mimir-distributed-gateway.observability:8080/prometheus
  query: |
    histogram_quantile(0.99,
      sum(
        irate(
          istio_request_duration_milliseconds_bucket{
            reporter="destination",
            destination_workload=~"{{ target }}",
            destination_workload_namespace=~"{{ namespace }}"
          }[{{ interval }}]
        )
      ) by (le)
    )
---
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: request-success-rate
  namespace: flagger
spec:
  provider:
    type: prometheus
    address: http://mimir-distributed-gateway.observability:8080/prometheus
  query: |
    sum(
      rate(
        istio_requests_total{
          reporter="destination",
          destination_workload_namespace=~"{{ namespace }}",
          destination_workload=~"{{ target }}",
          response_code!~"5.*"
        }[{{ interval }}]
      )
    )
    /
    sum(
      rate(
        istio_requests_total{
          reporter="destination",
          destination_workload_namespace=~"{{ namespace }}",
          destination_workload=~"{{ target }}"
        }[{{ interval }}]
      )
    )
---
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: echo-server-cannary
  namespace: debug
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: echo-server
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 600
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: echo-server
  service:
    # service port number
    port: 80
    # container port number or name (optional)
    targetPort: 80
    # Istio gateways (optional)
    gateways:
      - default/gw-dev-imba-com
    # Istio virtual service host names (optional)
    hosts:
      - imba.com
    match:
      - uri:
          prefix: /api/echo
    # Istio traffic policy (optional)
    trafficPolicy:
      tls:
        # use ISTIO_MUTUAL when mTLS is enabled
        mode: ISTIO_MUTUAL
    # Istio retry policy (optional)
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: "gateway-error,connect-failure,refused-stream"
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 10
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
      - name: request-success-rate
        templateRef:
          name: request-success-rate
          namespace: flagger
        thresholdRange:
          max: 500
        interval: 5m
      - name: request-duration
        templateRef:
          name: request-duration
          namespace: flagger
        thresholdRange:
          max: 500
        interval: 5m
    # testing (optional)
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: https://imba.com/api/echo
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -sd 'test' https://imba.com/api/echo | grep token"
      - name: load-test
        url: https://imba.com/api/echo
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://imba.com/api/echo"
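To check by hand what the success-rate template resolves to, the placeholders can be filled in with the values from the Canary above and the result run against Prometheus directly. The sketch below is only an approximation: the `render` and `instant_query_url` helpers are hypothetical, the regex substitution is a stand-in for Flagger's own templating, and the values assume that `target` maps to `targetRef.name`, `namespace` to the Canary namespace, and `interval` to the per-metric interval. The URL uses the standard Prometheus instant-query path (`/api/v1/query`) appended to the provider address.

```python
import re
from urllib.parse import urlencode

# Success-rate query copied from the request-success-rate MetricTemplate.
TEMPLATE = """\
sum(
  rate(
    istio_requests_total{
      reporter="destination",
      destination_workload_namespace=~"{{ namespace }}",
      destination_workload=~"{{ target }}",
      response_code!~"5.*"
    }[{{ interval }}]
  )
)
/
sum(
  rate(
    istio_requests_total{
      reporter="destination",
      destination_workload_namespace=~"{{ namespace }}",
      destination_workload=~"{{ target }}"
    }[{{ interval }}]
  )
)
"""


def render(template: str, values: dict) -> str:
    """Replace each {{ key }} placeholder (hypothetical stand-in for Flagger's templating)."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], template)


def instant_query_url(base: str, query: str) -> str:
    """Build a Prometheus HTTP API instant-query URL for the given PromQL."""
    return f"{base.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


# Assumed mapping: targetRef.name, Canary namespace, per-metric interval.
values = {"target": "echo-server", "namespace": "debug", "interval": "5m"}
query = render(TEMPLATE, values)
url = instant_query_url(
    "http://mimir-distributed-gateway.observability:8080/prometheus", query
)

print(query)
print(url)
```

If the rendered query returns an empty result for `echo-server` while the same query for the primary workload returns data, that would suggest the canary pods are not receiving or not reporting traffic, rather than a problem in the template itself.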
I found that the problem is with the metrics. The canary is not receiving enough traffic for the metric to return any value, which results in a failed rollout.
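The effect of missing traffic on a threshold check can be sketched roughly as follows. This is not Flagger's code: `success_ratio` and `within_range` are hypothetical helpers that mimic the success-rate query (non-5xx rate divided by total rate) and a `thresholdRange` check, under the assumption that an empty or NaN query result counts as a failed check. The `min` value of 0.99 is an illustrative number that does not appear in the thread.

```python
import math
from typing import Optional


def success_ratio(ok_rate: Optional[float], total_rate: Optional[float]) -> Optional[float]:
    """Mimic sum(rate(non-5xx)) / sum(rate(all)); None means no series matched."""
    if ok_rate is None or total_rate is None:
        return None  # PromQL division with an empty side yields an empty result
    if total_rate == 0:
        return math.nan  # series exist but are flat: 0 / 0
    return ok_rate / total_rate


def within_range(value: Optional[float],
                 min_value: Optional[float] = None,
                 max_value: Optional[float] = None) -> bool:
    """Return True only if a usable value exists and lies inside [min, max]."""
    if value is None or math.isnan(value):
        return False  # assumption: no value is treated as a failed check
    if min_value is not None and value < min_value:
        return False
    if max_value is not None and value > max_value:
        return False
    return True


no_traffic = success_ratio(None, None)   # canary received no requests
all_failing = success_ratio(0.0, 2.0)    # ratio 0.0
healthy = success_ratio(2.0, 2.0)        # ratio 1.0

for label, value in [("no traffic", no_traffic),
                     ("all failing", all_failing),
                     ("healthy", healthy)]:
    print(label, within_range(value, max_value=500), within_range(value, min_value=0.99))
```

Under these assumptions, the no-traffic case fails whatever the range is. The query as written returns a ratio between 0 and 1, so a `max: 500` range accepts every value that exists, including a success ratio of 0; a lower bound (`min`) is what would distinguish a healthy canary from a failing one.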
➜ ~ istioctl version
client version: 1.24.0
control plane version: 1.21.0
data plane version: 1.21.0 (61 proxies)
@hrvatskibogmars it's working for me too when I remove the metrics part, but not when I add it. Apparently we need Prometheus for that, but the Prometheus that comes with Flagger is not getting the metrics. Are you trying anything else?
@aryan9600 I would really appreciate your input here
I'm implementing a canary deployment using Flagger to monitor my application. However, despite configuring the request-success-rate metric, Flagger isn't sending any metrics or requests to the endpoint. I am using the Traefik provider.
I am installing flagger like below:
And canary with below:
The canary succeeds without the metrics field but fails with it:
Below is my traefik config:
I am installing Prometheus with Flagger. The setup works without metrics but fails when they are added. I am not sure whether I am missing anything in the setup. I can see the flagger-prometheus pod. Do I need to install anything else for the built-in metrics to work, or is something else missing?