Open AnkitAdarsh opened 1 year ago
I am facing the same issue. In my case, I am trying to use a PromQL query with Stackdriver as the provider. The error I get is this:
Metric query failed for request-success-rate-stackdriver: error requesting stackdriver: rpc error: code = InvalidArgument desc = Request was missing field name
I tried using PromQL in a custom stackdriver metric, but it didn't work.
I think that the Flagger stackdriver MetricTemplate only supports MQL queries, so passing a PromQL query fails as a bad request.
On the other hand, Cloud Monitoring does support PromQL through a specific endpoint (https://monitoring.googleapis.com/v1/projects/PROJECT_ID/location/global/prometheus/api/v1/query), but that endpoint uses OAuth 2 authentication, and I don't know whether or how Flagger supports it.
See: https://cloud.google.com/stackdriver/docs/managed-prometheus/query#api-prometheus
I tried using an MQL query to compute request success rate on a custom stackdriver metric, but I also got the same "Request was missing field name" error message in the flagger logs.
Here are the error snippet and my query
Error:
msg: "Metric query failed for request-success-rate-gcp: error requesting stackdriver: rpc error: code = InvalidArgument desc = Request was missing field name."
stacktrace: "github.com/fluxcd/flagger/pkg/controller.(*Controller).recordEventErrorf
/workspace/pkg/controller/events.go:39
github.com/fluxcd/flagger/pkg/controller.(*Controller).runMetricChecks
/workspace/pkg/controller/scheduler_metrics.go:285
github.com/fluxcd/flagger/pkg/controller.(*Controller).runAnalysis
/workspace/pkg/controller/scheduler.go:744
github.com/fluxcd/flagger/pkg/controller.(*Controller).advanceCanary
/workspace/pkg/controller/scheduler.go:433
github.com/fluxcd/flagger/pkg/controller.CanaryJob.Start.func1
/workspace/pkg/controller/job.go:39"
Query:
fetch istio_canonical_service
| {
    metric istio.io/service/server/request_count
    | filter resource.namespace_name=="{{ namespace }}" && resource.canonical_service_name=="{{ target }}" && metric.destination_service_name=="{{ target }}" && metric.response_code!=200
    | align rate(1m)
    | every 1m
    | sum
  ;
    metric istio.io/service/server/request_count
    | filter resource.namespace_name=="{{ namespace }}" && resource.canonical_service_name=="{{ target }}" && metric.destination_service_name=="{{ target }}"
    | align rate(1m)
    | every 1m
    | sum
  }
| outer_join 0
| div
| mul(-100)
| add(100)
@AnkitAdarsh After digging through the flagger source code, I figured out the problem: the MetricTemplate is missing the secret that holds the name of the GCP project, which is why the request is sent without the "name" field. The stackdriver section of the metrics documentation (https://docs.flagger.app/usage/metrics#google-cloud-monitoring-stackdriver) describes the steps:
Create a secret that contains your project-id (and, if workload identity is not enabled on your cluster, your service account json). Then reference the secret in the metric template. Note: The particular MQL query used here works if Istio is installed on GKE.
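For reference, here is a rough sketch of what those two objects might look like on a cluster with workload identity enabled. The resource names (gcloud-project, request-success-rate-gcp), the namespace, and the project ID are placeholders I made up, and the secret key name `project` is what I believe the stackdriver provider reads, so please check it against the documentation linked above before relying on it. If workload identity is not enabled, the secret also needs the service account JSON under the key named in that documentation, which I have left out here.

```yaml
# Sketch only: names, namespace, and project ID are hypothetical placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: gcloud-project          # hypothetical name
  namespace: test               # assumed: same namespace as the MetricTemplate
stringData:
  project: my-gcp-project-id    # assumed key name; replace with your GCP project ID
---
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: request-success-rate-gcp
  namespace: test
spec:
  provider:
    type: stackdriver
    secretRef:
      name: gcloud-project      # points at the secret above
  # Same MQL query as posted earlier in this thread.
  query: |
    fetch istio_canonical_service
    | {
        metric istio.io/service/server/request_count
        | filter resource.namespace_name=="{{ namespace }}" && resource.canonical_service_name=="{{ target }}" && metric.destination_service_name=="{{ target }}" && metric.response_code!=200
        | align rate(1m)
        | every 1m
        | sum
      ;
        metric istio.io/service/server/request_count
        | filter resource.namespace_name=="{{ namespace }}" && resource.canonical_service_name=="{{ target }}" && metric.destination_service_name=="{{ target }}"
        | align rate(1m)
        | every 1m
        | sum
      }
    | outer_join 0
    | div
    | mul(-100)
    | add(100)
```

Using stringData instead of data avoids having to base64-encode the project ID by hand.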
We are facing issues with MQL while trying to integrate Stackdriver with Flagger to perform canary analysis. We have a GKE cluster set up, with Workload Identity configured for the service account.
During the analysis, events reported are as below:
I have a metric template as below that uses the query to fetch a sample metric, which in this case is limit utilization.
Could you please provide suggestions on where we might be going wrong?