GoogleCloudPlatform / prometheus-engine

Google Cloud Managed Service for Prometheus libraries and manifests.
https://g.co/cloud/managedprometheus
Apache License 2.0

PromQL for Google Cloud Monitoring metrics doesn't support label regexes for integers #299

Open willchan opened 2 years ago

willchan commented 2 years ago

In Prometheus, all metric labels are strings, and PromQL supports label selection with regular expressions. In Cloud Monitoring, some metric labels are integers.

For example, we use a log-based metric that counts HTTP requests by status code. The status code is an INT64 label on the metric. When we try to query this metric using PromQL, we get the following error:

Error executing query: invalid parameter "query": substitute queries failed: convert vector selector failed: unable to parse value in matcher on int label to an int: strconv.ParseInt: parsing "5\\d\\d": invalid syntax.

The query is trying to count the 5xx status codes, for use in an alert. It appears that if the label is an integer type, we can no longer use a label regex in PromQL.
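For illustration, here is a minimal Go sketch of why the error above mentions strconv.ParseInt. It assumes (this is not confirmed anywhere in the thread) that the backend parses the matcher value as an integer before comparing it to the INT64 label. The helper name parseIntMatcher is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseIntMatcher is a hypothetical stand-in for whatever conversion the
// backend performs on a matcher value that targets an INT64 label.
func parseIntMatcher(value string) (int64, error) {
	return strconv.ParseInt(value, 10, 64)
}

func main() {
	// "500" is a plain integer literal; `5\d\d` is the regex from the query.
	for _, v := range []string{"500", `5\d\d`} {
		n, err := parseIntMatcher(v)
		if err != nil {
			// strconv quotes the input, so the backslashes appear doubled.
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("parsed:", n)
	}
}
```

Under that assumption, a matcher value that happens to be a plain integer would parse, while any value containing regex syntax would fail with the "invalid syntax" message quoted above.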

lyanco commented 2 years ago

Hey William,

You're right - we don't currently convert the int to a string, which PromQL expects.

Can you paste your PromQL here?

willchan commented 2 years ago

Our metric is called k8s_http_requests, leading to the following PromQL: logging_googleapis_com:user_k8s_http_requests{request_status=~"5\\d\\d"}. Note the escaping of the backslash, since the \d is the regex escape sequence.

maxamins commented 2 years ago

Hey William, we don't support regexps for int labels. Can you try using logging_googleapis_com:user_k8s_http_requests{request_status=~"500"}?

willchan commented 2 years ago

Well, we'd like to alert on 5xx error codes other than just 500, like 502 Bad Gateway or 503 Service Unavailable. I can create a separate alert for each one, but the label regex functionality is convenient for this.

I understand that you probably can't provide the regex functionality in Monarch. I imagine under the hood you could probably support something like integer comparison, so I could match 600>request_status>=500. I wonder if it would be reasonable to extend PromQL in GMP to support this, since it's already the case that PromQL doesn't work for integer labels.
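As a rough sketch of the integer comparison idea (600>request_status>=500), the Go snippet below sums request counts whose status code falls in a half-open range. It only illustrates the semantics of a range match; it is not GMP or Monarch syntax, and the names inRange and countInRange and the sample counts are made up for the example.

```go
package main

import "fmt"

// inRange reports whether an integer label value v satisfies lo <= v < hi.
func inRange(v, lo, hi int64) bool {
	return v >= lo && v < hi
}

// countInRange sums the request counts whose status code lies in [lo, hi).
func countInRange(countsByStatus map[int64]int64, lo, hi int64) int64 {
	var total int64
	for status, n := range countsByStatus {
		if inRange(status, lo, hi) {
			total += n
		}
	}
	return total
}

func main() {
	// Hypothetical request counts keyed by HTTP status code.
	counts := map[int64]int64{200: 950, 404: 12, 500: 7, 502: 3, 503: 5}
	fmt.Println(countInRange(counts, 500, 600)) // prints 15
}
```

A range comparison like this covers every 5xx code with one selector, which is what the regex 5\d\d was being used for.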

lyanco commented 2 years ago

There's an open feature for Monarch to basically do a cast_int_as_string which would fix this problem. Not ready yet, unfortunately.

That's the real long term fix but we're trying to think of ways to unblock you in the meantime...
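To make the cast idea concrete, here is a minimal Go sketch: format the integer label value as a decimal string, then apply the regex the way PromQL applies =~ matchers, which are fully anchored. This is an illustration only, not the Monarch feature itself; matchIntLabel is a hypothetical name, and Go's regexp package is assumed here as the regex engine.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// matchIntLabel converts an integer label value to its decimal string form
// and tests it against pattern, anchored at both ends like a PromQL =~ matcher.
func matchIntLabel(value int64, pattern string) (bool, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return false, err
	}
	return re.MatchString(strconv.FormatInt(value, 10)), nil
}

func main() {
	for _, code := range []int64{200, 404, 500, 503, 5000} {
		ok, err := matchIntLabel(code, `5\d\d`)
		if err != nil {
			fmt.Println("bad pattern:", err)
			return
		}
		fmt.Printf("%d matches: %t\n", code, ok)
	}
}
```

Because the match is anchored, 500 and 503 match while 5000 does not, which is the behavior the original query expected.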

tdoernenburg commented 2 years ago

Any updates on this topic? I have the same problem with the service_server_request_count field. It's a bit misleading that the PromQL documentation points to the Prometheus querying documentation, which supports binary operators and regex.

I'm trying to get a success percentage for Flagger canary rollouts out of the Istio mesh metrics, but without an operator or a regex I'm not quite sure how to create such a metric in Prometheus.

lyanco commented 2 years ago

Sorry, I don't understand. This is only an issue for Cloud Monitoring metrics queries through PromQL. You should be able to use regexes on regular Prometheus metrics (which I assume "service_server_request_count" is) without issue.

robmonct commented 1 year ago

Hi team, is there any update on https://github.com/GoogleCloudPlatform/prometheus-engine/issues/299#issuecomment-1203227497? It would be really useful.

pintohutch commented 1 year ago

Hi @robmonct,

Yes. We're tracking that work here and will hopefully have some updates soon.

realschwa commented 1 year ago

This issue has actually been blocked internally since @pintohutch's last comment, but we hope to be unblocked soon and will post updates here as we get more information.

aniekgul commented 1 year ago

Any updates on this? We're running into the same problem, which makes certain queries quite painful.

pintohutch commented 1 year ago

Apologies @aniekgul - we're still working on rolling this out internally. Thanks for following up here though!

ajaufura commented 8 months ago

Any updates on this?

lyanco commented 8 months ago

Nothing yet, but this work is prioritized as part of a larger PromQL refactoring happening soon. I'd expect it to be unblocked sometime around September.

brunomanzo commented 1 month ago

Hey @lyanco, any updates on this?

lyanco commented 1 month ago

Thanks for nudging me - we're still working on this but unfortunately we're not hitting the September date :-/

ETA early 2025, hopefully. It's a top priority of ours.