cortexproject / cortex

A horizontally scalable, highly available, multi-tenant, long term Prometheus.
https://cortexmetrics.io/
Apache License 2.0

Remote Read Fails when HA dedup is enabled #3714

Open andrewl3wis opened 3 years ago

andrewl3wis commented 3 years ago

Describe the bug

If HA dedup is enabled, a downstream Prometheus can't read the data back, because it sends the remote read request with its HA pair's replica label as a matcher.

This shows up as empty data in the Prometheus that is attempting to read the data.

If you force the HA label off at the Prometheus endpoint that is trying to remote read, it works fine.

In the kube-prometheus helm chart this is controlled with: replicaExternalLabelNameClear: true
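
For reference, a rough sketch of where that flag sits in a kube-prometheus-stack style values file (the exact path can vary between chart and operator versions):

# Sketch only: kube-prometheus-stack style values.
# replicaExternalLabelNameClear tells the operator not to add the replica
# external label at all, so it is never sent as a remote read matcher.
prometheus:
  prometheusSpec:
    replicaExternalLabelNameClear: true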

To Reproduce

Steps to reproduce the behavior:

  1. Set up Cortex in HA mode and Prometheus with remote read/write pointing at it (a config sketch follows this list)
  2. Send some data and then try to query it; if HA dedup is on, it comes back empty
  3. Force the HA label off on the Prometheus side; the HA config on Cortex can be left alone
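
A minimal sketch of the setup in steps 1-2, assuming the default Cortex HA label names (cluster / __replica__); the kvstore backend and URLs below are illustrative:

# Cortex (sketch): accept HA pairs and deduplicate on the replica label.
limits:
  accept_ha_samples: true
  ha_cluster_label: cluster        # default
  ha_replica_label: __replica__    # default
distributor:
  ha_tracker:
    enable_ha_tracker: true
    kvstore:
      store: etcd                  # illustrative; consul also works
      etcd:
        endpoints: ["etcd:2379"]

# Prometheus (sketch): each HA replica sends the same cluster label and a
# unique replica label; remote write and remote read both point at Cortex.
global:
  external_labels:
    cluster: prod
    __replica__: replica-1                                     # unique per replica
remote_write:
  - url: http://cortex-distributor/api/v1/push                 # illustrative
remote_read:
  - url: http://cortex-query-frontend/prometheus/api/v1/read   # illustrative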

Expected behavior

To be able to read remotely when HA dedup is set up.

Environment:

Storage Engine

It fails with both chunks and blocks; we now run blocks.

Additional Context

We can work around this by stripping the replica label off the external labels, but that then triggers a bunch of errors about Prometheus not sending successfully to the remote target, which in turn triggers our alerting.

pracucci commented 3 years ago

Thanks for your report @andrewl3wis. I'm wondering if filtering out the HA replica label name from the matchers received in the remote read request would fix the issue (and whether it's safe to do).

@gouthamve @cstyan Thoughts about it?

gouthamve commented 3 years ago

This is indeed a good point, and I think stripping the replica from the matchers might be safe when both the cluster and replica labels are there.

Having said that, I think this is better tackled in Prometheus itself, by being able to force some matchers to be sent. Similar to what we have now:

# An optional list of equality matchers which have to be
# present in a selector to query the remote read endpoint.
required_matchers:
  [ <labelname>: <labelvalue> ... ]

We could add:

# An optional list of equality matchers which will be
# added when querying the remote read endpoint.
additional_matchers:
  [ <labelname>: <labelvalue> ... ]
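
In a full remote_read entry this might look like the following (purely hypothetical, since additional_matchers does not exist in Prometheus today):

# Hypothetical: additional_matchers is only the proposal above, not a real option.
remote_read:
  - url: http://cortex/prometheus/api/v1/read   # illustrative URL
    additional_matchers:
      replica: ''   # force an empty replica matcher on every remote read query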

Then users can set additional_matchers: replica: '' to fix this. Thoughts @bwplotka?

bwplotka commented 3 years ago

Well I believe this idea is literally: https://github.com/prometheus/prometheus/issues/7992

It was rejected, but we can rethink it; I would be happy to add this.

I don't get one thing: where do you use remote read in Cortex?

pracucci commented 3 years ago

I don't get one thing: where do you use remote read in Cortex?

We don't use remote read directly in Cortex but we do support it, so you can configure Prometheus to read from Cortex.

bwplotka commented 3 years ago

Got it!

csmarchbanks commented 3 years ago

Yep, this is https://github.com/prometheus/prometheus/issues/7992, I would also consider rethinking the behavior in Prometheus itself.

I think you can work around this by adding {prometheus_replica=""} to your query, or maybe {prometheus_replica=~".*"} depending on your setup. Replacing prometheus_replica with whatever you use as the deduplicating label.

andrewl3wis commented 3 years ago

So, as the end user, my perspective is that because Cortex is "eating" the label due to HA deduplication, the fix should maybe live in the Cortex ecosystem. Or at least there should be a heads-up about this in the HA config setup.

It's kind of a non-obvious gotcha that I think has bitten a few other folks, based on messages in the Cortex Slack.

I am open to chasing this down in the Prometheus ecosystem, but then I have to chase it down with the prom-operator to have reads strip the label off, as well as with any other tools that speak to Prometheus natively.

csmarchbanks commented 3 years ago

I agree with you, Cortex should probably handle this since it is removing the HA label. I think I agree with the top comments that filtering out the HA label in remote read would be safe and the correct thing to do.

nikil1424 commented 7 months ago

I have run into this same problem.

I have Cortex running with HA enabled. My Prometheus retention is set to 1h, with remote read/write pointing to Cortex. When I try to query Prometheus for data older than 1h, I get empty results. This was working before I enabled HA.

I managed to find and use the "filterExternalLabels" option under remoteReadSpec [https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#remotereadspec]. However, queries now return two results: one with the external labels and one without any external label. Because of this, none of my Grafana charts display the right data.
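
For reference, a sketch of where that option sits on the Prometheus custom resource (field name and semantics per the linked RemoteReadSpec doc, which says it controls whether external labels, including the HA replica label, are used as selectors for remote read, and requires Prometheus >= v2.34.0):

# Sketch only: prometheus-operator Prometheus custom resource.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s                                                    # illustrative
spec:
  remoteRead:
    - url: http://cortex-query-frontend/prometheus/api/v1/read # illustrative
      filterExternalLabels: false   # false: external labels are not sent as remote read selectors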

Is there an option similar to filterExternalLabels for the Prometheus TSDB, just like for remoteRead? Or is there an option in Prometheus to always query remote storage and not local storage? (Maybe set the TSDB retention to 0?)

Are we going to have a fix in Cortex anytime soon, or are there any workarounds? PS: I do not want to update 100s of charts to exclude the HA label from every query.

yeya24 commented 7 months ago

IIUC there is no one working on this atm. Feel free to open a PR to fix the behavior. I think I am OK with making it a feature flag or making it the default behavior.

For a temporary workaround, maybe you can check https://github.com/prometheus-community/prom-label-proxy. I am unsure whether it supports remote read, but it seems it doesn't. If it supported the remote read API, it would be easy to inject the additional matchers into every query you have.

nikil1424 commented 7 months ago

I have found the workaround: https://cortexmetrics.io/docs/guides/ha-pair-handling/#remote-read.