Closed — Gallardot closed this 3 years ago
Label matching may support negations, regex and regex negations if used with https://pkg.go.dev/github.com/prometheus/prometheus/pkg/labels#Matcher
@jpfe-tid Supporting negations, regex and regex negations in label matching is a great idea, but parsing the configuration can get complicated. If more people need this feature, maybe we can implement it later.
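For context, the four operators the linked `labels.Matcher` package supports (`=`, `!=`, `=~`, `!~`) could be evaluated roughly as in this sketch, which reimplements the semantics with only the standard library (type and function names here are illustrative, not the adapter's or Prometheus's actual API):

```go
package main

import (
	"fmt"
	"regexp"
)

// MatchType mirrors the four operators of Prometheus label matchers.
type MatchType int

const (
	MatchEqual     MatchType = iota // =
	MatchNotEqual                   // !=
	MatchRegexp                     // =~
	MatchNotRegexp                  // !~
)

// Matcher is a minimal, illustrative stand-in for labels.Matcher.
type Matcher struct {
	Type  MatchType
	Name  string
	Value string
	re    *regexp.Regexp
}

func NewMatcher(t MatchType, name, value string) (*Matcher, error) {
	m := &Matcher{Type: t, Name: name, Value: value}
	if t == MatchRegexp || t == MatchNotRegexp {
		// Prometheus anchors matcher regexes to the full value.
		re, err := regexp.Compile("^(?:" + value + ")$")
		if err != nil {
			return nil, err
		}
		m.re = re
	}
	return m, nil
}

// Matches reports whether a label value satisfies the matcher.
func (m *Matcher) Matches(v string) bool {
	switch m.Type {
	case MatchEqual:
		return v == m.Value
	case MatchNotEqual:
		return v != m.Value
	case MatchRegexp:
		return m.re.MatchString(v)
	default: // MatchNotRegexp
		return !m.re.MatchString(v)
	}
}

func main() {
	m, _ := NewMatcher(MatchRegexp, "prometheus_replica", "prometheus-business-.*")
	fmt.Println(m.Matches("prometheus-business-0")) // true
	fmt.Println(m.Matches("thanos-ruler-0"))        // false
}
```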
I don't understand how this PR helps with deduplication. Can you provide a minimal architecture as an example?
@jpfe-tid
We use Prometheus Operator to create Prometheus. For high availability, the Prometheus replica count is 2.
With remote write through prometheus-kafka-adapter, we receive data from both Prometheus instances at the same time.
Each metric carries a `prometheus_replica` label. Its value may be `prometheus-business-0` or `prometheus-business-1`, indicating which Prometheus the data comes from.
So if we configure a match rule such as `up{prometheus_replica="prometheus-business-0"}`, we will send only the data of `prometheus-business-0` to Kafka.
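Concretely, the static deduplication described above amounts to keeping only the series whose `prometheus_replica` label has the chosen value. A minimal sketch, assuming a label set represented as a map (the function and types are hypothetical, not the adapter's actual code):

```go
package main

import "fmt"

// Series is a minimal stand-in for a remote-write sample's label set.
type Series map[string]string

// keepReplica reports whether a series comes from the chosen replica,
// so duplicates pushed by the other HA replica can be dropped.
func keepReplica(s Series, replica string) bool {
	return s["prometheus_replica"] == replica
}

func main() {
	series := []Series{
		{"__name__": "up", "prometheus_replica": "prometheus-business-0"},
		{"__name__": "up", "prometheus_replica": "prometheus-business-1"},
	}
	for _, s := range series {
		if keepReplica(s, "prometheus-business-0") {
			fmt.Println("forward to Kafka:", s["prometheus_replica"])
		}
	}
}
```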
Understood. However, this is a push-based static deduplication procedure, which means that if `prometheus-business-0` is down, no data will make it to Kafka.
Therefore a pull solution like https://github.com/Telefonica/prometheus-kafka-adapter/issues/54#issuecomment-670154428 (Thanos not needed) is more convenient, because if `prometheus-business-0` is down, it can fetch data from `prometheus-business-1` or any other replica.
Thanks for your explanation.
> Understood, however this is a push based static deduplication procedure, which means that if `prometheus-business-0` is down, no data will make it to kafka. Therefore a pull solution like #54 (comment) (thanos not needed) is more convenient, because if `prometheus-business-0` is down, it should fetch data from `prometheus-business-1` or any other replica.
Thanks again for your explanation. The push-based solution does have this defect, but it is acceptable in some scenarios, such as when we are doing time-series data analysis. If necessary, we may adopt the pull-based solution you suggested.
@jpfe-tid what do you think? Is there anything important left? "Done is better than perfect"
ping @jpfe-tid
Prometheus data is of great value for both real-time and offline time-series analysis. At the same time, Prometheus holds a lot of data that is not needed, which puts pressure on Kafka and wastes computation and storage. So we want to filter the data before it goes into Kafka. Based on the ideas provided by @palmerabollo in https://github.com/Telefonica/prometheus-kafka-adapter/issues/54#issuecomment-670154428, we made some slight improvements and added a simple matching-rule feature.
Multiple matching rules are separated with `;`. For example, a rule like `foo;bar{x="1"};up{x="1",y="2"}` means that we send the following three kinds of data to Kafka:
- `foo`
- `bar` where the value of label `x` equals `1`
- `up` where the value of label `x` equals `1` and the value of label `y` equals `2`
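The `;`-separated rule list above could be evaluated along these lines. This is a sketch of the semantics only: the parsing is simplified to the equality form `name{label="value"}`, and none of the names are the adapter's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is one matching rule: a metric name plus required label equalities.
type Rule struct {
	Metric string
	Labels map[string]string
}

// parseRules splits a ';'-separated rule string like
// `foo;bar{x="1"};up{x="1",y="2"}` into Rules.
func parseRules(s string) []Rule {
	var rules []Rule
	for _, part := range strings.Split(s, ";") {
		r := Rule{Labels: map[string]string{}}
		if i := strings.IndexByte(part, '{'); i >= 0 {
			r.Metric = part[:i]
			body := strings.TrimSuffix(part[i+1:], "}")
			for _, kv := range strings.Split(body, ",") {
				k, v, _ := strings.Cut(kv, "=")
				r.Labels[k] = strings.Trim(v, `"`)
			}
		} else {
			r.Metric = part
		}
		rules = append(rules, r)
	}
	return rules
}

// matches reports whether a series (metric name plus labels)
// satisfies at least one rule.
func matches(rules []Rule, metric string, labels map[string]string) bool {
	for _, r := range rules {
		if r.Metric != metric {
			continue
		}
		ok := true
		for k, v := range r.Labels {
			if labels[k] != v {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	rules := parseRules(`foo;bar{x="1"};up{x="1",y="2"}`)
	fmt.Println(matches(rules, "foo", nil))                                  // true
	fmt.Println(matches(rules, "bar", map[string]string{"x": "1"}))          // true
	fmt.Println(matches(rules, "up", map[string]string{"x": "1", "y": "2"})) // true
	fmt.Println(matches(rules, "up", map[string]string{"x": "1"}))           // false
}
```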
In high-availability mode, we have 2 or more Prometheus servers in a cluster. Adding matching rules also helps solve the data duplication problem. #54