criteo / cassandra_exporter

Apache Cassandra® metrics exporter for Prometheus
Apache License 2.0

Query issues for "large" clusters #21

Closed poblahblahblah closed 6 years ago

poblahblahblah commented 6 years ago

Hello,

We have a ~540-node Cassandra cluster in which each node exports ~1500 metrics. That adds up to over 800k time series in the cassandra_stats metric namespace. This causes a lot of problems when querying Prometheus, since the index gets hit so hard. Recording rules are definitely an option, but we don't always know in advance which aggregations will need a recording rule.
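As a quick sanity check on the figures quoted above, the series count can be estimated as nodes times metrics per node. This is only a back-of-the-envelope sketch: both inputs are the approximate numbers from the comment, and the function name `estimated_series` is made up for illustration.

```python
# Rough estimate of the time series count from the approximate
# figures quoted in the comment (~540 nodes, ~1500 metrics each).

def estimated_series(nodes: int, metrics_per_node: int) -> int:
    """Return the number of series if every node exports every metric."""
    return nodes * metrics_per_node


total = estimated_series(540, 1500)
print(total)  # 810000, consistent with "over 800k time series"
```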

Is there a workaround for this in the current code base? If not, would you be open to exploring a change with us?

erebe commented 6 years ago

Hello,

Sadly, there is no magic bullet, at least none that I am aware of, for scaling out Cassandra metrics. Before running Prometheus, we also had issues fitting everything into the Graphite TSDB (~130 Cassandra nodes).

AFAIK, Prometheus indexes series by labels and not just by namespace, so I can't think of many improvements I could make in the code. If you have an idea or think otherwise, feel free to tell me; I am listening.
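To illustrate the point about label-based indexing, here is a toy sketch of an inverted index keyed by label pairs. It is not Prometheus code: the helpers `build_index` and `select` and the sample label values are hypothetical. The only grounded fact it relies on is that the metric name is stored as the `__name__` label, so it is looked up like any other label.

```python
from collections import defaultdict


def build_index(series):
    """Map each (label, value) pair to the set of series IDs carrying it."""
    index = defaultdict(set)
    for series_id, labels in enumerate(series):
        for pair in labels.items():
            index[pair].add(series_id)
    return index


def select(index, **matchers):
    """Return IDs of series matching every label=value matcher."""
    postings = [index.get(pair, set()) for pair in matchers.items()]
    return set.intersection(*postings) if postings else set()


# Hypothetical sample series; label values are invented for illustration.
series = [
    {"__name__": "cassandra_stats", "instance": "node1", "name": "read_latency"},
    {"__name__": "cassandra_stats", "instance": "node1", "name": "write_latency"},
    {"__name__": "cassandra_stats", "instance": "node2", "name": "read_latency"},
]
index = build_index(series)

# Matching on the metric name alone touches every series in the namespace.
all_ids = select(index, __name__="cassandra_stats")
# Adding a second label matcher narrows the result by intersection.
read_ids = select(index, __name__="cassandra_stats", name="read_latency")
```

In this toy model the namespace lookup still returns every series under `cassandra_stats`, and the extra label only narrows the result afterwards.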

Here is what I can propose:

If you think of any other solutions, feel free to share them.

erebe commented 6 years ago

Feel free to re-open if needed.