[Metrics] JMX Exporter Config

confluentinc / cp-helm-charts

The Confluent Platform Helm charts enable you to deploy Confluent Platform services on Kubernetes for development, test, and proof of concept environments.

https://cnfl.io/getting-started-kafka-kubernetes

Apache License 2.0


[Metrics] JMX Exporter Config #437

Open rodolfo-picoreti opened 4 years ago

rodolfo-picoreti commented 4 years ago

Hi, currently we create a separate Prometheus metric for each JMX bean variation, for instance:

 - pattern : kafka.server<type=BrokerTopicMetrics, name=(.+)><>OneMinuteRate
   name: "cp_kafka_server_brokertopicmetrics_$1"

This approach produces many distinct metrics that cannot easily be aggregated to answer questions such as: which 5 topics have the most processing delay or bandwidth consumption?

IMO it would be better if we could aggregate them using labels instead.

So instead of:

cp_kafka_server_brokertopicmetrics_bytesinpersec_topic_<TOPIC_NAME>

Something like this:

cp_kafka_server_brokertopicmetrics_bytesinpersec{topic=<TOPIC_NAME>}
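To illustrate why the labeled form is easier to aggregate, here is a minimal Python sketch that ranks topics by summing one metric across brokers. The sample values, the `broker` label, and the `top_topics` helper are all hypothetical and only stand in for what a Prometheus query such as `topk` over a `sum by (topic)` would do.

```python
from collections import defaultdict

METRIC = "cp_kafka_server_brokertopicmetrics_bytesinpersec"

# Hypothetical samples: (metric name, labels, value).
# The numbers and the "broker" label are made up for illustration.
samples = [
    (METRIC, {"broker": "0", "topic": "orders"}, 1200.0),
    (METRIC, {"broker": "1", "topic": "orders"}, 800.0),
    (METRIC, {"broker": "0", "topic": "payments"}, 500.0),
    (METRIC, {"broker": "1", "topic": "payments"}, 400.0),
    (METRIC, {"broker": "0", "topic": "audit"}, 50.0),
]


def top_topics(samples, metric, n=5):
    """Sum a metric per topic across all brokers and return the n largest."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        # One metric name covers every topic, so a single filter is enough.
        if name == metric and "topic" in labels:
            totals[labels["topic"]] += value
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    for topic, rate in top_topics(samples, METRIC, n=2):
        print(topic, rate)
```

With the topic embedded in the metric name instead, the same ranking would first require parsing the topic back out of every metric name.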

We could achieve this by changing the configuration to something like:

 - pattern : kafka.server<type=(.+), name=(.+), topic=(.+)><>OneMinuteRate
   name: "cp_kafka_server_$1_$2" 
   attrNameSnakeCase: true
   labels:
     topic: $3
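As a rough check of what this rule would emit, the following Python sketch applies the same regex and `$N` substitutions to a bean string. It is a simplified model, not the exporter's implementation: it assumes the pattern is searched unanchored, that output names are lowercased (as with `lowercaseOutputName`), and it ignores `attrNameSnakeCase` and all other exporter options. The bean strings and the `apply_rule` helper are hypothetical.

```python
import re

# Rule fields copied from the proposed configuration above.
PATTERN = r"kafka.server<type=(.+), name=(.+), topic=(.+)><>OneMinuteRate"
NAME_TEMPLATE = "cp_kafka_server_$1_$2"
LABEL_TEMPLATES = {"topic": "$3"}


def expand(template, match):
    """Replace each $N in the template with capture group N of the match."""
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), template)


def apply_rule(bean, lowercase_name=True):
    """Return (metric_name, labels) for a bean string, or None if no match.

    Assumption: unanchored search and lowercased metric names.
    """
    match = re.search(PATTERN, bean)
    if match is None:
        return None
    name = expand(NAME_TEMPLATE, match)
    if lowercase_name:
        name = name.lower()
    labels = {key: expand(value, match) for key, value in LABEL_TEMPLATES.items()}
    return name, labels


if __name__ == "__main__":
    # Hypothetical bean strings in the same format as the pattern.
    per_topic = "kafka.server<type=BrokerTopicMetrics, name=BytesInPerSec, topic=orders><>OneMinuteRate"
    broker_wide = "kafka.server<type=BrokerTopicMetrics, name=BytesInPerSec><>OneMinuteRate"
    print(apply_rule(per_topic))
    print(apply_rule(broker_wide))
```

In this model the broker-wide bean (no `topic=` key) does not match the new pattern, so the existing rule would presumably still be needed alongside it.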

@srolija @qshao-pivotal Can you comment?

srolija commented 4 years ago

Hi, my only connection to the JMX part is a small contribution that fixed the slow JMX metrics, so I am by no means an authority. As far as I can see, the topic is currently not exported as part of the metrics; instead, the name refers to the nested metric names (e.g. bytesinpersec, bytesoutpersec). For that use case, I personally think this is a good way to export the metrics, since you would rarely aggregate across different types (e.g. bytesinpersec and messagesinpersec).

Is your proposal then to export topic-level metrics, and to do so using labels? If so, I agree that labels are the way to do it. My only worry is that on larger clusters those could add up to thousands of time series, because each broker would export multiple stats for every topic it stores, and I am not sure that would be expected from the chart out of the box.
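The scale of that concern can be estimated with simple arithmetic. The sketch below computes an upper bound, assuming every broker hosts partitions of every topic; the cluster sizes and the number of per-topic metrics are hypothetical, not measurements from any real deployment.

```python
def estimated_series(brokers, topics, metrics_per_topic):
    """Upper bound on exported per-topic series for one cluster.

    Assumes every broker hosts at least one partition of every topic,
    which overestimates clusters where topics live on a subset of brokers.
    """
    return brokers * topics * metrics_per_topic


if __name__ == "__main__":
    # Hypothetical cluster sizes; 7 per-topic metrics is an assumed figure.
    for brokers, topics in [(3, 20), (6, 500), (12, 2000)]:
        total = estimated_series(brokers, topics, metrics_per_topic=7)
        print(f"{brokers} brokers x {topics} topics -> up to {total} series")
```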

This does not prevent you from having a different JMX configuration for your own purposes. We do the same thing: we keep a private fork of the chart with some patches that are neither applicable nor useful to most people (e.g. some naming overrides).

rodolfo-picoreti commented 4 years ago

Hey @srolija, different types would still produce different metrics, as in the example in my first post. The only difference is that each topic would produce a different label value rather than a different metric. We could then easily do things like create generic dashboards that filter on a specific topic, or see the topic with the highest throughput.

Regarding the fork, I think basic configuration like this should not require a fork; otherwise it would be a horrible experience for users of the chart. If the maintainers do not agree with my suggestion, maybe we can add a way to override the current configuration instead.

srolija commented 4 years ago

I completely get the use case. So this is basically a request to add topic as a label to the existing metrics, rather than to rename anything?

I just wanted to note the possible downside, aside from potentially needing to update the Grafana dashboard that is part of the repo. As I said, I have no authority over the chart, so I just wanted to give my perspective as well. Personally, I would like to have a way to get topic data, and the number of generated time series isn't an issue for us given that we don't have that many topics. :)