banzaicloud / spark-metrics

Spark metrics related custom classes and sinks (e.g. Prometheus)
Apache License 2.0

Configure sink to stop sending job as label/group-key #70

Closed by prcastro 3 years ago

prcastro commented 3 years ago

Is your feature request related to a problem? Please describe.

Whenever I run spark-metrics on YARN, all metrics are tagged with job=application_yarn_id. This means that every time I restart the application, new groups are created on Pushgateway, which increases memory usage considerably.

Describe the solution you'd like to see

A way to remove job from labels and group-keys. For example, if we override job in both labels and group-key, I would expect spark-metrics not to send application_yarn_id.

Describe alternatives you've considered

I tried to configure my application by overriding the job group-key, with no success:

*.sink.prometheus.group-key=job=streams

prcastro commented 3 years ago

This seems related to https://github.com/banzaicloud/spark-metrics/issues/39, but that issue was closed.

stoader commented 3 years ago

Can you try either setting spark.app.id to a fixed value like streams instead of application_yarn_id, or setting spark.metrics.namespace to streams? Currently, this is how you can control the value of the job tag.
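As a minimal sketch of the second option, assuming the setting is placed in spark-defaults.conf (the file location is an assumption here; the same key can also be passed to spark-submit with --conf), the configuration could look like this. The value streams is the one used in this thread.

```properties
# Assumed location: conf/spark-defaults.conf
# Use a fixed metrics namespace instead of the per-run YARN application ID,
# so the job tag stays the same across application restarts.
spark.metrics.namespace    streams
```

With a fixed namespace, a restarted application should report under the same job value rather than a new one per YARN application ID.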

prcastro commented 3 years ago

Setting spark.metrics.namespace solved the problem! Thanks