Closed: Drewster727 closed this issue 4 years ago
spark-metrics pushes metrics to Pushgateway using the pushgateway client library: https://github.com/banzaicloud/spark-metrics/blob/2.3-3.0.1/src/main/scala/com/banzaicloud/spark/metrics/sink/PrometheusSink.scala#L98 --> https://github.com/prometheus/client_java/blob/parent-0.8.1/simpleclient_pushgateway/src/main/java/io/prometheus/client/exporter/PushGateway.java#L181
If there is a connection leak, it must be in the Pushgateway client library. However, looking at the source code, the client always disconnects before it returns: https://github.com/prometheus/client_java/blob/parent-0.8.1/simpleclient_pushgateway/src/main/java/io/prometheus/client/exporter/PushGateway.java#L328
The increased memory usage of your Pushgateway instance might be caused by the pitfalls described in https://www.robustperception.io/common-pitfalls-when-using-the-pushgateway, which can be avoided by using custom group keys that do not include the instance field: https://github.com/banzaicloud/spark-metrics/pull/46
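To illustrate why the group key matters, here is a minimal sketch (not taken from spark-metrics or the client library) of how Pushgateway push paths are formed as `/metrics/job/<job>/<label>/<value>...`. Each distinct path is a separate metric group that the gateway keeps until it is deleted. The job name, the `role` label, and the IP addresses below are hypothetical, and label values are assumed not to contain slashes.

```python
from urllib.parse import quote


def push_path(job, grouping_key):
    """Build a Pushgateway push path: /metrics/job/<job>/<label>/<value>...

    Assumes label values contain no '/' (those need special encoding).
    """
    parts = ["metrics", "job", quote(job, safe="")]
    # Sort labels only to make the output deterministic.
    for label, value in sorted(grouping_key.items()):
        parts += [quote(label, safe=""), quote(value, safe="")]
    return "/" + "/".join(parts)


def distinct_groups(job, grouping_keys):
    """Return the set of distinct push paths, i.e. metric groups on the gateway."""
    return {push_path(job, key) for key in grouping_keys}


# Hypothetical executors: five workers, each with its own instance address.
executors = [{"role": "executor", "instance": f"10.0.0.{i}"} for i in range(1, 6)]

# Group keys that include the instance field: one group per executor.
with_instance = distinct_groups("spark-app", executors)

# Group keys without the instance field: all executors share one group.
without_instance = distinct_groups(
    "spark-app",
    [{k: v for k, v in e.items() if k != "instance"} for e in executors],
)

print(len(with_instance), len(without_instance))  # prints "5 1"
```

With short-lived or frequently rescheduled executors, per-instance groups keep accumulating on the gateway. Note that executors sharing one group overwrite each other's metrics of the same name, so the custom key still needs whatever labels distinguish the series you want to keep.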
cc @sancyx @baluchicken
Not sure what was causing this, but disabling consistency checks per https://github.com/prometheus/pushgateway/issues/340 resolved my issue...
I'm experiencing an odd issue where my Spark workers randomly begin reporting that they cannot connect to my Pushgateway.
I have verified that the Pushgateway is up and running, and I can connect to it without issue. However, I did notice that its memory usage keeps growing. The only things sending metrics to it are my Spark workers, via this package/library.
I thought perhaps it could be a pushgateway issue, but then I found this issue on the pushgateway repo: https://github.com/prometheus/pushgateway/issues/340
That seems to indicate that something pushing metrics into the gateway (this lib) is not disposing of its connections properly?
Any assistance would be greatly appreciated. The error is not blocking my workers, but it is very annoying: it spams the logs and makes the Pushgateway unstable.
jars+versions
Thanks!