apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

Support exporting Spark metrics to various backends #88

Open ssuchter opened 7 years ago

ssuchter commented 7 years ago

For troubleshooting, health checking, and performance analysis of Spark applications, delivering Spark metrics to a dedicated metrics system is a common pattern. Dedicated metrics systems (e.g. time-series databases) often allow query modes that the built-in Spark UI does not offer (perhaps more flexible, longitudinal, or with a different presentation, etc.).

It's desirable not to hardcode one true metrics backend into the project, but rather to allow metrics to be delivered to a variety of backends. The core Spark project already allows this (for example, with a configurable metrics sink object).

The on-Kubernetes implementation of a metrics backend has unique challenges around backend discovery, network connectivity, etc., which warrant discussion and a Kubernetes-specific implementation.

ash211 commented 7 years ago

Internally we've built an InfluxDB sink for Spark's Metrics system which requires this configuration:

*.sink.influx.protocol=https
*.sink.influx.host=localhost
*.sink.influx.port=8086
*.sink.influx.database=my_metrics
*.sink.influx.auth=metric_client:PASSWORD
*.sink.influx.tags=product:my_product,parent:my_service

Roughly, the way it works is that the various Spark JVMs push data out to that sink during job runtime, and the InfluxDB sink collects the data.

Looking at that, I can imagine the protocol/host/port potentially being pulled from a k8s service and the auth being pulled from a k8s secret. The tags could possibly be pulled from k8s labels as well. This is all currently read out of the conf/metrics.properties file, which is where Spark metrics are normally configured.
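For illustration, the Kubernetes side of that could be wired up with stock kubectl commands. This is only a sketch; the names are placeholders and it assumes an in-cluster InfluxDB deployment already exists:

# Keep the sink credentials in a Secret instead of metrics.properties
$ kubectl create secret generic influx-metrics-auth \
   --from-literal=auth=metric_client:PASSWORD

# Give the backend a stable in-cluster address
$ kubectl expose deployment influxdb --port=8086

# Pods created after the Service exists see INFLUXDB_SERVICE_HOST and
# INFLUXDB_SERVICE_PORT in their environment, and the Secret can be mounted
# as a file or exposed as an env var, so the host/port/auth wouldn't need to
# be hardcoded in metrics.properties.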

Is this architecture roughly in line with the metrics backend you're most interested in? Are you imagining other places besides the three above where the metrics backend could integrate more tightly with Kubernetes capabilities?

ssuchter commented 7 years ago

Sounds like a reasonable configuration. One of my concerns, though, is how the code for the various metrics sink(s) gets into the Spark containers. Options:

  1. The application owners are required to link it into, or submit it with, their application-specific jars. It seems like it would be easy for “rogue” apps to get onto the cluster, which is generally unexpected.

  2. We put the code for various metrics sinks into the “core” product. Seems like a violation of the principle of not favoring one metrics backend.

  3. The code is baked into a per-site Docker image.

Personally I lean towards #3, but we’d want to make it easy for site owners to do that.
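
For illustration, option 3 could be a small per-site image build that layers the sink jar(s) onto the stock Spark images. This is only a sketch; the base image, registry, and jar path below are placeholders:

$ cat > Dockerfile <<'EOF'
# Placeholder: use the site's Spark driver image as the base
FROM kubespark/spark-driver:latest
# Add the site's metrics sink jar(s) to Spark's classpath
COPY spark-influx-sink.jar /opt/spark/jars/
EOF
$ docker build -t registry.example.com/spark-driver-metrics:latest .
$ docker push registry.example.com/spark-driver-metrics:latest

The executor image would need the same layer, and the driver/executor image settings at submission time would point at these per-site images instead of the stock ones.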

Thoughts?

kimoonkim commented 7 years ago

cc @ash211 @foxish @ssuchter

We now successfully export Spark metrics to a prometheus pod running inside our kubernetes cluster.

We have put a few metrics system config keys in the client's spark-defaults.conf (under /usr/local/spark-on-k8s/conf in the example below) that specify GraphiteSink and the host and port:

spark.metrics.conf.*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
spark.metrics.conf.*.sink.graphite.host=graphite-exporter
spark.metrics.conf.*.sink.graphite.port=9109
spark.metrics.conf.*.sink.graphite.period=5
spark.metrics.conf.*.sink.graphite.unit=seconds
spark.metrics.conf.driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
spark.metrics.conf.executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

Note that one would normally put these in a metrics.properties file and use --files to ship it around, but our spark-on-k8s code doesn't support this well, as reported in #162.
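
As an aside, the same keys should also work when passed directly to spark-submit as --conf flags, which avoids both metrics.properties and edits to spark-defaults.conf. A sketch, with the application jar left as a placeholder:

$ spark-submit \
   --conf "spark.metrics.conf.*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink" \
   --conf "spark.metrics.conf.*.sink.graphite.host=graphite-exporter" \
   --conf "spark.metrics.conf.*.sink.graphite.port=9109" \
   --conf "spark.metrics.conf.*.sink.graphite.period=5" \
   --conf "spark.metrics.conf.*.sink.graphite.unit=seconds" \
   <application jar and arguments>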

These config keys cause the driver and executor JVMs in the k8s pods to send metrics to the specified host and port. What's running at that address is another k8s pod running graphite-exporter from the prometheus project. It takes metrics from the driver and executors and holds them until they get picked up by the prometheus pod.

One can launch the graphite-exporter pod using something like the following kubectl commands. This is using the stock docker image prom/graphite-exporter from the prometheus project:

# Launch a pod
$ kubectl run graphite-exporter --image prom/graphite-exporter --labels=app=graphite-exporter --port=9109 --replicas=1

# Expose the service port
$ kubectl expose deployment graphite-exporter --port=9109 --target-port=9109

# Annotate it with hints telling prometheus to pick up metrics
$ kubectl annotate service graphite-exporter  prometheus.io/scrape=true  \
   prometheus.io/scheme=http  \
   prometheus.io/path=/metrics prometheus.io/port=9108

We also supplied a config file to graphite-exporter so that it maps out the application name, component, and pod name as labels. (This is done in a hacky way for now; we're thinking about using a k8s ConfigMap to automate it, as sketched at the end of this comment.)

A snippet of the config file looks like:

$ cat graphite_exporter_mapping
...

*.*.executor.filesystem.*.*
name="spark_filesystem_${3}_${4}"
application="$1"
component="exec-${2}"
pod_name="${1}-exec-${2}"

*.*.executor.threadpool.*
name="spark_threadpool_${3}"
application="$1"
component="exec-${2}"
pod_name="${1}-exec-${2}"

*.driver.jvm.*.*
name="spark_jvm_${2}_${3}"
application="$1"
component="driver"
pod_name="$1"

*.*.jvm.*.*
name="spark_jvm_${3}_${4}"
application="$1"
component="exec-${2}"
pod_name="${1}-exec-${2}"

With this config, you can see metrics like below from a spark-pi application:


$ kubectl exec -it graphite-exporter-2-3575368905-9m2zo /bin/sh
# wget -qO- http://localhost:9108/metrics
…
# HELP spark_threadpool_activeTasks Graphite metric spark-pi-1488567752296.2.executor.threadpool.activeTasks
# TYPE spark_threadpool_activeTasks gauge
spark_threadpool_activeTasks{application="spark-pi-1488567752296",component="exec-2",pod_name="spark-pi-1488567752296-exec-2"} 0
# HELP spark_threadpool_completeTasks Graphite metric spark-pi-1488567752296.2.executor.threadpool.completeTasks
# TYPE spark_threadpool_completeTasks gauge
spark_threadpool_completeTasks{application="spark-pi-1488567752296",component="exec-2",pod_name="spark-pi-1488567752296-exec-2"} 0
# HELP spark_threadpool_currentPool_size Graphite metric spark-pi-1488567752296.2.executor.threadpool.currentPool_size
# TYPE spark_threadpool_currentPool_size gauge
spark_threadpool_currentPool_size{application="spark-pi-1488567752296",component="exec-2",pod_name="spark-pi-1488567752296-exec-2"} 0
# HELP spark_threadpool_maxPool_size Graphite metric spark-pi-1488567752296.2.executor.threadpool.maxPool_size
# TYPE spark_threadpool_maxPool_size gauge
spark_threadpool_maxPool_size{application="spark-pi-1488567752296",component="exec-2",pod_name="spark-pi-1488567752296-exec-2"} 2.147483647e+09

The prometheus UI will look like the screenshot below. (Sorry, the tag names don't exactly match the example above because we changed them a bit in between.)

[screenshot: prometheus-ui]
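
For the ConfigMap idea mentioned above, a sketch would be to load the mapping file into the cluster and mount it into the exporter pod (names are placeholders):

$ kubectl create configmap graphite-exporter-mapping \
   --from-file=graphite_exporter_mapping

# The graphite-exporter deployment would then mount this ConfigMap as a
# volume and point the exporter's mapping-config option at the mounted file.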

ash211 commented 7 years ago

I mentioned an influxdb sink earlier in this thread -- we've now open sourced it and it's available at https://github.com/palantir/spark-influx-sink

tnachen commented 7 years ago

@ash211 Nice! I'll be trying it out