When StatsdMetricsReporter is used in a distributed Apache Spark application, the Kafka configuration properties map is instantiated on one node (the driver) and then propagated to, and used on, multiple other nodes (the executors). See the example app.
Moreover, when running on DC/OS, the statsd connection settings (hostname and port) vary between nodes and are provided to each instance of the application through the process environment.
This means that if the host and port are taken from the environment on the driver and then propagated to and used on the executors, metrics will not be exported (at least not correctly). A symptom of this is Kafka metrics being produced by the driver but not by the executors.
One way to solve this problem is to change StatsdMetricsReporter to fall back to $STATSD_UDP_HOST and $STATSD_UDP_PORT from the environment when the external.kafka.statsd.* properties are not present.
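A minimal sketch of that fallback, assuming property keys of the form `external.kafka.statsd.host` / `external.kafka.statsd.port` (the method names `resolveStatsdHost` and `resolveStatsdPort` are illustrative, not the reporter's actual API):

```java
import java.util.Map;
import java.util.Properties;

public class StatsdConfigFallback {

    // Prefer the explicit property; fall back to the DC/OS-provided
    // environment variable; finally use a local default.
    static String resolveStatsdHost(Properties props, Map<String, String> env) {
        String host = props.getProperty("external.kafka.statsd.host");
        if (host == null || host.isEmpty()) {
            host = env.get("STATSD_UDP_HOST"); // set per-node by DC/OS
        }
        return host != null ? host : "localhost";
    }

    static int resolveStatsdPort(Properties props, Map<String, String> env) {
        String port = props.getProperty("external.kafka.statsd.port");
        if (port == null || port.isEmpty()) {
            port = env.get("STATSD_UDP_PORT"); // set per-node by DC/OS
        }
        return port != null ? Integer.parseInt(port) : 8125; // statsd default port
    }
}
```

Because the lookup would happen when the reporter is instantiated on each node, every executor reads its own local statsd endpoint from its environment instead of reusing the value captured on the driver.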