LucaCanali / sparkMeasure

This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Apache License 2.0

Add KafkaSink to support shipping metrics into a kafka stream #36

Closed hoaihuongbk closed 2 years ago

LucaCanali commented 2 years ago

Hi, thanks for submitting this PR. It seems potentially useful. Could you please provide additional context?

hoaihuongbk commented 2 years ago

Oops, I missed the description for this PR.

A little about my project: we have a large number of Spark jobs submitted to our cluster every day, and as the number of requests grows, so does the pressure to optimize and save costs.

To fit our current infrastructure (Spark deployed in one cluster, the monitoring dashboard in another, plus our internal Kafka service), I cloned this repo and added a Kafka sink. Metrics are sent to a Kafka topic and then ingested into our internal InfluxDB (managed by the dbops team). Finally, the metrics are displayed on Grafana, so our team can monitor them in near real time. Super cool!


Actually, this sink is not very different from the InfluxDB sink you already implemented, except that it sends metric messages to a queue instead of writing directly to the database. We have been using it for a while, and it occurred to me that it might be useful to many other people as well. That's the main reason I submitted this PR.
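The idea described above (serialize each metrics record as a message and hand it to a producer, rather than writing it to a database) can be sketched roughly as follows. This is an illustrative sketch in Python, not the PR's Scala implementation; the field names (`applicationId`, `timestamp`, `metrics`) and the `ship` helper are assumptions, and the actual sink defines its own message schema.

```python
import json
import time


def metrics_to_kafka_payload(app_id, metrics):
    """Serialize one metrics snapshot as a JSON message body.

    `metrics` is a plain dict of metric name -> value, as a sink
    might collect per stage or per task. Field names here are
    illustrative, not the sink's actual schema.
    """
    record = {
        "applicationId": app_id,
        "timestamp": int(time.time() * 1000),  # epoch millis
        "metrics": metrics,
    }
    return json.dumps(record).encode("utf-8")


def ship(payload, send):
    """Hand the payload to a producer callable.

    In a real sink, `send` would be something like
    KafkaProducer.send(topic, payload); injecting it keeps this
    sketch testable without a broker.
    """
    send(payload)
```

The key design point, as the comment above notes, is that only the output step changes relative to the InfluxDB sink: the metrics collection and serialization stay the same, and the payload goes to a queue instead of a database client.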

Regarding your question about supporting Kafka with an authentication protocol enabled: in fact, we deploy our infrastructure on AWS and rely on a whitelist at the network layer (security groups), so we don't need a username and password.
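For readers who do run Kafka with authentication enabled, supporting it would mostly be a matter of passing the standard Kafka client security properties through to the producer. These are stock Kafka client settings (shown here for SASL/PLAIN over TLS); how a sink would expose them as configuration is up to the implementation.

```properties
# Standard Kafka client properties for SASL/PLAIN over TLS.
# A sink would need a way to pass these through to its producer.
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="<user>" \
  password="<password>";
```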

LucaCanali commented 2 years ago

Thanks for the additional explanations and for the work.