Shopify / camus

Kafka->HDFS pipeline from LinkedIn. It is a MapReduce job that does distributed data loads out of Kafka.

Add statsd metrics with topic and partition tags #93

Closed: olessia closed this 6 years ago

olessia commented 6 years ago

I think the counter-based approach makes sense when you have global stats: every job just adds to the global counter. For our purposes, though, we need information that is topic-specific (and partition-specific), which is why we ran into an issue with counters: there shouldn't be that many of them. I've changed the approach of the StatsdReporter slightly so that it can operate from the job configuration, which is available from the context where we want to report our stats, rather than from the job object, which is not available there.

I have also run this locally and verified that the code path that posts to statsd is reached, although locally there's nowhere to actually publish the metrics. There may still be a gotcha hiding in there somewhere, which is why it would be great to test this in the staging environment.
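To illustrate the idea of reporting from configuration rather than from the job object, here is a minimal, hypothetical sketch; it is not the actual Camus StatsdReporter. It assumes a plain `Map<String, String>` standing in for Hadoop's `Configuration`, made-up keys (`statsd.enabled`, `statsd.host`, `statsd.port`), a made-up metric name, and the DogStatsD tag extension (`metric:value|c|#tag:value,...`), since plain statsd has no tag support. Each topic/partition pair becomes a tag on one metric instead of a separate Hadoop counter, which sidesteps the per-job counter limit (`mapreduce.job.counters.max`).

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a reporter built only from configuration values
 * (a Map stands in for Hadoop's Configuration) that emits per-topic,
 * per-partition counts as DogStatsD-tagged UDP packets instead of
 * Hadoop counters.
 */
public class TaggedStatsdReporter {
    // Hypothetical configuration keys; not taken from the Camus codebase.
    static final String ENABLED_KEY = "statsd.enabled";
    static final String HOST_KEY = "statsd.host";
    static final String PORT_KEY = "statsd.port";

    private final boolean enabled;
    private final String host;
    private final int port;

    public TaggedStatsdReporter(Map<String, String> conf) {
        this.enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED_KEY, "false"));
        this.host = conf.getOrDefault(HOST_KEY, "localhost");
        this.port = Integer.parseInt(conf.getOrDefault(PORT_KEY, "8125"));
    }

    public boolean isEnabled() {
        return enabled;
    }

    /** Formats one counter increment using the DogStatsD tag syntax. */
    static String format(String metric, long value, String topic, int partition) {
        return metric + ":" + value + "|c|#topic:" + topic + ",partition:" + partition;
    }

    /** Sends one tagged counter over UDP; does nothing when reporting is disabled. */
    public void count(String metric, long value, String topic, int partition) throws IOException {
        if (!enabled) {
            return;
        }
        byte[] payload = format(metric, value, topic, partition).getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            InetAddress address = InetAddress.getByName(host);
            socket.send(new DatagramPacket(payload, payload.length, address, port));
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = new HashMap<>();
        conf.put(ENABLED_KEY, "false"); // disabled, so count() sends nothing
        TaggedStatsdReporter reporter = new TaggedStatsdReporter(conf);
        reporter.count("camus.records_read", 42, "orders", 3);
        // Prints the datagram that would be sent if reporting were enabled.
        System.out.println(format("camus.records_read", 42, "orders", 3));
    }
}
```

Running `main` prints `camus.records_read:42|c|#topic:orders,partition:3`. Because the reporter needs only configuration values, a task can construct it from whatever configuration its context exposes; with nothing listening locally, UDP sends simply go nowhere, which matches the local testing situation described above.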

dterror-zz commented 6 years ago

Makes sense. I'll merge everything into staging tomorrow.