Shopify / camus

Kafka->HDFS pipeline from LinkedIn. It is a MapReduce job that does distributed data loads out of Kafka.

Improve logging. #104

Closed: olessia closed this 6 years ago

olessia commented 6 years ago

Closes https://github.com/Shopify/camus-project/issues/11 with the caveat that only some hand-picked important Camus logs will appear in Splunk.
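
For illustration, hand-picking loggers like this is typically done by raising specific log4j loggers while keeping the root logger quiet. A minimal sketch, assuming log4j 1.x; the class names below are my assumptions, not the PR's actual list:

```java
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// Sketch: keep the root logger quiet and raise only a few important
// Camus loggers, so only their entries reach the task logs (and Splunk).
// The class names below are assumptions, not the PR's actual list.
public class CamusLogConfig {
    public static void configure() {
        Logger.getRootLogger().setLevel(Level.WARN);
        Logger.getLogger("com.linkedin.camus.etl.kafka.mapred.EtlMapper")
              .setLevel(Level.INFO);
        Logger.getLogger("com.linkedin.camus.etl.kafka.mapred.EtlRecordReader")
              .setLevel(Level.INFO);
    }
}
```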

dterror-zz commented 6 years ago

adds dedup too, intentional?

dterror-zz commented 6 years ago

So you're mapping debug -> info and error -> warn, intentional? We'll get all of the debug entries for these classes; won't they add a lot of noise? You ran it in staging, so I'm half-assuming the answers will all be good, just asking anyway :)
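
For context, the kind of remapping being questioned here might look something like the following. This is a hypothetical sketch of the accidental behavior being flagged, not the PR's actual code:

```java
import org.apache.log4j.Level;

// Hypothetical illustration of the remapping being questioned above:
// entries re-logged at a different level than they were emitted at.
public class LevelRemap {
    static Level remap(Level original) {
        if (Level.DEBUG.equals(original)) return Level.INFO; // debug -> info
        if (Level.ERROR.equals(original)) return Level.WARN; // error -> warn
        return original;
    }
}
```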

olessia commented 6 years ago

Ugh, thanks for catching that. And I think I will remove the debug printout. When I tested in staging I expected the debug to be there, but for the wrong reason. Will update the PR.

olessia commented 6 years ago

To be honest, this is not a pretty solution. I'm going with what the Hadoop book says:

> Each task child process produces a logfile using log4j (called syslog), a file for data sent to standard out (stdout), and a file for standard error (stderr).

I've spent quite a bit of time trying to figure out how to redirect log4j, but to no avail 😞 This is the next best thing to sprinkling printlns all over the place.
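
As a rough sketch of that "next best thing", assuming log4j 1.x (as used by Hadoop/Camus of this era) and that the goal is getting entries into the task attempt's stdout file; the names here are mine, not the PR's:

```java
import org.apache.log4j.ConsoleAppender;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

// Sketch: attach a stdout appender to a chosen logger so its entries
// land in the task attempt's stdout file, rather than scattering
// println calls through the code.
public class StdoutRedirect {
    public static void redirect(String loggerName) {
        ConsoleAppender appender = new ConsoleAppender(
                new PatternLayout("%d{ISO8601} %p %c: %m%n"),
                ConsoleAppender.SYSTEM_OUT);
        Logger logger = Logger.getLogger(loggerName);
        logger.addAppender(appender);
        logger.setLevel(Level.INFO);
    }
}
```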

dterror-zz commented 6 years ago

I figured you'd done that. Next best thing is good enough.