Shopify / camus

Kafka->HDFS pipeline from LinkedIn. It is a MapReduce job that does distributed data loads out of Kafka.

Better Watermarking and LAD #107

Closed: olessia closed this 6 years ago

olessia commented 6 years ago

Log counts of late-arriving data (LAD). Add a `delay-hours` parameter that allows watermarking to be delayed. Add a new way to check for LAD, based on the paths touched in the last N runs.
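The PR description contains no code, so the following is only a rough Java sketch of the two ideas it names, not the code merged in this PR. Everything in it is hypothetical: `LadSketch`, `delayedWatermark`, `Run`, and `RunHistory` are not Camus classes. It assumes hourly output partitions, that the watermark names the latest hour treated as complete (so hours at or before it are closed), and that records a run writes into an hour already closed by the watermark in force when that run started count as late-arriving data. Only the last N runs are kept, mirroring the "paths touched in the last N runs" check.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch (not Camus code): delayed watermarking plus an LAD check over recent runs. */
public class LadSketch {

    /** Assumed semantics: hold the watermark back by delayHours from the latest fully pulled hour. */
    static Instant delayedWatermark(Instant latestCompleteHour, int delayHours) {
        return latestCompleteHour.truncatedTo(ChronoUnit.HOURS)
                .minus(Duration.ofHours(delayHours));
    }

    /** One run: the watermark in force when it started, and records written per hourly path. */
    static final class Run {
        final Instant watermarkBefore;
        final Map<Instant, Long> recordsPerHour;

        Run(Instant watermarkBefore, Map<Instant, Long> recordsPerHour) {
            this.watermarkBefore = watermarkBefore;
            this.recordsPerHour = new HashMap<>(recordsPerHour);
        }
    }

    /** Remembers only the last N runs and reports late-arriving data found in them. */
    static final class RunHistory {
        private final int maxRuns;
        private final Deque<Run> runs = new ArrayDeque<>();

        RunHistory(int maxRuns) {
            this.maxRuns = maxRuns;
        }

        void record(Run run) {
            runs.addLast(run);
            while (runs.size() > maxRuns) {
                runs.removeFirst(); // forget runs older than the last N
            }
        }

        /** Late records per hour: writes into hours the run's starting watermark had already closed. */
        SortedMap<Instant, Long> lateRecordCounts() {
            SortedMap<Instant, Long> late = new TreeMap<>();
            for (Run run : runs) {
                for (Map.Entry<Instant, Long> e : run.recordsPerHour.entrySet()) {
                    if (!e.getKey().isAfter(run.watermarkBefore)) {
                        late.merge(e.getKey(), e.getValue(), Long::sum);
                    }
                }
            }
            return late;
        }
    }

    public static void main(String[] args) {
        Instant h10 = Instant.parse("2016-05-01T10:00:00Z");
        // Latest complete hour is 10:00 and delay-hours is 2, so the watermark is 08:00.
        Instant wm = delayedWatermark(h10.plus(Duration.ofMinutes(42)), 2);
        RunHistory history = new RunHistory(3);
        // This run writes 5 records into the already-closed 07:00 hour and 100 into 10:00.
        history.record(new Run(wm, Map.of(h10.minus(Duration.ofHours(3)), 5L, h10, 100L)));
        System.out.println("watermark = " + wm);
        System.out.println("late records per hour = " + history.lateRecordCounts());
    }
}
```

Running `main` prints `watermark = 2016-05-01T08:00:00Z` and `late records per hour = {2016-05-01T07:00:00Z=5}`. Comparing each run against its own starting watermark, rather than the current one, avoids flagging writes that were on time when they happened but fell behind the watermark later.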