Shopify
/
camus
Kafka->HDFS pipeline from LinkedIn. It is a MapReduce job that does distributed data loads out of Kafka.
Gcs reconciler
#131
Closed
olessia
closed
6 years ago
olessia
commented
6 years ago
Add daily reconciliation of data for previous day.
Script logic
List .gz files on HDFS and GCS and pipe the output to respective files
Compare the two lists and fail if there's something in HDFS that's not in GCS
Report success/failure to Datadog
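The script itself is not shown here, so the following is only a rough sketch of the compare-and-report logic described above, not the code from this PR. It assumes the two listing files hold output in the shape produced by `hdfs dfs -ls -R` and `gsutil ls -r` (path as the last whitespace-separated field, no spaces in paths). The root prefixes, the metric name `camus.gcs_reconciler.run`, and all function names are hypothetical. Reporting is sketched as a plain DogStatsD counter datagram over UDP to a local agent on the default port 8125, which is one of several ways to reach Datadog.

```python
import socket


def relative_gz_paths(listing_lines, root):
    """Collect .gz paths from listing output, made relative to `root`."""
    paths = set()
    for line in listing_lines:
        fields = line.split()
        if not fields:
            continue
        # Assumption: the path is the last field on each listing line.
        path = fields[-1]
        if path.endswith(".gz") and path.startswith(root):
            paths.add(path[len(root):].lstrip("/"))
    return paths


def missing_in_gcs(hdfs_lines, gcs_lines, hdfs_root, gcs_root):
    """Return the sorted relative paths present on HDFS but absent from GCS."""
    hdfs_files = relative_gz_paths(hdfs_lines, hdfs_root)
    gcs_files = relative_gz_paths(gcs_lines, gcs_root)
    return sorted(hdfs_files - gcs_files)


def dogstatsd_counter(metric, tags):
    """Format a DogStatsD counter datagram that increments `metric` by 1."""
    return "{}:1|c|#{}".format(metric, ",".join(tags))


def report(missing, host="127.0.0.1", port=8125):
    """Send a success/failure counter to a local DogStatsD agent (assumed)."""
    status = "failure" if missing else "success"
    # Hypothetical metric name, chosen for illustration only.
    payload = dogstatsd_counter("camus.gcs_reconciler.run", ["status:" + status])
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("utf-8"), (host, port))
    finally:
        sock.close()
    return status
```

A wrapper could read the two listing files, call `missing_in_gcs`, pass the result to `report`, and exit non-zero when the list is non-empty so that the daily job fails visibly. Files that exist only on GCS are deliberately ignored, matching the one-directional check in the description.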