Shopify / camus

Kafka->HDFS pipeline from LinkedIn. It is a MapReduce job that does distributed data loads out of Kafka.

Lad monitor #65

dterror-zz closed 8 years ago

dterror-zz commented 8 years ago

This adds a new job that checks for late-arriving data[1] in our Camus drops and fails (and alerts) if it finds any.

[1] we're defining late-arriving-data as data (not metadata) that has been written after the _IMPORTED flag has been added to the folder.
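
For readers who want to see the idea in code, here is a minimal sketch of the check described above. It is not the job from this PR: it assumes a local directory stands in for an HDFS drop folder, that file modification time is a good enough proxy for "written after", and that metadata files are the ones whose names start with `_` or `.`. The function names are hypothetical.

```python
from pathlib import Path
from typing import List

FLAG_NAME = "_IMPORTED"  # flag name taken from the PR description


def is_metadata(path: Path) -> bool:
    # Assumption: metadata files are named with a leading "_" or ".".
    return path.name.startswith(("_", "."))


def find_late_arriving_data(folder: Path) -> List[Path]:
    """Return data files modified after the _IMPORTED flag was written."""
    flag = folder / FLAG_NAME
    if not flag.exists():
        # The folder has not been marked as imported, so nothing can be late yet.
        return []
    flag_mtime = flag.stat().st_mtime_ns
    return [
        path
        for path in sorted(folder.iterdir())
        if path.is_file()
        and not is_metadata(path)
        and path.stat().st_mtime_ns > flag_mtime
    ]


def assert_no_late_data(folder: Path) -> None:
    """Fail loudly if any late-arriving data is found in the folder."""
    late = find_late_arriving_data(folder)
    if late:
        names = ", ".join(p.name for p in late)
        raise RuntimeError(f"late-arriving data in {folder}: {names}")
```

A scheduler could call `assert_no_late_data` on each drop folder and treat the raised error as the failure that triggers an alert.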

@drdee

dterror-zz commented 8 years ago

@drdee :ship: ?

drdee commented 8 years ago

Discussed this off-line, :ship:

yagnik commented 8 years ago

Have we looked at Confluent's new importers? Do they handle this scenario better?

dterror-zz commented 8 years ago

We haven't looked extensively; we'll be able to take them for a spin once we've updated the cluster.

How are you doing, man?

yagnik commented 8 years ago

I really feel that, given how cheap storage is, pretty much all cluster updates should move to CoreOS style, i.e. update on a different partition and swap, including all the configs. (partially unrelated rant)

I'm doing well! How's everything there? I was chatting with @drdee yesterday and he was telling me about the beautiful land of data and how instafacts is the seed of the new world order. You guys really need to rename instafacts, it's a shit name.

dterror-zz commented 8 years ago

Haha, you're absolutely right, but now everyone's using it and we're stuck with it.

Are you working with CoreOS stuff? I'd love to hear about that; we have to chat sometime outside of GitHub comments.