Shopify / camus

Kafka->HDFS pipeline from LinkedIn. It is a MapReduce job that does distributed data loads out of Kafka.

[WIP] Using retries when committing files to the final destination #152

Open dterror-zz opened 6 years ago

dterror-zz commented 6 years ago

And if the file already exists, we fall back to not doing anything. Since we're doing moves instead of copies, there's no way to move with overwrite; the issue then becomes: what if the file that got moved is in a failed/corrupted state? We could switch to a copy-instead-of-move approach, which would have a performance impact and could have other repercussions I don't know about.
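A minimal sketch of the strategy described above: retry the move, and fall back to a no-op if the destination already exists (since a plain move can't overwrite). All names here (`CommitSketch`, `commitWithRetries`, `MAX_ATTEMPTS`) are hypothetical, and `java.nio.file` stands in for Hadoop's FileSystem API; this is not the Camus code.

```java
import java.io.IOException;
import java.nio.file.*;

// Illustrative sketch only -- java.nio.file in place of Hadoop's FileSystem.
public class CommitSketch {
    static final int MAX_ATTEMPTS = 3; // assumed retry budget

    // Try to move the staged file into place; if the destination already
    // exists, fall back to doing nothing (an earlier attempt may have won).
    static boolean commitWithRetries(Path staged, Path dest) throws IOException {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            if (Files.exists(dest)) {
                // No overwrite is possible with a plain move, so this is the
                // "fall back to not doing anything" branch from the comment.
                return false;
            }
            try {
                Files.move(staged, dest); // no REPLACE_EXISTING, mirroring rename semantics
                return true;
            } catch (IOException e) {
                if (attempt == MAX_ATTEMPTS) throw e; // retries exhausted
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit");
        Path staged = Files.writeString(dir.resolve("part-0000.staged"), "data");
        Path dest = dir.resolve("part-0000");
        System.out.println(commitWithRetries(staged, dest)); // true: moved into place
        // A second commit of the same file is a no-op because dest exists.
        Path staged2 = Files.writeString(dir.resolve("part-0000.staged"), "data");
        System.out.println(commitWithRetries(staged2, dest)); // false
    }
}
```

Note the weakness being discussed: if the file that won the race is corrupt, the fallback silently keeps it.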

Code is also convoluted as hell, we can reason about it better on Monday.

olessia commented 6 years ago

The other scary thing here is that with a move, the original file may no longer exist in its original place if the move went wonky. I think I like the copy-instead-of-move approach. Most of the time the performance impact should be tolerable (setting aside whatever is causing us to fail to catch up). So,
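The safety property of copy-instead-of-move can be sketched like this: the source is only deleted after the copy has fully succeeded, so a failed commit never loses the original file. This is a hypothetical illustration (`CopyCommitSketch`, `commitByCopy` are made-up names, and `java.nio.file` stands in for Hadoop's FileSystem API), not the actual change.

```java
import java.io.IOException;
import java.nio.file.*;

// Illustrative sketch only -- the original file survives any failure
// that happens before the final delete.
public class CopyCommitSketch {
    static void commitByCopy(Path staged, Path dest) throws IOException {
        // Copy to a temporary name first, then rename into place, so a
        // partially written destination is never visible under its final name.
        Path tmp = dest.resolveSibling(dest.getFileName() + ".tmp");
        Files.copy(staged, tmp, StandardCopyOption.REPLACE_EXISTING);
        Files.move(tmp, dest);
        // Only now is it safe to drop the source; if anything above threw,
        // the staged file is still intact in its original place.
        Files.delete(staged);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("copycommit");
        Path staged = Files.writeString(dir.resolve("part-0001.staged"), "payload");
        Path dest = dir.resolve("part-0001");
        commitByCopy(staged, dest);
        System.out.println(Files.readString(dest)); // payload
        System.out.println(Files.exists(staged));   // false
    }
}
```

The cost is the extra full copy of the data, which is the performance impact weighed in the comments above.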

dterror-zz commented 6 years ago

After playing with it over the weekend, I think this strategy is insufficient and too fragile. We are better off doing actual rollbacks on failure: https://github.com/Shopify/camus/pull/152. I tested in staging, but I need to be able to reproduce a failed attempt. Apparently there's an API for that.
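The rollback-on-failure idea can be sketched as: commit files one by one, and if any commit fails, move the already-committed files back where they came from. This is a guess at the shape of the approach, not the code in the linked PR; `RollbackSketch` and `commitAll` are invented names, and `java.nio.file` again stands in for Hadoop's FileSystem API.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Illustrative sketch only: undo successful moves if a later one fails.
public class RollbackSketch {
    static void commitAll(Map<Path, Path> stagedToDest) throws IOException {
        Deque<Map.Entry<Path, Path>> done = new ArrayDeque<>();
        try {
            for (Map.Entry<Path, Path> e : stagedToDest.entrySet()) {
                Files.move(e.getKey(), e.getValue());
                done.push(e); // remember what we committed, for rollback
            }
        } catch (IOException failure) {
            // Roll back in reverse order: move committed files back to staging.
            while (!done.isEmpty()) {
                Map.Entry<Path, Path> e = done.pop();
                Files.move(e.getValue(), e.getKey());
            }
            throw failure;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rollback");
        Path s1 = Files.writeString(dir.resolve("a.staged"), "a");
        Map<Path, Path> batch = new LinkedHashMap<>();
        batch.put(s1, dir.resolve("a"));
        // The second move fails because its staged file does not exist,
        // which simulates a failed attempt mid-batch.
        batch.put(dir.resolve("missing.staged"), dir.resolve("b"));
        try {
            commitAll(batch);
        } catch (IOException expected) {
            System.out.println(Files.exists(s1));               // true: rolled back
            System.out.println(Files.exists(dir.resolve("a"))); // false
        }
    }
}
```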