Shopify / camus

Kafka->HDFS pipeline from LInkedIn. It is a mapreduce job that does distributed data loads out of Kafka.
7 stars 4 forks source link

Camus in production #23

Closed yagnik closed 9 years ago

yagnik commented 10 years ago

I apologize for a massive pr. This pr gets us ready with a jar that can be deployed in production. Included in it:

please review @drdee @airhorns @honkfestival

yagnik commented 9 years ago

@drdee @airhorns @dylanahsmith @honkfestival can I get some thoughts on this?

honkfestival commented 9 years ago

Oy, sorry for letting this go so long @yagnik. I'll get to it today.

airhorns commented 9 years ago

How much of this can we merge upstream? The point of using open source software is not having to maintain it and mature it ourselves, so as we diverge from the upstream implementation we just inherit a zombie mass of code that we have to maintain ourselves but didn't write ourselves, so I am very hesitant to ship this many changes to Camus core. By the looks of github compare we have diverged a lot. I thought this repo was just for config vendoring or something like that, like we have with Spark.

Better would be to add our own subclasses or at least something like that instead of just knifing in and changing their code, and then at least we can merge in upstream changes over time and have the java compiler tell us what interfaces are now broken and that kind of thing.

yagnik commented 9 years ago

@airhorns I'm getting another pr ready to merge most of this upstream, shopify specific code is already namespaced under shopify but I'm pretty sure we can also merge that upstream.

honkfestival commented 9 years ago

@yagnik I agree with @airhorns, it seems like we've diverged more than we need to from upstream.

Could you separate this PR into an upstream one (which I think you said you're already doing) and a hopefully smaller one that stays in our fork?

drdee commented 9 years ago

Should upstreaming be on the critical path? Because if it is it will most probably delay putting Camus in production.

yagnik commented 9 years ago

I'd like to get this merged in and then open a separate pr for merging upstream if everyone is ok with that.

airhorns commented 9 years ago

It will never get done unless you do it now.

drdee commented 9 years ago

I think it will be easier to upstream if we can say that this has been running in production and we have ironed out the final bugs.

airhorns commented 9 years ago

I think you guys are focusing on shipping, when what we need to focus on is shipping sustainably. Sustainably means having less debt to pay off and less code to maintain. I am sorry that it is more work, but I think its up to you guys to meet the quality standard and in my mind this doesn't. What can I do to help?

drdee commented 9 years ago

This PR's size has ballooned too much and we should split it in four 4 separate PR's and upstream them first:

airhorns commented 9 years ago

That sounds awesome.

yagnik commented 9 years ago

Upstream WIP https://github.com/yagnik/camus/pull/2 I have refactored the code a bit in my branch and added some more tests as I go. I'll keep this PR open to track on shopify side.

yagnik commented 9 years ago

All changes have been merged upstream. Shopify changes have been moved to https://github.com/Shopify/camus/pull/24 Keeping this PR for my sweeper changes.