fingltd / 4mc

4mc - splittable lz4 and zstd in hadoop/spark/flink

hadoop-4mc with AWS EMR? #13

Closed (mikcox closed this issue 7 years ago)

mikcox commented 7 years ago

Hey Carlo!

Great work on this repository; I'm very excited about the potential.

I'm spinning up some AWS EMR clusters for a production workflow and I'm hoping to incorporate your compression. I'm curious about whether or not there would be an easy way to configure AWS EMR to pull in the hadoop-4mc library when it's started, since it'll be a bit of a pain to go in after the fact and install the library across the cluster.

Do you have any advice or suggestions for how to implement hadoop-4mc on an AWS EMR cluster, and if so could you add it to your documentation?

Cheers and thanks in advance!

mikcox commented 7 years ago

I'm new to this space and naive, but after a day or so of googling I figured out more or less how to do this. I'll probably submit a little PR with some minor additions to the documentation to make installation trivial for a totally new (and braindead) user like myself. ;)

Cheers!

carlomedas commented 7 years ago

Thanks. With the latest feature of embedded native libs, it should work flawlessly as long as you add it to your job's cached libs.
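
For anyone landing here later, a rough sketch of what "add it to your job's cached libs" could look like for a Spark job. It only builds a `spark-submit` command line that ships the 4mc jar with the job through the standard `--jars` option. The S3 paths, the jar file name, the main class, and the helper `build_spark_submit` are all hypothetical placeholders, not something from this repository, and whether `--jars` is the right mechanism for your setup (versus, say, Hadoop's `-libjars` for plain MapReduce) is an assumption to verify on your own cluster.

```python
import shlex

# Hypothetical locations: replace with wherever you uploaded the jars.
FOURMC_JAR = "s3://my-bucket/libs/hadoop-4mc.jar"  # assumed path and file name
APP_JAR = "s3://my-bucket/jobs/my-job.jar"         # hypothetical application jar


def build_spark_submit(app_jar, extra_jars, main_class=None):
    """Return a spark-submit argument list that ships extra_jars with the job."""
    cmd = ["spark-submit"]
    if main_class:
        cmd += ["--class", main_class]
    if extra_jars:
        # --jars takes a comma-separated list of jars to distribute with the job.
        cmd += ["--jars", ",".join(extra_jars)]
    cmd.append(app_jar)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    args = build_spark_submit(APP_JAR, [FOURMC_JAR], main_class="com.example.MyJob")
    print(shlex.join(args))
```

The printed command could then be pasted into an EMR step or run on the master node. Because the native libs are embedded in the jar, nothing else should need to be installed on the nodes, per the comment above.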

Let me know how it goes. I definitely need to find time not only for better docs but also to put up some real examples for Hadoop MR, Spark, and Flink.