fullcontact / hadoop-sstable

Splittable Input Format for Reading Cassandra SSTables Directly
Apache License 2.0

Does it work with hadoop 2.0.0? #12

Open gadodia opened 9 years ago

gadodia commented 9 years ago

Hey, this isn't an issue; I just wanted to know whether you are working on making it run on Hadoop 2.0.0. Or is there a minor change that would make it run with Hadoop 2.0.0?

bvanberg commented 9 years ago

At the moment it's not supported. I was anticipating adding it very soon.

gadodia commented 9 years ago

We forked the project, and the fork works with Hadoop 2.0 and later. Link: https://github.com/igh/hadoop-sstable

bvanberg commented 9 years ago

Very nice. If you'd like to PR this back into hadoop-sstable let me know.

igh commented 9 years ago

Sure. We will do that.

java8964 commented 9 years ago

I want to add a comment about Hadoop 2 support. I have no problem using version 0.1.2 on Hadoop 2.2, but I do have a problem using the released 0.1.2 jar file from the Maven repository.

1) It looks like the ONLY jar file for version 0.1.2 bundles the Hadoop 1.0.4 binaries, which causes problems when using it in a Hadoop 2.x environment. For example, I added the following section to my pom.xml file:

    <dependency>
        <groupId>com.fullcontact</groupId>
        <artifactId>hadoop-sstable</artifactId>
        <version>0.1.2</version>
        <scope>compile</scope>
        <exclusions>
            <exclusion>
                <groupId>org.apache.cassandra</groupId>
                <artifactId>cassandra-all</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-core</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

The final jar file still includes the Hadoop 1.x binaries, which I believe come from "hadoop-sstable". Currently I HAVE to manually remove the Hadoop-related class files from the generated jar file. Our project also uses Cassandra 1.2.x, a different minor version from yours (this hasn't caused any problems so far), but we are on Hadoop 2.2.
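
One possible way to automate that manual removal is a maven-shade-plugin filter that drops the bundled Hadoop classes when the final jar is assembled. This is a minimal sketch that assumes the final jar is built with maven-shade-plugin, which the thread does not state; the plugin version is left out and would need to be pinned in a real build.

```xml
<!-- Sketch only: assumes the project's final jar is assembled by maven-shade-plugin. -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <filters>
                    <filter>
                        <!-- Applies only to files copied out of the hadoop-sstable jar. -->
                        <artifact>com.fullcontact:hadoop-sstable</artifact>
                        <excludes>
                            <!-- Drop the Hadoop 1.x classes bundled inside that jar. -->
                            <exclude>org/apache/hadoop/**</exclude>
                        </excludes>
                    </filter>
                </filters>
            </configuration>
        </execution>
    </executions>
</plugin>
```

The filter only strips the class files; it does not address binary incompatibilities between code compiled against Hadoop 1.x and a Hadoop 2.x runtime.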

Here is my suggestion: 1) Can you release a jar file to the Maven repository WITHOUT any bundled dependencies, one that includes only the "fullcontact" classes? In that case, I believe the dependency declaration above would pull in all the other dependencies of "hadoop-sstable" but exclude "hadoop-core", which is the wrong version for our environment. I am not a Maven expert, but my guess is that because you only release a single jar file that bundles all the dependencies, Maven has no way to exclude Hadoop in my project.

Other than that, I haven't seen any issues using it on Hadoop 2.2 so far.

Thanks

Xorlev commented 9 years ago

You can exclude hadoop when you pull in the jar, but you'll run into trouble with any linkage-related changes, e.g. interfaces that are now abstract classes.

We'll likely take the approach of releasing a few different flavors targeted at different profiles (e.g. Hadoop 1, Hadoop 2, and possibly separate CDH builds). We're still evaluating, and @bvanberg has the final say in the end.
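
As a rough illustration of what per-flavor builds could look like with Maven profiles: the profile ids and the Hadoop 2 version below are illustrative assumptions, not the project's actual build configuration.

```xml
<!-- Hypothetical profiles; ids and versions are placeholders, not the project's real build. -->
<profiles>
    <profile>
        <id>hadoop1</id>
        <activation>
            <activeByDefault>true</activeByDefault>
        </activation>
        <dependencies>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-core</artifactId>
                <version>1.0.4</version>
                <!-- provided: compile against Hadoop but do not bundle it in the jar -->
                <scope>provided</scope>
            </dependency>
        </dependencies>
    </profile>
    <profile>
        <id>hadoop2</id>
        <dependencies>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
                <version>2.2.0</version>
                <scope>provided</scope>
            </dependency>
        </dependencies>
    </profile>
</profiles>
```

With a layout like this, a flavor would be selected at build time with, for example, `mvn package -P hadoop2`.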