Closed alexanderfahlke closed 11 years ago
The 1.0.3 dependency in ivy.xml is just there so that it has some version of Hadoop to compile against. I've actually tested it against a different 1.0.x version, but not 0.20.2 :) Although ivy.xml declares the 1.0.3 dependency, the Hadoop JARs are downloaded to a separate directory and are not included in the fat jar. This is so you can use the JARs that match the version on your cluster.
So it doesn't depend on a specific version of Hadoop, but it does need to find the classes it was compiled against. Hadoop 0.20.2 apparently doesn't have these classes, so it seems it's not going to work there. You should have success with any 1.0.x version, and perhaps later ones too, but I haven't tested those so I can't say.
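A quick way to see whether a given set of Hadoop JARs has the classes the jobs need is to probe the classpath by name. The following is only a minimal sketch: the class ClasspathCheck is hypothetical and not part of white-elephant, and the fully qualified name org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat is an assumption about which class is missing on 0.20.2, so pass a different name as an argument if that guess is wrong.

```java
// Hypothetical helper, not part of white-elephant.
final class ClasspathCheck {

    // Returns true if the named class can be loaded from the current classpath.
    // The class is located but not initialized, so no static initializers run.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class name; override it with the first command-line argument.
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found"));
    }
}
```

Run it with the cluster's hadoop-core JAR on the classpath; a "NOT found" result would suggest that this Hadoop version cannot run the jobs as compiled.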
I'll add a comment in the README regarding what version it was compiled against and expected compatibility.
By the way, the reason for the dependency on CombineFileInputFormat is so that it can combine the log files. CombinedTextInputFormat derives from this and is used for this purpose. Otherwise you'll get one mapper per log file, which can mean many mappers :)
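To illustrate why combining matters, here is a simplified sketch in plain Java of packing many small files into fewer splits. It is not Hadoop's actual algorithm (CombineFileInputFormat also takes node and rack locality into account); the class SplitPacking, the method pack, and the size limit are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Hadoop code.
final class SplitPacking {

    // Greedily groups consecutive files into splits of at most maxSplitBytes.
    // A file larger than the limit ends up alone in its own split.
    static List<List<Long>> pack(List<Long> fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Start a new split when adding this file would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(10L, 20L, 30L, 40L, 50L);
        List<List<Long>> splits = pack(sizes, 60L);
        System.out.println(splits.size() + " splits for " + sizes.size() + " files");
    }
}
```

Each split would be handled by one mapper, so the number of mappers follows the number of splits rather than the number of log files.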
The classic one:
There is at least one dependency on hadoop-1.0.3 (I guess you guys at LinkedIn are using this).
In hadoop-0.20.2 the hadoop-core.jar is located in the Hadoop base path, and running run.sh throws an exception:
If you copy the hadoop-core.jar to the libs directory you get the next exception:
I know that this was a dumb idea, but it was worth trying. ;)
The dependency is found in white-elephant/hadoop/config/ivy/ivy.xml:
I would suggest naming the tested version in the README for the Hadoop jobs.
Funny side-note from README.md (Server section):
So the white-elephant front-end does not depend on a specific Hadoop version, but the jobs that generate the data do.
It seems the following two JIRA tickets describe the first problem: HADOOP-7055 and HADOOP-7577.