snoe925 closed this 6 years ago
I found that these changes were required to get 4mz working with newAPIHadoopFile. Here is an example reader for the Spark shell:
sc.newAPIHadoopFile("data.4mz", classOf[com.hadoop.mapreduce.FourMzTextInputFormat], classOf[org.apache.hadoop.io.LongWritable], classOf[org.apache.hadoop.io.Text])
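A short follow-up sketch of how the records from the call above might be consumed. The names `rdd` and `lines` are hypothetical and not part of the pull request. Hadoop record readers commonly reuse the same Writable instances across records, so the sketch converts each value to a String before bringing anything back to the driver.

```scala
// Run inside spark-shell, where `sc` is the predefined SparkContext
// and the 4mc jar is already on the classpath.
import com.hadoop.mapreduce.FourMzTextInputFormat
import org.apache.hadoop.io.{LongWritable, Text}

// `rdd` and `lines` are hypothetical names used only for this illustration.
val rdd = sc.newAPIHadoopFile(
  "data.4mz",
  classOf[FourMzTextInputFormat],
  classOf[LongWritable],
  classOf[Text])

// Copy each Text value into an immutable String before collecting,
// because the underlying Writable objects may be reused between records.
val lines = rdd.map { case (_, text) => text.toString }

// Print the first few lines as a quick sanity check.
lines.take(5).foreach(println)
```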
Why hasn't this been merged yet?
Specifically, commit f6a57e3 contains a very basic fix that ZSTD needs in order to function properly. In addition, FourMcTextInputFormat needs the LongWritable and Text generic type parameters, as FourMzTextInputFormat has in your version.
I can volunteer as a maintainer. I can also make an official repo if you want to avoid notifications.
I'd like to merge the first part of the pull request. However, the index changes to the 4mc CLI are not clear to me. What do they do? The index in 4mc/4mz files is already stored inside the file itself.
P.S.: I could use your help to rebuild the lib on all platforms.
I should have pushed the external index code to a separate branch. I was experimenting with timestamp indexing of the data in a 4mz file. Let me fix the pull request.
I have removed the incorrect index code commit from this pull request.
For platform building I will open a separate pull request for a Travis CI integration file. That can build Linux and OS X. I do not have Windows build machines.
Yes, that'd be perfect, even if Linux is not an issue. I'm going to rebuild a new version of the lib soon, and Mac is easy too. The only issue I have now is with Windows, where you need cygwin64 to build it correctly so that it works well with the latest JRE 7/8 on recent Windows versions. Since I don't think many people are using it, we could even consider releasing without it, unless we find the time to recreate the build system I unfortunately lost in the past year...
Modern Hadoop does not require codecs to be configured in core-site.xml; it can discover them through Java's service loader mechanism, using META-INF/services entries in the jar.
With this change, the codec works in Spark once the jar is on the classpath, for example by copying the jar to the Spark jars directory.
Hadoop implementations that lack service loader support ignore the META-INF data and behave exactly as they did without it.
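One way to check that the codec is picked up from the classpath alone is sketched below for spark-shell. It assumes the jar is already in the Spark jars directory and reuses the file name "data.4mz" from the reader example in this thread. It relies only on Hadoop's CompressionCodecFactory and is a quick diagnostic, not part of the pull request.

```scala
// Run inside spark-shell, where `sc` is the predefined SparkContext.
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.compress.{CompressionCodec, CompressionCodecFactory}

val conf = sc.hadoopConfiguration

// List every codec class Hadoop can see: those discovered through
// META-INF/services plus any named in the configuration.
val codecClasses = CompressionCodecFactory.getCodecClasses(conf)
val it = codecClasses.iterator()
while (it.hasNext) println(it.next().getName)

// Resolve a codec by file extension. getCodec returns null when no
// registered codec claims the extension.
val codec: CompressionCodec =
  new CompressionCodecFactory(conf).getCodec(new Path("data.4mz"))
println(if (codec == null) "no codec found for .4mz" else codec.getClass.getName)
```

If the second check prints "no codec found for .4mz", the jar is probably missing from the classpath or was built without the META-INF data.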