fingltd / 4mc

4mc - splittable lz4 and zstd in hadoop/spark/flink

4mc files not splitting #3

Closed pbutler closed 9 years ago

pbutler commented 9 years ago

I am running Hadoop 2.4.1 and submit my jobs through mrjob (if that matters). When I run against an uncompressed file, splits happen and I automatically get more mappers than files. When I run against .4mc files, however, no splitting occurs. Running hadoop fs -text file.4mc works, so I know decompression is fine, and a job against .4mc files completes; the input just isn't split.

One other thing I noticed: if I use files with the .lz4_uc extension, running hadoop fs -text file.lz4_uc gives the following error:

15/01/23 01:47:58 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
Exception in thread "main" java.lang.InternalError: LZ4_decompress_safe returned: -2

I am not sure if that's related or not.

carlomedas commented 9 years ago

Please make sure to configure the 4mc input with: job.setInputFormatClass(FourMcTextInputFormat.class);

You can have a look at related example here: https://github.com/carlomedas/4mc/blob/master/java/hadoop-4mc/src/examples/text/TestTextInput.java
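
For reference, here is a minimal sketch of a map-only driver that applies the setting above. It is not taken from the linked example. The class name FourMcSplitCheck is hypothetical, and the package com.hadoop.mapreduce for FourMcTextInputFormat is an assumption that should be checked against the hadoop-4mc jar you deploy. It needs the Hadoop and hadoop-4mc jars on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Assumed package name; verify it against your hadoop-4mc jar.
import com.hadoop.mapreduce.FourMcTextInputFormat;

// Hypothetical driver: copies lines from .4mc input to the output directory.
public class FourMcSplitCheck {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "4mc split check");
        job.setJarByClass(FourMcSplitCheck.class);

        // Without this line the job falls back to the default text input
        // format, which treats each compressed file as a single split.
        job.setInputFormatClass(FourMcTextInputFormat.class);

        // No mapper is set, so Hadoop's identity Mapper is used.
        // Zero reducers makes this a map-only job.
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // .4mc input
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // new output dir

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If splitting works, the number of map tasks reported for the job should exceed the number of input files when those files are large enough. Note that mrjob submits through Hadoop Streaming, where the input format is passed by class name rather than through setInputFormatClass; this thread does not say which class to use there.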

carlomedas commented 9 years ago

I consider this configuration issue closed; please reopen if you can still reproduce it.