fingltd / 4mc

4mc - splittable lz4 and zstd in hadoop/spark/flink

BUG: NPE when closing SequenceFile.Writer #25

Open hamlet-lee opened 7 years ago

hamlet-lee commented 7 years ago

version: 2.0.0
os: centos x64
hadoop: 2.7.1

scenario: Simply generate a compressed sequence file with FourMzUltraCodec.

Exception:

2017-06-16 22:32:48 [INFO ](c.h.c.f.FourMcNativeCodeLoader     :142) hadoop-4mc: loaded native library (embedded)
2017-06-16 22:32:48 [INFO ](c.h.c.f.ZstdCodec                  :84 ) Successfully loaded & initialized native-4mc library
2017-06-16 22:32:49 [INFO ](o.a.h.i.c.CodecPool                :153) Got brand-new compressor [.4mz]
Exception in thread "main" java.lang.NullPointerException
    at com.hadoop.compression.fourmc.ZstdCompressor.reset(ZstdCompressor.java:267)
    at org.apache.hadoop.io.compress.CodecPool.returnCompressor(CodecPool.java:204)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1275)
    at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1504)
    at toolbox.analyzer2.TryFourMzUltra.run(TryFourMzUltra.java:44)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at toolbox.analyzer2.TryFourMzUltra.main(TryFourMzUltra.java:23)

My code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.*;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @author lisn
 */
public class TryFourMzUltra extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        ToolRunner.run(new Configuration(), new TryFourMzUltra(), args);
    }

    @Override
    public int run(String[] args) throws Exception {

        CompressionCodec codec = new com.hadoop.compression.fourmc.FourMzUltraCodec();

        Configuration conf = getConf();
        FileSystem fs = FileSystem.get(conf);

        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(new Path("/tmp/testFourMzUltra")),
                SequenceFile.Writer.keyClass(LongWritable.class),
                SequenceFile.Writer.valueClass(Text.class),
                SequenceFile.Writer.compression(
                        SequenceFile.CompressionType.BLOCK,
                        codec
                ));
        writer.append(new LongWritable(1), new Text("12341234"));
        writer.close();  // the exception is raised here

        return 0;
    }
}

carlomedas commented 7 years ago

Those codecs are reserved for use in conjunction with 4mc/4mz files. If you want to use the codecs with a different file format, such as a sequence file, try ZstdUltraCodec, which provides only compression, without the extended features of the 4mz format.
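
A minimal sketch of that suggestion, adapted from the code in the report. It assumes ZstdUltraCodec lives in the same com.hadoop.compression.fourmc package as the other codecs, and that the Hadoop and hadoop-4mc jars (with the native library) are available; the class name TryZstdUltra and the output path are made up for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical class name, used only for this illustration.
public class TryZstdUltra {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Assumption: ZstdUltraCodec is in the same package as FourMzUltraCodec.
        // ReflectionUtils.newInstance also passes conf to the codec if it is Configurable.
        CompressionCodec codec = ReflectionUtils.newInstance(
                com.hadoop.compression.fourmc.ZstdUltraCodec.class, conf);

        // Hypothetical output path.
        Path path = new Path("/tmp/testZstdUltra");

        // try-with-resources closes the writer, which also returns the
        // compressor to the CodecPool (the step that failed in the report).
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(LongWritable.class),
                SequenceFile.Writer.valueClass(Text.class),
                SequenceFile.Writer.compression(
                        SequenceFile.CompressionType.BLOCK, codec))) {
            writer.append(new LongWritable(1), new Text("12341234"));
        }
    }
}
```

The only substantive change from the failing code is the codec class; the block-compressed SequenceFile setup is otherwise the same.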

hamlet-lee commented 7 years ago

I tried ZstdUltraCodec as the codec for a sequence file. It works! Performance seems very good!