fingltd / 4mc

4mc - splittable lz4 and zstd in hadoop/spark/flink

Add include path for OS X #8

Closed mtopolnik closed 9 years ago

mtopolnik commented 9 years ago

At least on my system, the include file path in the makefile is missing the path specific to OS X. I assume it will be similarly missing on other Mac systems.

carlomedas commented 9 years ago

Merged. Just one comment: when building the full library (including the JNI mapping), the provided makefile is best-effort; I actually use the CMake configuration, which is tested and working on Windows, Linux, and Mac.

mtopolnik commented 9 years ago

I see... I installed cmake and re-ran the build. I tried to get a quick performance measurement of 4mc, but I faced issues...

Jul 01, 2015 9:44:41 AM com.hadoop.compression.fourmc.FourMcNativeCodeLoader <clinit>
INFO: Loaded native hadoop-4mc library
Exception in thread "main" java.lang.InternalError: LZ4_decompress_safe returned: -65529
    at com.hadoop.compression.fourmc.Lz4Decompressor.decompressBytesDirect(Native Method)
    at com.hadoop.compression.fourmc.Lz4Decompressor.decompress(Lz4Decompressor.java:209)
    at com.hadoop.compression.fourmc.FourMcInputStream.decompress(FourMcInputStream.java:259)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:77)
    at java.io.InputStream.read(InputStream.java:101)
    at org.example.lz4.TestPerformance.readCompressed4mc(TestPerformance.java:88)
    at org.example.lz4.TestPerformance.main(TestPerformance.java:79)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

Am I doing something obviously wrong here? This is the test code:

public static void main(String[] args) throws Exception {
    FileInputStream fin = new FileInputStream("silesia.bin");
    final OutputStream out = new FourMcOutputStream(
            new FileOutputStream("silesia.4mc"), new Lz4Compressor(), BUF_SIZE);
    for (int count; (count = fin.read(DECOMPRESSED_BUF)) != -1; ) {
        out.write(DECOMPRESSED_BUF, 0, count);
    }
    out.close();
    fin.close();
    for (int i = 0; i < 10; i++) {
        readCompressed4mc();
    }
}

private static void readCompressed4mc() throws IOException {
    final InputStream cin = new FourMcInputStream(
            new FileInputStream("silesia.4mc"), new Lz4Decompressor(), BUF_SIZE);
    final long start = System.nanoTime();
    int sum = 0;
    for (int count; (count = cin.read(DECOMPRESSED_BUF)) != -1; ) {
        sum += count;
    }
    final long tookMicros = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
    System.out.format("%,f MB/s\n", (double) sum / tookMicros);
    cin.close();
}

carlomedas commented 9 years ago

Since it's written to work with the Hadoop APIs and interfaces, you should create the codec you want, e.g.:

CompressionCodec codec = new FourMcCodec(); // or use CompressionCodecFactory if within Hadoop
final OutputStream out = codec.createOutputStream(new FileOutputStream("silesia.4mc"));

And the same for reading, leveraging the codec's createInputStream(InputStream in). Please note that by using FourMcCodec you get the default lz4 compression (fast); three more levels are also available (as documented on the main page of this project).
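A minimal sketch of that codec-based usage, assuming the hadoop-4mc jar and its native library are available at runtime (file names follow the earlier test code; buffer size is an arbitrary choice here):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.io.compress.CompressionCodec;
import com.hadoop.compression.fourmc.FourMcCodec;

public class FourMcCodecExample {
    public static void main(String[] args) throws IOException {
        // The codec wires up the matching compressor/decompressor internally,
        // so the raw Lz4Compressor/Lz4Decompressor stream constructors from
        // the earlier snippet are not needed.
        CompressionCodec codec = new FourMcCodec();

        // Compress: wrap the raw output file with the codec's output stream.
        try (InputStream fin = new FileInputStream("silesia.bin");
             OutputStream out = codec.createOutputStream(new FileOutputStream("silesia.4mc"))) {
            byte[] buf = new byte[1 << 16];
            for (int n; (n = fin.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
        }

        // Decompress: the codec's input stream handles the 4mc framing.
        try (InputStream cin = codec.createInputStream(new FileInputStream("silesia.4mc"))) {
            byte[] buf = new byte[1 << 16];
            long total = 0;
            for (int n; (n = cin.read(buf)) != -1; ) {
                total += n;
            }
            System.out.println("decompressed " + total + " bytes");
        }
    }
}
```

Depending on the Hadoop version, the codec may also need a Configuration set before use (via Configurable.setConf) when created outside a CompressionCodecFactory; that detail is not covered by the comment above.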

let me know...

mtopolnik commented 9 years ago

Thanks, this time it worked. However, I got the same sub-1 GB/s speed as with every other approach I've tried. This is when repeatedly reading a cached file (I measured pure FileInputStream reading speed at 1.8 GB/s). In fact, when the expected speed is calculated for sequentially reading and then decompressing at 2.2 GB/s, it turns out to be just that. I used the formula 1 / (1/readSpeed + 1/decompSpeed).
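For concreteness, plugging the measured numbers into that formula (a quick check using the 1.8 GB/s read speed and a 2.2 GB/s decompression speed as stated above):

```java
public class ExpectedThroughput {
    // Two sequential pipeline stages combine harmonically:
    // totalTime = bytes/readSpeed + bytes/decompSpeed, so
    // effective speed = 1 / (1/readSpeed + 1/decompSpeed).
    static double combined(double readGBps, double decompGBps) {
        return 1.0 / (1.0 / readGBps + 1.0 / decompGBps);
    }

    public static void main(String[] args) {
        double expected = combined(1.8, 2.2);
        // 1/1.8 + 1/2.2 = 5/9 + 5/11 = 100/99, so the result is 99/100 = 0.99 GB/s
        System.out.printf("expected: %.2f GB/s%n", expected);
    }
}
```

That 0.99 GB/s matches the observed sub-1 GB/s figure, which is consistent with read and decompression happening strictly one after the other rather than overlapped.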