JuliaIO / Zarr.jl


Benchmark vs ZFP #58

Open miguelraz opened 3 years ago

miguelraz commented 3 years ago

zfp is a library by LLNL that also does multidimensional array compression.

It would be nice to see benchmarks against their vectorized (SIMD) algorithms.

You can easily use their library via ]add zfp_jll on recent Julia versions and call the shared library methods.

cc @aviks, who helped set up the zfp_jll wrapper and might have some usage examples lying around.

meggart commented 3 years ago

Thanks for bringing this to our attention. Zarr does not implement any compression algorithms itself.

However, it would be nice to add zfp as an additional compression algorithm. It is already wrapped in the Python implementation of zarr (https://numcodecs.readthedocs.io/en/latest/zfpy.html). I just have not yet come across a dataset that uses it, so it is not implemented yet.

If anyone wants to tackle this: compressors are currently defined in https://github.com/meggart/Zarr.jl/blob/master/src/Compressors.jl. In this case it would probably be necessary to slightly modify the compressor interface. So far the compressors have operated on plain byte arrays, so a compressor never sees the structure of the chunk (its size and number of dimensions). Maybe we need separate abstract types, ByteCompressor and NDArrayCompressor. This would also be useful if we add JPEG2000 compression or similar in the future.
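The ByteCompressor/NDArrayCompressor split could look roughly like the following sketch. It is written in Python for brevity rather than Julia, and every class and function name in it is hypothetical, not part of Zarr.jl. RowDeltaCompressor is a toy shape-aware codec (delta coding along the last axis, then zlib) that stands in for zfp only to show why a codec needs the chunk shape; it is not zfp's algorithm.

```python
from __future__ import annotations

import zlib
from abc import ABC, abstractmethod
from array import array
from math import prod


class Compressor(ABC):
    """Hypothetical common supertype for all chunk compressors."""


class ByteCompressor(Compressor):
    """Sees only the chunk's flat bytes, like the existing byte-oriented compressors."""

    @abstractmethod
    def compress(self, data: bytes) -> bytes: ...

    @abstractmethod
    def decompress(self, data: bytes) -> bytes: ...


class NDArrayCompressor(Compressor):
    """Also receives the chunk shape, as an N-D codec such as zfp or JPEG2000 would need."""

    @abstractmethod
    def compress(self, data: bytes, shape: tuple[int, ...]) -> bytes: ...

    @abstractmethod
    def decompress(self, data: bytes, shape: tuple[int, ...]) -> bytes: ...


class ZlibCompressor(ByteCompressor):
    """Shape-agnostic codec: plain zlib on the raw bytes."""

    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def decompress(self, data: bytes) -> bytes:
        return zlib.decompress(data)


class RowDeltaCompressor(NDArrayCompressor):
    """Toy shape-aware codec (NOT zfp): delta coding along the last axis, then zlib.

    Assumes int64 elements, a non-empty shape, and neighbouring differences
    that still fit in int64.
    """

    def _load(self, data: bytes, shape: tuple[int, ...]) -> array:
        vals = array("q")  # native-endian int64
        vals.frombytes(data)
        if len(vals) != prod(shape):
            raise ValueError("byte length does not match chunk shape")
        return vals

    def compress(self, data: bytes, shape: tuple[int, ...]) -> bytes:
        vals = self._load(data, shape)
        out = array("q", vals)
        n = max(shape[-1], 1)  # row length; guard against a zero-length last axis
        for start in range(0, len(vals), n):
            # Keep the first element of each row, replace the rest by differences.
            for i in range(start + 1, start + n):
                out[i] = vals[i] - vals[i - 1]
        return zlib.compress(out.tobytes())

    def decompress(self, data: bytes, shape: tuple[int, ...]) -> bytes:
        vals = self._load(zlib.decompress(data), shape)
        n = max(shape[-1], 1)
        for start in range(0, len(vals), n):
            # Undo the deltas with a running sum within each row.
            for i in range(start + 1, start + n):
                vals[i] += vals[i - 1]
        return vals.tobytes()


def compress_chunk(codec: Compressor, data: bytes, shape: tuple[int, ...]) -> bytes:
    """Hypothetical dispatch point: only N-D codecs are given the chunk shape."""
    if isinstance(codec, NDArrayCompressor):
        return codec.compress(data, shape)
    return codec.compress(data)


def decompress_chunk(codec: Compressor, data: bytes, shape: tuple[int, ...]) -> bytes:
    if isinstance(codec, NDArrayCompressor):
        return codec.decompress(data, shape)
    return codec.decompress(data)


if __name__ == "__main__":
    shape = (4, 6)
    raw = array("q", [100 * r + c for r in range(4) for c in range(6)]).tobytes()
    for codec in (ZlibCompressor(), RowDeltaCompressor()):
        assert decompress_chunk(codec, compress_chunk(codec, raw, shape), shape) == raw
```

In Julia the isinstance checks would more naturally become methods dispatching on the two abstract types, so existing byte compressors would only need to declare their new supertype.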

meggart commented 2 years ago

There was some discussion about the Python implementation of zfp compression, which ran into a problem similar to the one I described above, with arrays being flattened during compression: https://github.com/zarr-developers/numcodecs/issues/303