MathisRosenhauer / libaec

libaec - Adaptive Entropy Coding library
https://gitlab.dkrz.de/k202009/libaec
BSD 2-Clause "Simplified" License

References and comparison #35

Open milankl opened 2 months ago

milankl commented 2 months ago

I've come across libaec a few times now, and people have described it (if I remember correctly) as a 2D lossless compressor for data with some mutual information between adjacent array elements, although it does not seem to take matrix size parameters for the bitstreamed data. You call such data "low entropy", but I guess you mean a low local entropy, not necessarily a low global entropy across the entire data array. It's described here as good for "space imaging instrument data or numerical model output from weather or climate simulations", but I struggle to find any references, research papers, comparisons, blog posts, etc. that compare it with other lossless compressors, say zlib or zstandard.

I see that the focus here is on a standard and not necessarily on a nice user interface in a high-level programming language, but maybe I've just missed a wrapper for Python, Julia, or similar that exists somewhere? I'm the lead author of this paper https://www.nature.com/articles/s43588-021-00156-2 and I had been wondering whether it would have been worth comparing libaec to zstandard (which we used in that study) in the bitinformation framework. For unsigned integers obtained after a linear packing with scale and offset, bitrounding can be applied equivalently, removing the high entropy in the trailing bits. For floats, I guess one would simply reinterpret them as unsigned integers? I reckon that's not too bad, but maybe one should replace the biased exponent with a signed one? In the end I never got around to writing a wrapper for this as I did with ZfpCompression.jl, although I think it would be easier because there are no 2-4D arrays with strides?
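To make the float question concrete, here is a rough sketch of what I mean, using only the Python standard library. All function names are made up for illustration and none of this is libaec or BitInformation.jl API. The rounding is a simplified round-half-up on the bit pattern (not the tie-to-even rounding a careful implementation would use), and non-finite values are not handled.

```python
import struct


def float_to_bits(x: float) -> int:
    """Reinterpret a float32 value as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(u: int) -> float:
    """Inverse of float_to_bits: reinterpret 32 bits as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", u))[0]


def round_mantissa(u: int, keepbits: int) -> int:
    """Keep `keepbits` of the 23 float32 mantissa bits and zero the rest.

    Simplification: adds half of the last kept bit and truncates, so ties
    round up in magnitude. NaN and infinity are not treated specially.
    """
    drop = 23 - keepbits
    if drop <= 0:
        return u
    half = 1 << (drop - 1)
    mask = (0xFFFFFFFF >> drop) << drop
    return (u + half) & mask & 0xFFFFFFFF


def linear_pack(values, nbits=16):
    """Pack floats into unsigned integers of `nbits` bits via scale and offset."""
    lo, hi = min(values), max(values)
    # Fall back to a scale of 1.0 for a constant field to avoid dividing by zero.
    scale = (hi - lo) / ((1 << nbits) - 1) or 1.0
    packed = [round((v - lo) / scale) for v in values]
    return packed, scale, lo
```

With `keepbits=7`, for example, the 16 trailing mantissa bits of each float32 become zero, and the resulting unsigned integers are what would be handed to the compressor as 32-bit samples.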

milankl commented 2 months ago

@miha-at-ecmwf or @juanjodd did you ever use/compare to libaec?

MathisRosenhauer commented 2 months ago

libaec works on blocks of J n-bit 'samples'. So you could consider it 1D.

I haven't followed publications about the method in recent years, but there is an old paper by Pen-Shu Yeh https://ntrs.nasa.gov/api/citations/20020081027/downloads/20020081027.pdf with comparisons to zlib and others. A more recent report by ECMWF compares different GRIB compression methods, including CCSDS: https://www.ecmwf.int/en/elibrary/81320-impact-grib-compression-weather-forecast-data-and-data-handling-applications.

If you search for publications, you might get more hits with 'CCSDS lossless compression'. libaec implements the extended-Rice algorithm, not the wavelet based one.
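As a rough illustration of the block-wise idea, here is a much simplified Python sketch. It is not libaec's code or API, and all names are hypothetical. It assumes a unit-delay predictor with a plain zigzag mapping of the residuals (the standard's mapping is more involved), and it only compares plain Rice codes with parameter k against storing the block uncompressed. The option ID bits, the zero-block and second-extension options, and the actual bitstream layout are left out.

```python
def map_residuals(samples):
    """Unit-delay prediction, then map signed residuals to non-negative ints.

    Returns the first sample (the reference) and the mapped residuals.
    Assumption: plain zigzag mapping, 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
    """
    reference = samples[0]
    mapped = []
    prev = reference
    for s in samples[1:]:
        d = s - prev
        mapped.append(2 * d if d >= 0 else -2 * d - 1)
        prev = s
    return reference, mapped


def rice_length(block, k):
    """Bits needed to Rice-code a block with parameter k.

    Each sample costs (sample >> k) unary bits, 1 terminator bit,
    and k bits for the low-order part that is stored verbatim.
    """
    return sum((s >> k) + 1 + k for s in block)


def best_rice_parameter(block, n_bits):
    """Pick the cheapest k for one block of J = len(block) n-bit samples.

    Returns (k, coded_bits); k is None when no Rice code beats storing
    the block uncompressed at n_bits per sample.
    """
    best_k, best_len = None, len(block) * n_bits
    for k in range(n_bits):
        length = rice_length(block, k)
        if length < best_len:
            best_k, best_len = k, length
    return best_k, best_len


# Smooth data: neighbouring samples differ only slightly.
reference, mapped = map_residuals([100, 101, 103, 102, 102, 105, 104, 104, 103])
k, bits = best_rice_parameter(mapped, n_bits=8)
```

For this smooth example the J = 8 mapped residuals cost 22 bits with k = 1 instead of 64 bits uncompressed. A block of large, unpredictable values falls back to the uncompressed option. The choice is made independently for every block, which is where the adaptivity comes from.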

There is no wrapper for Python or Julia that I'm aware of. All downstream use is, AFAIK, in HDF5 and various GRIB libraries (C/C++).

milankl commented 2 months ago

Okay, thanks. Yeah, I see that Tiago Quintino and others have experimented with libaec within GRIB, but they don't compare it to any of the "big" codecs like zlib, zstandard, blosc, lz4, ... that are widely used and supported. I'm a bit surprised that no one (?) ever did this comparison. I'd be happy to write a Julia wrapper around the library here if I get the feeling it's worth it, but maybe some clarifications first, if I may ask

MathisRosenhauer commented 2 months ago