asdf-format / asdf-standard

Standards document describing ASDF, Advanced Scientific Data Format
http://asdf-standard.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Support more compression formats #408

Open eschnett opened 8 months ago

eschnett commented 8 months ago

CHORD (https://www.chord-observatory.ca) is a radio telescope currently under construction in Canada. We are experimenting with file formats for various data products, and ASDF looks interesting because it (a) is simple and (b) can be streamed efficiently.

In the past, compression algorithms very similar to Blosc's (https://www.blosc.org/pages/) "bitshuffle" have proven very useful for us. I wonder whether these could be added to the standard.
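For readers unfamiliar with the idea, here is a minimal pure-Python sketch of a bit transpose of the kind bitshuffle performs: bit 0 of every element is stored first, then bit 1, and so on, so that bits that rarely change end up in long uniform runs before a general-purpose compressor sees them. The function names and the whole-buffer layout are assumptions made for illustration; this is not Blosc's actual blocked on-disk layout, and zlib is only a stand-in for the real codec.

```python
import zlib


def bitshuffle(data: bytes, itemsize: int) -> bytes:
    """Transpose bits: all bit-0s of the elements first, then all bit-1s, ..."""
    if len(data) % itemsize != 0:
        raise ValueError("data length must be a multiple of itemsize")
    n = len(data) // itemsize
    values = [
        int.from_bytes(data[i * itemsize:(i + 1) * itemsize], "little")
        for i in range(n)
    ]
    out = bytearray(len(data))
    for bit in range(itemsize * 8):
        for i, value in enumerate(values):
            if (value >> bit) & 1:
                pos = bit * n + i  # position in the transposed bit stream
                out[pos >> 3] |= 1 << (pos & 7)
    return bytes(out)


def bitunshuffle(data: bytes, itemsize: int) -> bytes:
    """Inverse of bitshuffle()."""
    if len(data) % itemsize != 0:
        raise ValueError("data length must be a multiple of itemsize")
    n = len(data) // itemsize
    values = [0] * n
    for bit in range(itemsize * 8):
        for i in range(n):
            pos = bit * n + i
            if (data[pos >> 3] >> (pos & 7)) & 1:
                values[i] |= 1 << bit
    return b"".join(v.to_bytes(itemsize, "little") for v in values)


# Hypothetical sample: slowly varying 16-bit values.
samples = [1000 + (i % 7) for i in range(4096)]
raw = b"".join(s.to_bytes(2, "little") for s in samples)
shuffled = bitshuffle(raw, 2)
assert bitunshuffle(shuffled, 2) == raw

# Compare compressed sizes with and without the shuffle step.
print(len(zlib.compress(raw)), len(zlib.compress(shuffled)))
```

Whether the shuffle helps depends on the data; it tends to pay off when the high-order bits of neighbouring samples are mostly identical.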

As an experiment, I have added support for c-blosc, c-blosc2, and zstd to asdf-cxx (https://github.com/eschnett/asdf-cxx). Would you, in principle, be interested in augmenting the standard, using blsc, bls2, and zstd as the compression strings?
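For context on why these strings are four characters long: as I read the ASDF standard, each binary block starts with a magic token and a big-endian header that contains a 4-byte compression field. The sketch below packs and parses such a header with Python's struct module. The helper names are hypothetical, the field layout is my reading of the standard rather than an authoritative reference, and the checksum is left as all zeros (no verification).

```python
import struct

BLOCK_MAGIC = b"\xd3BLK"
# flags, compression, allocated_size, used_size, data_size, checksum
HEADER_FMT = ">I4sQQQ16s"
HEADER_SIZE = struct.calcsize(HEADER_FMT)


def pack_block_header(compression: bytes, allocated: int, used: int, data_size: int) -> bytes:
    """Build a block header carrying the given 4-byte compression label."""
    if len(compression) != 4:
        raise ValueError("compression label must be exactly 4 bytes")
    header = struct.pack(
        HEADER_FMT, 0, compression, allocated, used, data_size, b"\0" * 16
    )
    return BLOCK_MAGIC + struct.pack(">H", HEADER_SIZE) + header


def read_compression_label(buf: bytes) -> bytes:
    """Return the 4-byte compression label from a block header."""
    if buf[:4] != BLOCK_MAGIC:
        raise ValueError("not an ASDF block")
    # The 2-byte header size follows the magic token; the header itself starts at offset 6.
    (header_size,) = struct.unpack_from(">H", buf, 4)
    if header_size < HEADER_SIZE:
        raise ValueError("header too short")
    _flags, compression, _alloc, _used, _size, _checksum = struct.unpack_from(
        HEADER_FMT, buf, 6
    )
    return compression


header = pack_block_header(b"blsc", allocated=4096, used=1500, data_size=4096)
print(read_compression_label(header))
```

A reader that does not recognize the label can still skip the block using the allocated size, which is part of why agreeing on the label strings across implementations matters.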

braingram commented 8 months ago

Thanks for opening an issue and for sharing your work. It's exciting to see another asdf implementation!

There has been some recent discussion about adding zstd support to asdf (see PR: https://github.com/asdf-format/asdf/pull/1570). Because the Python asdf package now supports adding compression algorithms via extensions (see this example adding zstd support: https://github.com/braingram/asdf-zstd), we'd like to create a new asdf-compressors package soon that adds a number of compression algorithms (see the roadmap for a mention of this plan: https://github.com/asdf-format/asdf/wiki/Roadmap#changes-not-tied-to-a-particular-version). It would be great to coordinate this with asdf-cxx to make sure the labels match and the features are compatible.
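To illustrate what "labels match" means in practice, here is a rough, self-contained sketch of a label-keyed compressor plugin. It is deliberately duck-typed and does not import asdf: the class name, the `demo` label, and the registry are hypothetical, and zlib stands in for zstd or blosc so that the example runs with only the standard library. The method shapes (a 4-byte `label`, a `compress` generator, and a `decompress` that fills an output buffer) follow my understanding of the Python extension interface and should be treated as an assumption.

```python
import zlib


class DemoCompressor:
    """Hypothetical plugin; zlib stands in for the real codec."""

    # The label is what gets written into the block header, so every
    # implementation (Python, C++, ...) must agree on it.
    label = b"demo"

    def compress(self, data, **kwargs):
        """Yield compressed chunks for the given bytes-like object."""
        level = kwargs.get("level", 6)
        yield zlib.compress(bytes(data), level)

    def decompress(self, chunks, out, **kwargs):
        """Decompress an iterable of chunks into `out`; return bytes written."""
        raw = zlib.decompress(b"".join(bytes(c) for c in chunks))
        out[:len(raw)] = raw
        return len(raw)


# Hypothetical registry: a reader looks up the plugin by the label it
# finds in the file.
REGISTRY = {c.label: c for c in [DemoCompressor()]}

payload = b"radio telescope data " * 100
compressor = REGISTRY[b"demo"]
stored = list(compressor.compress(payload))

out = bytearray(len(payload))
written = compressor.decompress(stored, out)
assert written == len(payload) and bytes(out) == payload
```

If two implementations write the same codec under different labels, the lookup on the reading side fails even though the payload itself would decode fine.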

I will give asdf-cxx a closer look. Have you done much testing with files written by asdf-cxx and read by the Python (or IDL) implementation of asdf (and vice versa)? It would be great to hear more about asdf-cxx and your impressions of asdf.

braingram commented 8 months ago

FYI: I ran your demo-compression example (thanks for providing that with your code!). I had to modify it slightly so that it does not attempt to save using blosc2 (I didn't immediately find it on Homebrew). The file it generated was readable in Python with the new modifications to the asdf-compression package (this is a work in progress, and I hope to move it to the asdf-format organization soon). I opened an issue to track some compatibility tests (there is one other package that has already added some form of blosc support via an extension): https://github.com/braingram/asdf-compression/issues/3

eschnett commented 8 months ago

I think blosc2 is not available from Homebrew, Debian, etc. The main difference between blosc2 and blosc is that the former supports uncompressed data sizes larger than 2 GByte. For the time being, just using blosc would be good enough.

To follow suit, I am now adding support for liblz4 as a compressor. I think you're using lz4f as the token.