PallHaraldsson closed this issue 7 months ago.
SZIP should be installed and enabled by default:
```julia
julia> using HDF5

julia> HDF5.Filters.isencoderenabled(HDF5.API.H5Z_FILTER_SZIP)
true

julia> HDF5.API.h5z_filter_avail(HDF5.API.H5Z_FILTER_SZIP)
true
```
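Extending the REPL check above, here is a minimal sketch that queries every predefined HDF5 filter at once. It assumes a recent HDF5.jl in which the `H5Z_FILTER_*` constants and `h5z_filter_avail` live in `HDF5.API` and `isencoderenabled` lives in `HDF5.Filters`, as in the session above. `filter_report` and `BUILTIN_FILTERS` are hypothetical names chosen for illustration. Filters that the HDF5 library does not predefine (zstd, for example) are not covered.

```julia
using HDF5

# Predefined HDF5 filter IDs, taken from HDF5.API (assumed to exist in a recent HDF5.jl).
const BUILTIN_FILTERS = [
    "DEFLATE"     => HDF5.API.H5Z_FILTER_DEFLATE,
    "SHUFFLE"     => HDF5.API.H5Z_FILTER_SHUFFLE,
    "FLETCHER32"  => HDF5.API.H5Z_FILTER_FLETCHER32,
    "SZIP"        => HDF5.API.H5Z_FILTER_SZIP,
    "NBIT"        => HDF5.API.H5Z_FILTER_NBIT,
    "SCALEOFFSET" => HDF5.API.H5Z_FILTER_SCALEOFFSET,
]

# Hypothetical helper: returns name => (available, encoder_enabled) for each filter.
function filter_report(filters = BUILTIN_FILTERS)
    report = Dict{String,Tuple{Bool,Bool}}()
    for (name, id) in filters
        avail = HDF5.API.h5z_filter_avail(id)
        # Only ask for encoder info when the filter is registered; the
        # underlying filter-info call fails for unregistered filter IDs.
        enc = avail && HDF5.Filters.isencoderenabled(id)
        report[name] = (avail, enc)
    end
    return report
end

report = filter_report()
for name in sort!(collect(keys(report)))
    avail, enc = report[name]
    println(rpad(name, 12), " available=", avail, "  encoder=", enc)
end
```

A filter can be available for decoding but not encoding (historically the case for SZIP builds without an encoder), which is why both flags are reported separately.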
HDF5_jll is one of two packages that depend on libaec_jll:
libaec_jll is built from the following free source:
For good measure, SZIP should be disambiguated from SZ (https://github.com/szcompressor/SZ), which is a different, error-bounded lossy compressor.
It's good to see that libaec_jll has HDF5_jll, and thus HDF5.jl, as a dependent. That's what I wanted to confirm, and I had actually looked at:
https://juliahub.com/ui/Packages/General/HDF5_jll
and libaec_jll is not listed there as a dependency; otherwise I would not have opened this issue. I realize that page shows cached information that is likely rarely, if ever, updated. I've noticed missing packages there before. I suppose libaec_jll was added as a dependency later, perhaps even recently.
I think I'll close the issue. Regarding SZ, if you're saying we should support it, then yes, provided it's widely used for such files; or perhaps just the later variant linked from there, which seems very intriguing:
Note: SZ3 has been released here. SZ3 has much higher compression ratios than SZ2 in many cases, with comparable throughput (suffering slightly degraded throughput though). Details can be found in our ICDE21 paper.
SZ3: Kai Zhao, Sheng Di, Maxim Dmitriev, Thierry-Laurent D. Tonellot, Zizhong Chen, and Franck Cappello. "Optimizing Error-Bounded Lossy Compression for Scientific Data by Dynamic Spline Interpolation", Proceeding of the 37th IEEE International Conference on Data Engineering (ICDE 21), Chania, Crete, Greece, Apr 19 - 22, 2021.
SZauto: Kai Zhao, Sheng Di, Xin Liang, Sihuan Li, Dingwen Tao, Zizhong Chen, and Franck Cappello. "Significantly Improving Lossy Compression for HPC Datasets with Second-Order Prediction and Parameter Optimization", Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing (HPDC 20), Stockholm, Sweden, 2020. (code: https://github.com/szcompressor/SZauto/)
I see szip in the code, but I'm not sure whether non-proprietary code is used to compress and decompress. Please close this if proprietary code isn't used; I found a free drop-in replacement here:
https://gitlab.dkrz.de/k202009/
The algorithm was patented, but the patents have likely expired since then; in any case, I found that free code. The project here links to information stating that SZIP may be used only for non-commercial purposes, which implies that code that is not fully free/open source is currently used:
https://support.hdfgroup.org/doc_resource/SZIP/
EDIT:
https://www.hdfgroup.org/2017/05/hdf5-data-compression-demystified-2-performance-tuning/
The "third-party" filters link there leads to a "file not found" page, but I'm curious which other filters may be supported by the underlying library or by this package, e.g. zstd? And is Szip definitely supported freely?
I see now that zstd is supported, plus likely at least these (are any others of interest?):
zstd is a good standard and is at least fast, and Szip had the best compression, at least at the time, but perhaps no longer? Is some other filter now considered best for scientific data, in terms of size and/or speed, and if so, which?