aldanor / hdf5-rust

HDF5 for Rust
https://docs.rs/hdf5
Apache License 2.0

Blosc filters have no effect #273

Open watsaig opened 8 months ago

watsaig commented 8 months ago

Creating a dataset with any of the blosc filters compiles and runs with no errors, but does not compress the data at all. If I use lzf or szip instead, the dataset is compressed as expected.

Just to be clear, the filter does appear to be applied (looking at the output of h5dump), but there is no compression.

Are there any external dependencies needed for blosc to work?

Here is a minimal example:

use hdf5::filters;
use ndarray::Array2;
use std::env::temp_dir;
fn main() -> anyhow::Result<()> {
    println!("Blosc available? {:}", filters::blosc_available());
    println!("LZF available? {:}", filters::lzf_available());
    println!("SZIP available? {:}", filters::szip_available());

    let path_uncomp = temp_dir().join("uncompressed.h5");
    let path_comp = temp_dir().join("compressed.h5");
    let file_uncomp = hdf5::File::create(&path_uncomp)?;
    let file_comp = hdf5::File::create(&path_comp)?;

    let data = Array2::<f32>::ones((1000, 1000));
    file_uncomp
        .new_dataset_builder()
        .with_data(data.view())
        .create("data")?;

    file_comp
        .new_dataset_builder()
        .blosc_lz4(9, true)
        //.blosc_zstd(9, true)
        //.blosc_snappy(9, true)
        //.lzf()
        //.szip(filters::SZip::NearestNeighbor, 16)
        .with_data(data.view())
        .create("data")?;

    println!(
        "Uncompressed file size: {:} kB",
        path_uncomp.metadata()?.len() / 1024
    );
    println!(
        "Compressed file size: {:} kB",
        path_comp.metadata()?.len() / 1024
    );
    Ok(())
}

Cargo.toml:

[dependencies]
anyhow = "1.0.80"
hdf5 = { git = "https://github.com/aldanor/hdf5-rust.git", features = [
    "blosc",
    "lzf",
] }
ndarray = { version = "0.15.6" }

The output is:

Blosc available? true
LZF available? true
SZIP available? true
Uncompressed file size: 3908 kB
Compressed file size: 3910 kB

Using szip, the compressed file size is 12 kB.
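Because the filter is silently skipped rather than reported, one way to catch this in a test is to compare the two file sizes directly. The following is a minimal sketch using only the standard library; the helper names `compression_ratio` and `filter_had_effect` and the `min_ratio` threshold are hypothetical and not part of the hdf5 crate.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Ratio of uncompressed to compressed size.
/// A value close to 1.0 suggests the filter had no effect.
/// (Hypothetical helper, not part of the hdf5 crate.)
fn compression_ratio(uncompressed_bytes: u64, compressed_bytes: u64) -> f64 {
    uncompressed_bytes as f64 / compressed_bytes as f64
}

/// Returns Ok(true) if the compressed file is smaller than the
/// uncompressed one by at least a factor of `min_ratio`.
/// (Hypothetical helper; the threshold is an assumption to tune per data set.)
fn filter_had_effect(
    uncompressed: &Path,
    compressed: &Path,
    min_ratio: f64,
) -> io::Result<bool> {
    let uncompressed_bytes = fs::metadata(uncompressed)?.len();
    let compressed_bytes = fs::metadata(compressed)?.len();
    Ok(compression_ratio(uncompressed_bytes, compressed_bytes) >= min_ratio)
}
```

With the sizes reported above, `compression_ratio(3908, 3910)` is just under 1.0 for blosc, while `compression_ratio(3908, 12)` is above 300 for szip, so a modest threshold such as 2.0 would already separate the two cases.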

mulimoen commented 8 months ago

This happens when the requested compressor is not available to blosc. If one specifies --features blosc-src/lz4,blosc-src/zlib, the file shrinks to 19 kB with the blosc-lz4 filter and 8 kB with blosc-zlib.

It is unfortunate that we skip the filter instead of raising an error when it is not available. Setting https://github.com/aldanor/hdf5-rust/blob/4a9b537f0c7ba3f75712ba240fe9ffeb1fd9447e/hdf5/src/hl/filters.rs#L472 to the mandatory flag would produce such an error.
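The difference between the two flags can be illustrated with a toy model. This is only a sketch of the semantics described above, not the crate's or the HDF5 library's code: the constant values mirror the HDF5 C constants `H5Z_FLAG_MANDATORY` (0x0000) and `H5Z_FLAG_OPTIONAL` (0x0001), while `Outcome` and `filter_outcome` are hypothetical names introduced for illustration.

```rust
/// Values mirror the HDF5 C constants H5Z_FLAG_MANDATORY and H5Z_FLAG_OPTIONAL.
const FLAG_MANDATORY: u32 = 0x0000;
const FLAG_OPTIONAL: u32 = 0x0001;

/// What happens to a filter in the pipeline (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The filter ran and the data was transformed.
    Applied,
    /// The filter could not run and was silently dropped.
    Skipped,
}

/// Toy model of the behavior discussed in this thread:
/// an unusable optional filter is skipped without any message,
/// while an unusable mandatory filter is reported as an error.
fn filter_outcome(flags: u32, filter_usable: bool) -> Result<Outcome, String> {
    if filter_usable {
        return Ok(Outcome::Applied);
    }
    if flags & FLAG_OPTIONAL != 0 {
        // Optional: no error is raised, the data is stored uncompressed.
        Ok(Outcome::Skipped)
    } else {
        // Mandatory: the failure surfaces to the caller.
        Err("filter is mandatory but could not be applied".to_string())
    }
}
```

In this model, `filter_outcome(FLAG_OPTIONAL, false)` returns `Ok(Outcome::Skipped)`, which corresponds to the uncompressed file seen in the report, whereas `filter_outcome(FLAG_MANDATORY, false)` returns an `Err`.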

watsaig commented 8 months ago

I see, thank you. I added blosc-src = { version = "0.3.0", features = ["lz4", "zlib", "zstd"] } to Cargo.toml to make it work. May I suggest adding this to the documentation of the blosc_ functions?

Agreed that an error would be great in this case, or maybe even a more in-depth function like blosc_available that would return which of the blosc filters are available.