aldanor / hdf5-rust

HDF5 for Rust
https://docs.rs/hdf5
Apache License 2.0
310 stars 85 forks source link

Fix multiple filter problems and add tests #185

Closed aldanor closed 2 years ago

aldanor commented 3 years ago

Closes #183

aldanor commented 3 years ago

This is WIP, but at least it's reproducible now.

aldanor commented 3 years ago

As can be seen in the filter pipeline test, it's not just blosc/lzf filters, looks like something gets corrupted with the filter pipeline when applying it to the dataset.

aldanor commented 3 years ago

@mulimoen Any thoughts where the corruption might happen? Hm..

E.g. for the first filter in the pipeline (only when it gets attached to the dataset), you get

filter_id=5, cdata=[8, 1, 200000, 1, 4, 0, 32, 0]

whereas for nbit filter (id=5) the cdata should be an empty array.

thread 'hl::filters::tests::test_filter_pipeline' panicked at 'called `Result::unwrap()` on an `Err` value: expected length 0 cdata for nbit filter',
mulimoen commented 3 years ago

I am not very familiar with the filter code, but I've had a look at the lzf filter. In here the filter is modified in set_local_lzf which increases the number of elements to three which clashes with the check in parse_lzf which checks

ensure!(cdata.is_empty(), "expected length 0 cdata for lzf filter");

This check should probably be removed or check if 0 or 3?

mulimoen commented 3 years ago

And I do wonder if nbit filter must also have a larger cdata to store offset and precision. The shuffle filter must store the number of bytes for shuffling.

aldanor commented 3 years ago

And I do wonder if nbit filter must also have a larger cdata to store offset and precision. The shuffle filter must store the number of bytes for shuffling.

Yep, agreed, and that logic is pretty complicated per-filter (i.e. cdata size may depend on the data itself), and in some cases there's up to around 20 values being stored. I've relaxed the checks so that only the minimum length is verified in the cases where we need to pull the parameters.

Now the failure you see in the filters test is the one where blosc filter parameters are zeroed out.

aldanor commented 3 years ago

That is:

// before applying to dataset (just storing in DCPL)
cdata = [
    0, 0, 0, 0, 9, 1, 5,
]
// after applying to dataset and then extracting it back
cdata = [
    2, 2, 4, 800000, 0, 0, 0,
]

Note that all internal parameters have been filled out now (first 4 values), but the user parameters (last 3) got erased somehow...

aldanor commented 3 years ago

Was an actual bug - a single character mistake which essentially disabled all blosc filters 🤣 (hopefully fixed now in baf4ddb - and we'll have to release a version asap so that it actually works)

aldanor commented 3 years ago

Interesting, now there's obscure conda-related failures (I think I've seen that years ago and hacked around it somehow, but need to remember what exactly to do here).

aldanor commented 3 years ago

Looks like the failures are related to HDF5_PLUGIN_PATH somehow

aldanor commented 2 years ago

Any thoughts on how to reproduce/fix this? Hm...

(would be nice to merge this asap and release 0.8.1 since there is a serious bug fixed in this PR already)

aldanor commented 2 years ago

Another thought that I had - don't hide Blosc, BloscShuffle structs etc under #[feature = "blosc"] so that you could read filter pipeline and see it properly (and not as Filter::User).

However, then we need to manually check that filters are available (both cfg!(feature = "blosc") and blosc_available()) when applying the filters to DCPL. Maybe it makes sense to do it regardless.

mulimoen commented 2 years ago

I tried reproducing in python here: https://github.com/mulimoen/hdf5_py_test/actions/runs/1473774852 but could not get the error to trigger. Maybe h5py has a workaround?

mulimoen commented 2 years ago

Another thought that I had - don't hide Blosc, BloscShuffle structs etc under #[feature = "blosc"] so that you could read filter pipeline and see it properly (and not as Filter::User).

Would this not depend on BLOSC_FILTER_ID being exactly 32001? This might be a problem for all files created by other libraries. (This is also a breaking change)

aldanor commented 2 years ago

Another thought that I had - don't hide Blosc, BloscShuffle structs etc under #[feature = "blosc"] so that you could read filter pipeline and see it properly (and not as Filter::User).

Would this not depend on BLOSC_FILTER_ID being exactly 32001? This might be a problem for all files created by other libraries. (This is also a breaking change)

Blosc filter ID being 32001 is a sort of immutable invariant that we have to accept, I guess... More details here: https://support.hdfgroup.org/pubs/rfcs/RFC_Filters-registration-policy-final.pdf

aldanor commented 2 years ago

Ok, I was able to reproduce it, same as one of the failures in CI: macOS, using 1.8.18 from anaconda main channel.

Error: H5Pset_filter(): failed to call private function: can't open directory: .../envs/hdf5-1.8.18/lib/hdf5/plugin

Apparently, there's H5PL* API for managing plugin paths etc, but it's only available on 1.10.1 and later. Before that it just looks at env vars which may be cached or not... I'll give it a try tomorrow to see if anything can be hacked around.

aldanor commented 2 years ago

Hm, if setting HDF5_PLUGIN_PATH manually to some obscure folder, it now yields

Error: H5Pset_filter(): failed to call private function: failed to load dynamically loaded plugin

So just setting the env var wouldn't solve it, need to dig deeper.

aldanor commented 2 years ago

Ok, interesting - it's adding that weird non-existent user filter that fails sometimes in some conditions

If I remove a line with Filter::user(...), it seems to pass. Wonder then, why doesn't it fail everywhere, hm.

aldanor commented 2 years ago

Ok, fixed yet another typo/bug in a573e72 (🤦, having proper tests is a good thing, we need more of them to cover all the new dataset/DCPL stuff...), let's see how this works out now.

aldanor commented 2 years ago

Ok, this should be good to go, I think. Hopefully fixes most filter problems.