This is WIP, but at least it's reproducible now.
As can be seen in the filter pipeline test, it's not just the blosc/lzf filters; it looks like something in the filter pipeline gets corrupted when it is applied to the dataset.
@mulimoen Any thoughts where the corruption might happen? Hm..
E.g. for the first filter in the pipeline (only when it gets attached to the dataset), you get
filter_id=5, cdata=[8, 1, 200000, 1, 4, 0, 32, 0]
whereas for the nbit filter (id=5) the cdata should be an empty array.
thread 'hl::filters::tests::test_filter_pipeline' panicked at 'called `Result::unwrap()` on an `Err` value: expected length 0 cdata for nbit filter',
I am not very familiar with the filter code, but I've had a look at the lzf filter. The filter is modified in set_local_lzf, which increases the number of cdata elements to three; this clashes with the check in parse_lzf:
ensure!(cdata.is_empty(), "expected length 0 cdata for lzf filter");
This check should probably either be removed or relaxed to accept a length of 0 or 3?
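For concreteness, a sketch of what the relaxed check could look like, assuming the crate's existing `ensure!` error macro (as in the check quoted above) and that the lzf cdata always ends up with either zero or exactly three values once set_local_lzf has run:

```rust
// Sketch only - accept both the "fresh" (empty) and the set_local-populated
// (three-element) forms of the lzf cdata instead of requiring length 0.
ensure!(
    cdata.is_empty() || cdata.len() == 3,
    "expected length 0 or 3 cdata for lzf filter"
);
```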
And I do wonder whether the nbit filter must also have a larger cdata to store offset and precision. The shuffle filter must store the number of bytes for shuffling.
> And I do wonder whether the nbit filter must also have a larger cdata to store offset and precision. The shuffle filter must store the number of bytes for shuffling.
Yep, agreed - and that logic is pretty complicated per-filter (i.e. the cdata size may depend on the data itself); in some cases up to around 20 values are stored. I've relaxed the checks so that only the minimum length is verified in the cases where we need to pull out the parameters.
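A minimal sketch of that "minimum length only" idea - the helper name and error type below are made up for illustration, not the crate's actual code:

```rust
/// Hypothetical helper: only verify that `cdata` carries at least the values
/// we actually need to read back, since set_local callbacks may append
/// additional, data-dependent elements.
fn ensure_min_cdata_len(filter: &str, cdata: &[u32], min: usize) -> Result<(), String> {
    if cdata.len() < min {
        return Err(format!(
            "expected at least {} cdata value(s) for {} filter, got {}",
            min,
            filter,
            cdata.len()
        ));
    }
    Ok(())
}
```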
Now the failure you see in the filters test is the one where blosc filter parameters are zeroed out.
That is:
// before applying to dataset (just storing in DCPL)
cdata = [0, 0, 0, 0, 9, 1, 5]

// after applying to dataset and then extracting it back
cdata = [2, 2, 4, 800000, 0, 0, 0]
Note that all internal parameters have been filled out now (first 4 values), but the user parameters (last 3) got erased somehow...
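For reference, a small sketch of how those seven values can be interpreted, assuming the layout of the reference c-blosc HDF5 filter (the first four are internal values filled in by the filter's set_local callback, the last three are the user parameters); the function and error handling are illustrative only, not the crate's actual parsing code:

```rust
/// Illustrative only: pull the user parameters back out of a blosc `cdata`
/// array, assuming the reference c-blosc HDF5 filter layout.
fn parse_blosc_user_params(cdata: &[u32]) -> Result<(u32, u32, u32), String> {
    // cdata[0..4]: filter revision, blosc format version, typesize,
    //              chunk size in bytes (all filled in by set_local);
    // cdata[4..7]: compression level, shuffle flag, compressor code.
    if cdata.len() < 7 {
        return Err(format!(
            "expected at least 7 cdata values for blosc filter, got {}",
            cdata.len()
        ));
    }
    Ok((cdata[4], cdata[5], cdata[6]))
}
```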
It was an actual bug - a single-character mistake which essentially disabled all blosc filters 🤣 (hopefully fixed now in baf4ddb - and we'll have to release a version asap so that it actually works).
Interesting - now there are obscure conda-related failures (I think I've seen this years ago and hacked around it somehow, but I need to remember what exactly to do here). Looks like the failures are related to HDF5_PLUGIN_PATH somehow.

Any thoughts on how to reproduce/fix this? Hm...
(would be nice to merge this asap and release 0.8.1 since there is a serious bug fixed in this PR already)
Another thought that I had - don't hide the Blosc, BloscShuffle structs etc. under #[feature = "blosc"], so that you could read the filter pipeline and see it properly (and not as Filter::User).
However, then we need to manually check that the filters are available (both cfg!(feature = "blosc") and blosc_available()) when applying the filters to the DCPL. Maybe it makes sense to do that regardless.
I tried reproducing it in Python here: https://github.com/mulimoen/hdf5_py_test/actions/runs/1473774852 but could not get the error to trigger. Maybe h5py has a workaround?
> Another thought that I had - don't hide the Blosc, BloscShuffle structs etc. under #[feature = "blosc"], so that you could read the filter pipeline and see it properly (and not as Filter::User).

Would this not depend on BLOSC_FILTER_ID being exactly 32001? This might be a problem for all files created by other libraries. (This is also a breaking change.)
> > Another thought that I had - don't hide the Blosc, BloscShuffle structs etc. under #[feature = "blosc"], so that you could read the filter pipeline and see it properly (and not as Filter::User).
>
> Would this not depend on BLOSC_FILTER_ID being exactly 32001? This might be a problem for all files created by other libraries. (This is also a breaking change.)
Blosc filter ID being 32001 is a sort of immutable invariant that we have to accept, I guess... More details here: https://support.hdfgroup.org/pubs/rfcs/RFC_Filters-registration-policy-final.pdf
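To illustrate the idea of always decoding the registered id when reading a pipeline back, regardless of whether the filter can actually be applied - the enum below is a made-up stand-in, not the crate's actual Filter type:

```rust
/// Registered blosc filter id, per the HDF Group filter registration policy.
const BLOSC_FILTER_ID: u32 = 32001;

/// Made-up stand-in for illustration; not the crate's actual `Filter` enum.
#[derive(Debug, PartialEq)]
enum ParsedFilter {
    Blosc,
    User(u32),
}

/// Decode a filter id read back from a pipeline, independently of whether
/// the blosc feature (and hence the ability to apply it) is enabled.
fn classify_filter(filter_id: u32) -> ParsedFilter {
    match filter_id {
        BLOSC_FILTER_ID => ParsedFilter::Blosc,
        other => ParsedFilter::User(other),
    }
}
```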
Ok, I was able to reproduce it, same as one of the failures in CI: macOS, using 1.8.18 from anaconda main channel.
Error: H5Pset_filter(): failed to call private function: can't open directory: .../envs/hdf5-1.8.18/lib/hdf5/plugin
Apparently, there's an H5PL* API for managing plugin paths etc., but it's only available on 1.10.1 and later. Before that it just looks at env vars, which may be cached or not... I'll give it a try tomorrow to see if anything can be hacked around.
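For later reference, a minimal sketch of calling that API directly over FFI; the extern declaration mirrors the documented C signature of H5PLprepend (HDF5 >= 1.10.1), and linking against a suitable libhdf5 is assumed rather than checked here:

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

extern "C" {
    // herr_t H5PLprepend(const char *search_path); available since HDF5 1.10.1.
    fn H5PLprepend(search_path: *const c_char) -> c_int;
}

/// Prepend a directory to HDF5's dynamic plugin search path.
fn prepend_plugin_path(path: &str) -> Result<(), String> {
    let c_path = CString::new(path).map_err(|e| e.to_string())?;
    // HDF5 signals errors with a negative return value.
    if unsafe { H5PLprepend(c_path.as_ptr()) } < 0 {
        Err(format!("H5PLprepend failed for {}", path))
    } else {
        Ok(())
    }
}
```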
Hm, if I set HDF5_PLUGIN_PATH manually to some obscure folder, it now yields
Error: H5Pset_filter(): failed to call private function: failed to load dynamically loaded plugin
So just setting the env var wouldn't solve it, need to dig deeper.
Ok, interesting - it's adding that weird non-existent user filter that fails sometimes under certain conditions. If I remove the line with Filter::user(...), it seems to pass. I wonder, then, why it doesn't fail everywhere, hm.
Ok, fixed yet another typo/bug in a573e72 (🤦, having proper tests is a good thing, we need more of them to cover all the new dataset/DCPL stuff...), let's see how this works out now.
Ok, this should be good to go, I think. Hopefully fixes most filter problems.
Closes #183