Apollo3zehn / PureHDF

A pure .NET library that makes reading and writing of HDF5 files (groups, datasets, attributes, ...) very easy.
MIT License

Having trouble when reading a dataset compressed with "bitshuffle" #118

Closed: lee2430 closed this issue 3 months ago

lee2430 commented 3 months ago

Hi, Apollo. Sorry to bother you; I don't know whether this problem is caused by my code or because PureHDF does not yet support reading "bitshuffle"-compressed data. I've tried all the filters in PureHDF to read an h5 file (generated by third-party hardware), but got this error:

Unhandled exception. System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> System.Exception: Could not find filter 'bitshuffle; see https://github.com/kiyo-masui/bitshuffle

The file can be read in Python with h5py and hdf5plugin:

import h5py
import hdf5plugin  # importing this registers the bitshuffle filter (ID 32008) with h5py
import numpy as np

filename = "tt.h5"
with h5py.File(filename, 'r') as hdf_n:
    d_arr = np.array(hdf_n['entry/data/data'])

Some metadata about the target dataset:

Path: /entry/data/data
Type: Integer (unsigned), 32-bit, little-endian
Shape: 1 x 1062 x 1028 = 1091736
Chunk shape: 1 x 1062 x 1028 = 1091736
Compression Filters:  32008 | bitshuffle; see https://github.com/kiyo-masui/bitshuffle
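Because the chunk shape equals the dataset shape, the whole dataset is stored as a single chunk, so a reader has to run the bitshuffle filter over the full chunk in one pass. A quick sketch of the uncompressed chunk size, using only the numbers listed above:

```python
import math

# Shape and element type copied from the dataset metadata above.
chunk_shape = (1, 1062, 1028)
bytes_per_element = 4  # "Integer (unsigned), 32-bit"

elements = math.prod(chunk_shape)
chunk_bytes = elements * bytes_per_element
print(elements, chunk_bytes)  # 1091736 4366944
print(f"{chunk_bytes / 2**20:.2f} MiB")  # 4.16 MiB
```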

My C# code is as follows.

using PureHDF;
using PureHDF.Filters;

// Register the optional filter packages (none of them handles bitshuffle).
H5Filter.Register(new Blosc2Filter());
// Also tried, one at a time:
//H5Filter.Register(new BZip2SharpZipLibFilter());
//H5Filter.Register(new DeflateISALFilter());
//H5Filter.Register(new DeflateSharpZipLibFilter());
//H5Filter.Register(new LzfFilter());

// Open the file (disposed at the end of the scope) and read the 3D dataset.
using var file = H5File.OpenRead("tt.h5");
var dataset = file.Dataset("/entry/data/data");
var arr = dataset.Read<uint[,,]>();

The h5 file is at: https://drive.google.com/file/d/1J4ukBE_pxlCfW8Nj1J0wGeYaVxQLiDLc/view?usp=sharing

Finally, I have to say that HDF5 support is very poor for those who don't work with it full-time. PureHDF has saved many people like me. Thank you very much, whether or not my problem gets solved :)

Apollo3zehn commented 3 months ago

Thanks for raising this issue. I have not had a use case for bitshuffle so far, so I never needed to support it. Your sample file lets me work on it, and I will definitely look into it, maybe this evening or tomorrow. If the bitshuffle algorithm is simple, or if there is an existing C# library, a solution should come fairly quickly. Otherwise, supporting it will probably take a while.

Apollo3zehn commented 3 months ago

I need to translate this file to C# and wrap the entire https://github.com/kiyo-masui/bitshuffle library in a NuGet package, as I did previously for https://www.nuget.org/packages/Blosc2.PInvoke and several other libraries. Neither part should be too difficult, but I still expect it to take one or two weeks for a first test version.
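For readers wondering what the filter does at its core: bitshuffle rearranges the bits of a run of fixed-size elements so that bit k of every element lands in one contiguous run, which makes the data much more compressible; the HDF5 filter then optionally compresses the result, commonly with LZ4. Below is a minimal pure-Python sketch of that bit transpose alone, under these assumptions: no compression, no block splitting, an element count that is a multiple of 8, and a bit order chosen for readability that may not match the library's on-disk layout byte for byte. The names `bit_transpose` and `bit_untranspose` are hypothetical and are not part of bitshuffle or PureHDF.

```python
import struct


def _check(data: bytes, elem_size: int) -> int:
    """Return the element count, rejecting inputs this sketch does not handle."""
    n, rest = divmod(len(data), elem_size)
    if rest or n % 8:
        # Sketch assumption: whole elements, count divisible by 8.
        # The real filter works in blocks and treats leftovers separately.
        raise ValueError("need whole elements, in a multiple of 8")
    return n


def bit_transpose(data: bytes, elem_size: int) -> bytes:
    """Group bit k of byte b of every element into one bit plane."""
    n = _check(data, elem_size)
    row_len = n // 8  # bytes needed to hold one bit from every element
    out = bytearray(len(data))
    for i in range(n):
        for b in range(elem_size):
            byte = data[i * elem_size + b]
            for k in range(8):
                if (byte >> k) & 1:
                    row = b * 8 + k  # bit plane index
                    out[row * row_len + i // 8] |= 1 << (i % 8)
    return bytes(out)


def bit_untranspose(data: bytes, elem_size: int) -> bytes:
    """Invert bit_transpose by scattering each bit plane back into elements."""
    n = _check(data, elem_size)
    row_len = n // 8
    out = bytearray(len(data))
    for row in range(elem_size * 8):
        b, k = divmod(row, 8)
        for i in range(n):
            if (data[row * row_len + i // 8] >> (i % 8)) & 1:
                out[i * elem_size + b] |= 1 << k
    return bytes(out)


if __name__ == "__main__":
    # Eight little-endian uint32 values, like the dataset's element type.
    raw = struct.pack("<8I", *range(1, 9))
    shuffled = bit_transpose(raw, 4)
    print(shuffled.hex())  # 55667880 followed by 28 zero bytes
    assert bit_untranspose(shuffled, 4) == raw
```

With eight small values, every set bit collects in the first four output bytes and the remaining 28 bytes are zero; long zero runs like this are what a compressor such as LZ4 exploits.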

lee2430 commented 3 months ago

Thank you for your efforts. Looking forward to it.😁

Apollo3zehn commented 3 months ago

I am about to release a new version with Bitshuffle support. I think I was able to decode the data; I hope this image is what you expected:

[Image: rendering of the decoded dataset]

My unit tests pass, so I hope no bugs are left. The new version should be available in about 30 minutes.

Apollo3zehn commented 3 months ago

Version v2.1.0 is now available on NuGet.

Apollo3zehn commented 3 months ago

Ah, I forgot to give you some example code :-)

https://github.com/Apollo3zehn/PureHDF/blob/ddb83dc9956697829dd5f92ab2dcad19e02ed7e5/tests/PureHDF.Tests/Filters/FilterTests.cs#L391-L396

lee2430 commented 3 months ago

Thank you so much, Apollo! I really appreciate it!