Apollo3zehn / PureHDF

A pure .NET library that makes reading and writing of HDF5 files (groups, datasets, attributes, ...) very easy.
MIT License

H5Group.Read<T>() fails with "Filter pipeline failed" in 1.0.0-alpha.21 #9

Closed ccopsey closed 1 year ago

ccopsey commented 1 year ago

My apologies if the following report is a little vague. I'm not sure exactly what info is needed to replicate the issue, but from my perspective, all existing files that have been read without issue for some time are now consistently failing, and it isn't immediately clear to me why. It feels like the new async capability might be corrupting the reading of byte streams, but I could be way off. I'm logging this early, before I roll up my sleeves and try to work out what is going on myself, in case it is obvious to anybody else and/or a fix can be made more quickly.

Let me know what other info I can provide to help diagnose the problem. Thanks.

--

In 1.0.0-alpha.20 the following works fine. In 1.0.0-alpha.21 it fails.

var hdf = H5File.OpenRead(path);
var result = hdf.Dataset("key").Read<double>();

Exception:

System.Exception: 'Filter pipeline failed.'
Inner Exception: InvalidDataException: The archive entry was compressed using an unsupported compression method.

Stacktrace:

   at HDF5.NET.H5Filter.ExecutePipeline(List`1 pipeline, UInt32 filterMask, H5FilterFlags flags, Memory`1 filterBuffer, Memory`1 resultBuffer)
   at HDF5.NET.H5D_Chunk.<ReadChunkAsync>d__60`1.MoveNext()
   at HDF5.NET.H5D_Chunk.<ReadChunkAsync>d__59`1.MoveNext()
   at HDF5.NET.SimpleChunkCache.<GetChunkAsync>d__15.MoveNext()
   at HDF5.NET.SelectionUtils.<CopyMemoryAsync>d__2`1.MoveNext()
   at HDF5.NET.H5Dataset.<ReadAsync>d__50`2.MoveNext()
   at HDF5.NET.H5Dataset.Read[T](Selection fileSelection, Selection memorySelection, UInt64[] memoryDims, H5DatasetAccess datasetAccess)
   at ...

Apollo3zehn commented 1 year ago

It does not sound like an async issue (although I need to rework it a little bit to make it thread-safe). It would be best if you could provide a sample file, since all my tests are running fine.

ccopsey commented 1 year ago

HDFTest.zip

This zip contains 2 files. Both are mat files created in MATLAB. sine-good.mat contains a single array of 1201 items, sine-bad.mat contains the same but 1251 items. sine-good.mat works for me, sine-bad.mat does not.

Having now rolled my sleeves up, I can say that the following patch to H5File.cs gets sine-bad.mat working. I don't know what the implications of this are though.

@@ -65,7 +65,7 @@
                 throw new Exception("This library only works on little endian systems.");

             var safeFileHandle = (stream as FileStream)?.SafeFileHandle;
-            var reader = new H5BinaryReader(stream, safeFileHandle);
+            var reader = new H5BinaryReader(stream);

             // superblock
             var stepSize = 512;

Edit: my test:

using HDF5.NET;

var hdf = H5File.OpenRead(@"sine-bad.mat");
var result = hdf.Dataset("sine").Read<double>();

Console.WriteLine(result.Length);

Apollo3zehn commented 1 year ago

Thank you, I will investigate it tomorrow morning. Maybe I forgot to handle the MATLAB header correctly with that change. The header sits in front of the actual HDF5 file, and I guess that the offset is not properly taken into account.

Apollo3zehn commented 1 year ago

It should work better now with 1.0.0-alpha.22. The reason one file worked and the other did not is that, at a certain threshold, the native HDF5 lib decides to use a different storage layout (switching from compact to chunked), and the chunk code path did not properly respect the .mat header offset within the HDF5 file. My tests did not detect this because my test file also used the compact layout :-/ I have fixed this.
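
For readers unfamiliar with the header offset mentioned above, here is a minimal sketch of the idea. It is not the library's code: `SuperblockLocator`, `FindBaseOffset`, and `ToAbsolute` are hypothetical names used only for illustration. It assumes the HDF5 format signature is searched for at offsets 0, 512, 1024, 2048, ... (which matches the `stepSize = 512` in the patch context) and that addresses stored in the file are relative to where the HDF5 data starts, so every read has to add that base offset.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of HDF5.NET / PureHDF.
static class SuperblockLocator
{
    // HDF5 format signature: 0x89 'H' 'D' 'F' '\r' '\n' 0x1A '\n'
    static readonly byte[] Signature = { 0x89, 0x48, 0x44, 0x46, 0x0D, 0x0A, 0x1A, 0x0A };

    // Returns the absolute offset of the signature, or -1 if it is not found.
    // Candidate offsets are 0, 512, 1024, 2048, ... (doubling each step).
    public static long FindBaseOffset(Stream stream)
    {
        var buffer = new byte[Signature.Length];

        for (long offset = 0;
             offset + Signature.Length <= stream.Length;
             offset = offset == 0 ? 512 : offset * 2)
        {
            stream.Seek(offset, SeekOrigin.Begin);

            // Read exactly Signature.Length bytes (Stream.Read may return fewer).
            var read = 0;
            while (read < buffer.Length)
            {
                var n = stream.Read(buffer, read, buffer.Length - read);
                if (n == 0) break;
                read += n;
            }

            var match = read == buffer.Length;
            for (var i = 0; match && i < buffer.Length; i++)
                match = buffer[i] == Signature[i];

            if (match) return offset;
        }

        return -1;
    }

    // Assumption: stored addresses are relative to the start of the HDF5 data.
    public static long ToAbsolute(long baseOffset, long relativeAddress)
        => baseOffset + relativeAddress;
}

static class Program
{
    static void Main()
    {
        // Fake file: 512-byte header (as in a MATLAB v7.3 .mat file),
        // then the HDF5 signature, then 16 bytes of payload.
        var bytes = new byte[512 + 8 + 16];
        new byte[] { 0x89, 0x48, 0x44, 0x46, 0x0D, 0x0A, 0x1A, 0x0A }.CopyTo(bytes, 512);

        using var stream = new MemoryStream(bytes);
        var baseOffset = SuperblockLocator.FindBaseOffset(stream);

        Console.WriteLine(baseOffset);                               // prints 512
        Console.WriteLine(SuperblockLocator.ToAbsolute(baseOffset, 8)); // prints 520
    }
}
```

If a code path seeks to the relative address (8 in this toy example) instead of the absolute one (520), it reads header bytes rather than chunk data. Feeding such bytes into a decompression filter would plausibly produce an error like the one in the report.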

ccopsey commented 1 year ago

Thank you for your quick response, and thanks for maintaining an excellent library.

Apollo3zehn commented 1 year ago

This is a message I am posting to all recent issues: I have just renamed the project from HDF5.NET to PureHDF in preparation for an upcoming beta release. Please note that the NuGet package name has also changed and can now be found here: https://www.nuget.org/packages/PureHDF.