Apollo3zehn / PureHDF

A pure .NET library that makes reading and writing of HDF5 files (groups, datasets, attributes, ...) very easy.
MIT License

Intermittent Write Failure #82

Closed. Blackclaws closed this 2 months ago

Blackclaws commented 2 months ago

So from time to time I get these errors:

--> System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values.
   at PureHDF.Selections.SelectionHelper.EncodeStream[TResult](IEnumerator`1 sourceWalker, IEnumerator`1 targetWalker, EncodeInfo`1 encodeInfo) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/Selections/SelectionHelper.cs:line 166
   at PureHDF.Selections.SelectionHelper.Encode[TSource](Int32 sourceRank, Int32 targetRank, EncodeInfo`1 encodeInfo) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/Selections/SelectionHelper.cs:line 115
   at PureHDF.H5NativeWriter.WriteData[TElement](H5D_Base h5d, EncodeDelegate`1 encode, Memory`1 memoryData, Selection fileSelection, Selection memorySelection, UInt64[] memoryDims) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 441
   at PureHDF.H5NativeWriter.InternalEncodeDataset[T,TElement](H5Dataset dataset, T data, Boolean isScalar) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 317
   at InvokeStub_H5NativeWriter.InternalEncodeDataset(Object, Span`1)
   at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   --- End of inner exception stack trace ---
   at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
   at PureHDF.H5NativeWriter.EncodeDataset(Object dataset) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 159
   at PureHDF.H5NativeWriter.EncodeGroup(H5Group group) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 94
   at PureHDF.H5NativeWriter.EncodeGroup(H5Group group) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 85
   at PureHDF.H5NativeWriter.EncodeGroup(H5Group group) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 85
   at PureHDF.H5NativeWriter.Write() in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 52
   at PureHDF.H5File.Write(String filePath, H5WriteOptions options) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/API.Writing/H5File.cs:line 15

I have no clear way to reproduce them. I'm not familiar enough with the internals of PureHDF to even begin debugging this, but maybe you have an idea what might be going wrong.

The file that gets written is then broken.

Apollo3zehn commented 2 months ago

If you cannot reproduce it, you could try to create a memory dump right when the exception occurs (dotnet-dump collect, https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-dump).

Is the problem related to opaque datasets? And if so, is there some varying parameter, like the opaque type size or maybe the chunk layout, which could produce this flaky behaviour?

Blackclaws commented 2 months ago

The error seems to pop up erratically, but I managed to catch it in a debugger this time and I think I know the source.

It appears the error happens when multiple opaque datasets are used in the same file and they have different sizes:

using System;
using System.IO;
using PureHDF;

// Two opaque payloads of different lengths
var data = File.ReadAllBytes("/home/felix/Downloads/test.jpg");
var dataTwo = File.ReadAllBytes("/home/felix/dca.jpg");

Console.WriteLine(data.Length);
Console.WriteLine(dataTwo.Length);

// Each dataset gets its own H5OpaqueInfo whose size matches its payload
var file = new H5File()
{
    ["opaque"] = new H5Dataset(data, opaqueInfo: new H5OpaqueInfo((uint)data.Length, "Test")),
    ["opaque_two"] = new H5Dataset(dataTwo, opaqueInfo: new H5OpaqueInfo((uint)dataTwo.Length, "TestTwo")),
};

file.Write("testing.h5");

The tag passed to H5OpaqueInfo does not matter: the issue shows up whether the two tags are the same or different.

The only thing that matters for the failure is that the first opaque dataset is larger than the second one.

Interesting to note:

If the second is larger than the first, there is no exception, but the second dataset just does not get fully written. My guess is that there is again some sort of caching involved here.

Blackclaws commented 2 months ago

I've tracked it down to InternalEncodeDataset getting the wrong datatype info back from this call:

        var (datatype, encode) =
            DatatypeMessage.Create(Context, memoryData, isScalar, dataset.OpaqueInfo);

Blackclaws commented 2 months ago

Further tracked it down to this:

        // special case: opaque (= byte[])
        // use unique type to make cache happy
        if (type == typeof(byte) && opaqueInfo is not null)
            type = typeof(H5OpaqueInfo);

I think the problem is that the cached entry is only correct for the first H5OpaqueInfo. Because the size is different for each opaque dataset, the result can't be cached under that one type at all.
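To illustrate the suspected mechanism, here is a minimal, self-contained sketch. It does not use PureHDF's actual internals: the dictionaries, the helper names (GetSizeKeyedByType, GetSizeKeyedByTypeAndInfo) and the tuple standing in for H5OpaqueInfo are all hypothetical. It only shows how a cache keyed by the .NET type alone hands the first dataset's size to every later opaque dataset, and how a key that also includes the opaque size and tag avoids that. The second variant is one possible direction, not necessarily what the eventual fix does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the datatype cache; not PureHDF code.
// The opaque info is modelled as a (size in bytes, tag) tuple.

// Variant 1: cache keyed by the .NET type only (the suspected problem).
var cacheByType = new Dictionary<Type, uint>();

uint GetSizeKeyedByType(Type type, (uint Size, string Tag) opaque)
{
    if (!cacheByType.TryGetValue(type, out var size))
    {
        // Only the first caller's size is ever stored for this type.
        size = opaque.Size;
        cacheByType[type] = size;
    }
    return size;
}

// Variant 2: the key also contains the opaque size and tag.
var cacheByTypeAndInfo = new Dictionary<(Type, uint, string), uint>();

uint GetSizeKeyedByTypeAndInfo(Type type, (uint Size, string Tag) opaque)
{
    var key = (type, opaque.Size, opaque.Tag);
    if (!cacheByTypeAndInfo.TryGetValue(key, out var size))
    {
        size = opaque.Size;
        cacheByTypeAndInfo[key] = size;
    }
    return size;
}

// Made-up sizes: the first dataset is larger than the second.
var first = (Size: 5000u, Tag: "Test");
var second = (Size: 3000u, Tag: "TestTwo");

var staleFirst = GetSizeKeyedByType(typeof(byte[]), first);
var staleSecond = GetSizeKeyedByType(typeof(byte[]), second);
Console.WriteLine($"keyed by type: {staleFirst}, {staleSecond}");

var freshFirst = GetSizeKeyedByTypeAndInfo(typeof(byte[]), first);
var freshSecond = GetSizeKeyedByTypeAndInfo(typeof(byte[]), second);
Console.WriteLine($"keyed by type and info: {freshFirst}, {freshSecond}");
```

With the type-only key the second lookup returns 5000 instead of 3000. That would match both symptoms: a second dataset smaller than the cached size runs out of range, and a larger one is only partially written.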

Blackclaws commented 2 months ago

I'm going to create a fix and pull request for it.

Apollo3zehn commented 2 months ago

Thanks for debugging!

Apollo3zehn commented 2 months ago

1.0.0-beta.15 includes the fix!