Apollo3zehn / PureHDF

A pure .NET library that makes reading and writing of HDF5 files (groups, datasets, attributes, ...) very easy.
MIT License
47 stars 16 forks source link

fix: Read data length from data type #106

Closed Blackclaws closed 2 weeks ago

Blackclaws commented 2 weeks ago

For opaque types we read the data length from the type if read as a byte array

Fixes issue #105

Blackclaws commented 2 weeks ago

As discussed in the linked issue the underlying issue is that PureHDF writes opaque data in compact format, which is readable by other libraries but apparently PureHDF can't read opaque data in compact format unless explicitly given the length of the data.

This should have no other impacts but maybe there is a better place for this fix.

I've also added a round trip test to catch issues with round tripping opaque data in the future.

Apollo3zehn commented 2 weeks ago

Thank you :-) Probably I will be able to have a more detailed look on it on Thursday.

Blackclaws commented 2 weeks ago

This still does not work for Memory<byte> as the way that is constructed doesn't seem to work with anything but single bytes anyway. Not sure if that is tested anywhere right now but I've commented out the relevant test section and added a TODO.

Apollo3zehn commented 2 weeks ago

I had a look into the problem and found it to be a more general one and not only related to opaque datasets. I recently implemented the raw mode concept which allows the user to get raw data (byte[] or Memory<byte>) and I created the following unit tests for this feature:

But those tests always used the method overload which accepts a buffer. When using this overload, the memory dimensions are well defined and the problem does not occur. However the raw mode is also active without the use of that method overload but this code path was not covered by the tests.

I extended the tests to close that gap and replicated the code changes in NativeDataset to the HsdsDataset to keep them similar.

Since the problem is not only relevant for opaque datasets, I had to remove the explicit check for it. Some code has been added to also support the Memory<byte> case.

So I think everything is covered now and ready for merge. Thanks for your efforts! Tomorrow I will work on support for symbolic links and also on further performance improvements and release a new version probably in the evening.