Apollo3zehn / PureHDF

A pure .NET library that makes reading and writing of HDF5 files (groups, datasets, attributes, ...) very easy.
MIT License

Cannot read opaque datasets (created with PureHDF) #105

Closed Blackclaws closed 1 week ago

Blackclaws commented 2 weeks ago

I've tried round-tripping an opaque dataset created with PureHDF:

using PureHDF;

var reproducibleProblem = new H5File();

var data = new byte[] { 0x01, 0x02, 0x13 };
reproducibleProblem["test"] = new H5Dataset(data, opaqueInfo: new H5OpaqueInfo((uint) data.Length, "New"));

reproducibleProblem.Write("repro.h5");

var open = H5File.OpenRead("repro.h5");
var dataset = open.Dataset("test");

var dataRead = dataset.Read<byte[]>();

Unfortunately, an exception is thrown:

--> System.Exception: The total file selection element count does not match the total memory selection element count.

Blackclaws commented 2 weeks ago

Passing an explicit memoryDims argument:

var dataRead = dataset.Read<byte[]>(memoryDims: [(ulong) data.Length]);

allows the data to be read. However, I do not know the size beforehand in all cases.

Blackclaws commented 2 weeks ago

I've done some digging because I wondered why the tests were passing for this case. It appears that PureHDF writes opaque data slightly differently from how the HDF5 library itself does (when used the way you use it to generate your test data), which is why the round trip fails.

Test Data:

HDF5 "/tmp/tmpcoT3x9.tmp" {
GROUP "/" {
   GROUP "opaque" {
      DATASET "opaque" {
         DATATYPE  H5T_OPAQUE {
            OPAQUE_TAG "Opaque Test Tag";
         }
         DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
         DATA {
         (0): 0x01, 0x02, 0x13, 0x37
         }
      }
   }
}
}

PureHDF Data:

❯ h5dump ../HDF5Playground/bin/Debug/net8.0/repro.h5 
HDF5 "../HDF5Playground/bin/Debug/net8.0/repro.h5" {
GROUP "/" {
   DATASET "test" {
      DATATYPE  H5T_OPAQUE {
         OPAQUE_TAG "New";
      }
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): 01:02:13
      }
   }
}
}

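This difference in how the data is saved (a dataspace of 4 elements in the test data versus a dataspace of 1 element in the PureHDF file) might be what matters.

The difference also seems to be in the storage layout: COMPACT (PureHDF) vs. CONTIGUOUS (test data).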

❯ h5dump -d test -p ../HDF5Playground/bin/Debug/net8.0/repro.h5
HDF5 "../HDF5Playground/bin/Debug/net8.0/repro.h5" {
DATASET "test" {
   DATATYPE  H5T_OPAQUE {
      OPAQUE_TAG "New";
   }
   DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
   STORAGE_LAYOUT {
      COMPACT
      SIZE 3
   }
   FILTERS {
      NONE
   }
   FILLVALUE {
      FILL_TIME H5D_FILL_TIME_NEVER
      VALUE  H5D_FILL_VALUE_UNDEFINED
   }
   ALLOCATION_TIME {
      H5D_ALLOC_TIME_EARLY
   }
   DATA {
   (0): 01:02:13
   }
}
}

❯ h5dump -d opaque/opaque -p /tmp/tmpcoT3x9.tmp
HDF5 "/tmp/tmpcoT3x9.tmp" {
DATASET "opaque/opaque" {
   DATATYPE  H5T_OPAQUE {
      OPAQUE_TAG "Opaque Test Tag";
   }
   DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
   STORAGE_LAYOUT {
      CONTIGUOUS
      SIZE 4
      OFFSET 2048
   }
   FILTERS {
      NONE
   }
   FILLVALUE {
      FILL_TIME H5D_FILL_TIME_IFSET
      VALUE  H5D_FILL_VALUE_DEFAULT
   }
   ALLOCATION_TIME {
      H5D_ALLOC_TIME_LATE
   }
   DATA {
   (0): 0x01, 0x02, 0x13, 0x37
   }
}
}

Apollo3zehn commented 1 week ago

The new version is released and hopefully it solves your problem :-)

Blackclaws commented 1 week ago

Seems to be solved. Thanks again for the quick turnaround and the work you put into this :) It's helped me a ton already!