HDFGroup / HDF.PInvoke

Raw HDF5 Power for .NET
http://www.hdfgroup.org/HDF5

Can't read compound dataset with struct fields Sbyte and Float #166

Closed acrignola closed 4 years ago

acrignola commented 4 years ago

Hello the HDF group community,

I developed my HDF5 wrapper with HDF.PInvoke 1.5.2, and I have an issue when I try to read a dataset with a compound datatype. The struct I use is composed of an SByte and a float. There is no exception, but the values read are wrong: only the first SByte value is read correctly, and all the other values are wrong.

For example, I want to read a dataset with a compound datatype like this:

| ID (SByte) | X (float) |
| --- | --- |
| 1 | -1.3 |
| 2 | 180 |

And the result when I read this dataset is:

| ID (SByte) | X (float) |
| --- | --- |
| 1 | 9.851128E-43 |
| 52 | 0 |

I am able to read a dataset with a compound datatype built from the struct float X, float Y, and also one built from the struct SByte ID, SByte IDCopy. Does anyone have an idea how I can solve this issue, please?

Please find below part of my code:

```csharp
long ID = H5D.open(File.ID, FullName);
Type type = typeof(T);
Array datasetOut = Array.CreateInstance(type, (int[])Dataspace.Dimensions);
long datatype = H5D.get_type(ID);
GCHandle hnd = GCHandle.Alloc(datasetOut, GCHandleType.Pinned);
H5D.read(ID, datatype, H5S.ALL, H5S.ALL, H5P.DEFAULT, hnd.AddrOfPinnedObject());
```

with 'T' my struct:

```csharp
public struct CompDatasetSbyteFloat
{
    SByte ID;
    float X;
}

public struct CompDatasetSbyteSbyte
{
    SByte ID;
    SByte IDcopy;
}

public struct CompDatasetFloatFloat
{
    float X;
    float Y;
}
```

Apollo3zehn commented 4 years ago

Did you check if the data are stored correctly, e.g. with the HDFView (https://www.hdfgroup.org/downloads/hdfview/)?

Here is how I read and write struct arrays (using .NET Core 3):

Write

prepare struct for write: https://github.com/OneDAS-Group/OneDAS-Core/blob/master/src/OneDas.Hdf.Types/IO/IOHelper.cs#L502-L525

The resulting pointer (valueSetPointer) is then used later to write the data to the HDF file:

H5D.write(datasetId, typeId, H5S.ALL, H5S.ALL, H5P.DEFAULT, valueSetPointer);
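Independent of the linked helper, the general pattern for obtaining such a pointer from a managed struct array is pinning with `GCHandle`. The sketch below is purely illustrative (the `Sample` struct and names are not from the linked code); it pins an array, takes the address, and reads the first element back to show the pointer addresses the raw data:

```csharp
using System;
using System.Runtime.InteropServices;

public struct Sample
{
    public float X;
    public float Y;
}

public static class PinningDemo
{
    public static void Main()
    {
        var data = new[] { new Sample { X = 1.5f, Y = -2.0f } };

        // Pin the array so the GC cannot move it while native code uses the pointer.
        GCHandle hnd = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            IntPtr ptr = hnd.AddrOfPinnedObject();

            // The pointer addresses the raw struct data; an H5D.write call
            // would read the bytes from exactly this location.
            var roundTrip = Marshal.PtrToStructure<Sample>(ptr);
            Console.WriteLine($"{roundTrip.X} {roundTrip.Y}");
        }
        finally
        {
            hnd.Free(); // always release the pinned handle
        }
    }
}
```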

Read

https://github.com/OneDAS-Group/OneDAS-Core/blob/master/src/OneDas.Hdf.Types/IO/IOHelper.cs#L377-L392

And the pointer bufferPtr, which is used to construct the struct elements, has beforehand been filled with raw HDF data by a call to:

H5D.read(datasetId, typeId, dataspaceId_memory, dataspaceId_file, H5P.DEFAULT, bufferPtr)

Struct

Here is an example struct (unfortunately only with string arguments): https://github.com/OneDAS-Group/OneDAS-Core/blob/master/src/OneDas.Hdf.Types/IO/HdfStructure/hdf_transfer_function_t.cs

acrignola commented 4 years ago

Thank you for your answer.

Yes, I checked with HDFView 3.1.0. After seeing the issue I posted, I wanted to be sure that the compound datatype has no particular properties, so I created three compound datasets from HDFView:

I'll take a look at the links you sent me and come back to you.

Apollo3zehn commented 4 years ago

I forgot to link to the compound type creation code: https://github.com/OneDAS-Group/OneDAS-Core/blob/master/src/OneDas.Hdf.Types/Core/TypeConversionHelper.cs#L182-L214

But if you create the type via HDFView, this should work as expected, so I can only imagine that your reading procedure has some issue.

acrignola commented 4 years ago

Hello Apollo3zehn,

I looked at your reading procedure, but I get some errors when I try to read a compound dataset.

So I created a project with my reading procedure and your reading procedure (ApolloReadingProcedure) to share with you.

Please find attached the project HDF5CompDatasetReadTest.zip. The project was created with VS 2017 and .NET Framework 4.7, and I used the HDF.PInvoke library v1.10.5.2. You will find:

In the HDF5CompDatasetReadTest project you will find my tests with my procedure:

And also my tests with your procedure:

Please also find attached the file terminal_error_apollo_reading_procedure.txt. It contains the terminal error I get when I try to read with your reading procedure.

If you find my mistake, please let me know. Thank you again for your help.

Note: The project has to be run in DEBUG mode, and line 15 of Program.cs must be adjusted depending on the location of the "test.h5" file.

HDF5CompDatasetReadTest.zip

terminal_error_apollo_reading_procedure.txt

Apollo3zehn commented 4 years ago

Thanks for your sample. The errors occur because the HDF type is constructed incorrectly: my routine does not find your struct's private fields. Please make your fields public, or add the correct binding flags to GetFields():

```csharp
private long GetHdfTypeIdFromType(long fileId, Type type)
{
    // ...

    foreach (FieldInfo fieldInfo in elementType.GetFields(BindingFlags.NonPublic | BindingFlags.Instance))
    {
        // ...
    }

    // ...
}
```
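To illustrate the difference in isolation (this standalone sketch uses the struct from the original report, with private fields as posted): a parameterless `GetFields()` returns only public fields, so it finds nothing here, while `BindingFlags.NonPublic | BindingFlags.Instance` finds both private fields.

```csharp
using System;
using System.Reflection;

public struct CompDatasetSbyteFloat
{
    // Fields without an access modifier are private by default.
    SByte ID;
    float X;
}

public static class BindingFlagsDemo
{
    public static void Main()
    {
        var type = typeof(CompDatasetSbyteFloat);

        // GetFields() without arguments only returns public fields: none here.
        Console.WriteLine(type.GetFields().Length); // 0

        // Private instance fields require explicit binding flags.
        Console.WriteLine(type.GetFields(BindingFlags.NonPublic | BindingFlags.Instance).Length); // 2
    }
}
```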

With this I was able to read the data. I will have a more detailed look at it tomorrow.

Apollo3zehn commented 4 years ago

I think I found the reason for the differences between our approaches: HDF stores compound data very compactly, whereas C# does not necessarily do so:

(screenshot: comparison of the HDF and C# struct layouts)

You can see that the struct layout for CompDatasetSByteFloat has a total size of 5 bytes in HDF and 8 bytes in C#. The code to get both layouts is:

HDF:

var memberCount = H5T.get_nmembers(datatypeID);

Console.WriteLine();
Console.WriteLine("Total size: " + H5T.get_size(datatypeID));

for (uint i = 0; i < memberCount; i++)
{
    Console.WriteLine("  Name: " + Marshal.PtrToStringAnsi(H5T.get_member_name(datatypeID, i)));
    Console.WriteLine("    Offset: " + H5T.get_member_offset(datatypeID, i));
    var mtypeId = H5T.get_member_type(datatypeID, i);
    Console.WriteLine("    Size: " + H5T.get_size(mtypeId));
}

C#:

Console.WriteLine();
Console.WriteLine("Total size: " + Marshal.SizeOf(elementType));

foreach (FieldInfo fieldInfo in elementType.GetFields(BindingFlags.NonPublic | BindingFlags.Instance))
{
    Console.WriteLine("  Name: " + fieldInfo.Name);
    Console.WriteLine("    Offset: " + Marshal.OffsetOf(elementType, fieldInfo.Name));
    Console.WriteLine("    Size: " + Marshal.SizeOf(fieldInfo.FieldType));
}

The problem in your code is that you pass the HDF file type ID to H5D.read(), but this routine expects a mem_type_id, i.e. a type ID that describes your struct's layout in memory, which is the C# struct layout.
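This mismatch can be reproduced without HDF5 at all. The sketch below packs the example values from the original report into a 5-bytes-per-element buffer (as HDF5 stores them, assuming little-endian IEEE 754) and then reads it back with the default C# offsets (ID at 0, X at 4, stride 8), which yields exactly the garbage values reported above:

```csharp
using System;

public static class StrideMismatchDemo
{
    public static void Main()
    {
        // The 10 bytes HDF5 actually stores for { (1, -1.3f), (2, 180f) },
        // packed with no padding (SByte + float = 5 bytes per element).
        // The buffer is oversized so the out-of-range read yields zeros.
        var packed = new byte[16];
        packed[0] = 1;
        BitConverter.GetBytes(-1.3f).CopyTo(packed, 1);
        packed[5] = 2;
        BitConverter.GetBytes(180f).CopyTo(packed, 6);

        // Reinterpreting with the default C# layout (ID at offset 0,
        // X at offset 4, 8 bytes per element) reads the wrong bytes:
        for (int i = 0; i < 2; i++)
        {
            sbyte id = (sbyte)packed[i * 8];
            float x = BitConverter.ToSingle(packed, i * 8 + 4);

            // Element 0: ID is correct (1), but X is a denormal (~9.85E-43)
            //            built from a stray byte of -1.3f plus the next ID.
            // Element 1: ID reads 52 (a byte from the middle of 180f), X reads 0.
            Console.WriteLine($"{id} {x}");
        }
    }
}
```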

You can now either modify your C# struct layout with attributes as shown here: https://docs.microsoft.com/de-de/dotnet/api/system.runtime.interopservices.structlayoutattribute?view=netframework-4.8

Or you create a mem_type_id which describes the C# layout as I did in the GetHdfTypeIdFromType method. Both ways should work fine :)
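A minimal sketch of the first option (the struct names here are illustrative): with the default layout, the float field is aligned to 4 bytes, so `Marshal.SizeOf` reports 8; adding `Pack = 1` removes the padding and matches HDF5's compact 5-byte layout.

```csharp
using System;
using System.Runtime.InteropServices;

// Default layout: 1 byte ID + 3 bytes padding + 4 bytes X = 8 bytes.
public struct DefaultLayout
{
    public SByte ID;
    public float X;
}

// Pack = 1 removes the alignment padding: 1 + 4 = 5 bytes,
// matching the compact layout HDF5 uses in the file.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PackedLayout
{
    public SByte ID;
    public float X;
}

public static class LayoutDemo
{
    public static void Main()
    {
        Console.WriteLine(Marshal.SizeOf<DefaultLayout>()); // 8
        Console.WriteLine(Marshal.SizeOf<PackedLayout>());  // 5
    }
}
```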

acrignola commented 4 years ago

Hello,

It works better now, thank you! And I understand my mistake with 'mem_type_id'. If I define 'mem_type_id' with the 'GetHdfTypeIdFromType' method, and I either make all the fields of my struct public or add the correct binding flags to 'GetFields()', then I can read compound datasets with SByte and float struct fields :)

Eventually I will have to do this without knowing the struct in advance, but that's not needed for now. Thanks again, and have a good afternoon.

acrignola commented 4 years ago

Hi,

I have some issues with another HDF5 file... It is also a compound dataset with a compound datatype containing SByte and float (and float and float) fields, but this time the file was not created with HDFView. If I create the same compound dataset with HDFView, the reading works.

The results of both my reading procedure and your reading procedure are wrong, even though I applied the correction (making the struct members public or adding the correct binding flags to 'GetFields()').

Please find attached the updated project HDF5CompDatasetReadTest_v2.zip. In the folder /HDF5CompDatasetReadTest/data there are:

The new test is from line 112 to line 136 in Program.cs. No exceptions are displayed on the terminal this time. Do you have an idea, please?

HDF5CompDatasetReadTest_v2.zip

Apollo3zehn commented 4 years ago

It took me a while to figure it out; I even compared hex dumps of both files, but the solution was just too simple. The following line from GetHdfTypeIdFromType() adds a new field to the HDF type, and this field has not just a type and an offset but also a name:

H5T.insert(typeId, fieldInfo.Name, Marshal.OffsetOf(elementType, fieldInfo.Name), fieldType);

This name is equal to the C# struct's field name. If your struct fields are named differently from those within the HDF file, the code fails. If you do not want your struct fields to be named the same as the HDF ones, you can define an attribute that specifies the HDF name on your struct:

[AttributeUsage(AttributeTargets.Field)]
public class HdfNameAttribute : Attribute
{
    public string Name { get; set; }
}

public struct CCDPostion
{
    [HdfName(Name = "ccdID")]
    SByte CCDID;

    [HdfName(Name = "sd_offset_pp_x_array")]
    float SD_offset_pp_x_array;

    [HdfName(Name = "sd_offset_pp_y_array")]
    float SD_offset_pp_y_array;

    [HdfName(Name = "sd_orientation_array")]
    float SD_orientation_array;
}

And then change the H5T.insert code to:

var attribute = fieldInfo.GetCustomAttribute<HdfNameAttribute>(true);
var hdfFieldName = attribute != null ? attribute.Name : fieldInfo.Name;

H5T.insert(typeId, hdfFieldName, Marshal.OffsetOf(elementType, fieldInfo.Name), fieldType);
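The attribute lookup can be checked on its own, without HDF5. This self-contained sketch redefines the attribute from above and applies it to a two-field struct (the struct and field names here are illustrative, and the fields are made public so a plain reflection query finds them):

```csharp
using System;
using System.Reflection;

[AttributeUsage(AttributeTargets.Field)]
public class HdfNameAttribute : Attribute
{
    public string Name { get; set; }
}

public struct CCDPosition
{
    [HdfName(Name = "ccdID")]
    public SByte CCDID;

    [HdfName(Name = "sd_offset_pp_x_array")]
    public float SD_offset_pp_x_array;
}

public static class NameMappingDemo
{
    public static void Main()
    {
        foreach (FieldInfo fieldInfo in typeof(CCDPosition).GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            // Prefer the HDF name from the attribute; fall back to the C# name.
            var attribute = fieldInfo.GetCustomAttribute<HdfNameAttribute>(true);
            var hdfFieldName = attribute != null ? attribute.Name : fieldInfo.Name;

            Console.WriteLine($"{fieldInfo.Name} -> {hdfFieldName}");
            // prints: CCDID -> ccdID
            //         SD_offset_pp_x_array -> sd_offset_pp_x_array
        }
    }
}
```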

PS: I just figured out that your field names are already nearly equal. The only difference is the capitalization, which caused all the problems.

acrignola commented 4 years ago

Thank you very much !

I tried many things to solve it without success, and I never thought about the capitalization. It works well now that I set my C# struct's field names equal to those in the HDF file.

Thank you also for the second solution, in case I want to keep my C# struct's field names different from those in the HDF file.