Apollo3zehn / PureHDF

A pure .NET library that makes reading and writing of HDF5 files (groups, datasets, attributes, ...) very easy.
MIT License

How to write a DataTable to a dataset? #58

Closed chuongmep closed 5 months ago

chuongmep commented 5 months ago

Hi @Apollo3zehn ,

Thank you for your effort, this open-source project is awesome. Is it possible to write a DataTable to a dataset?

Something like this:

    [Test]
    public void TestSaveHdf()
    {
        DataTable data = new DataTable();
        data.Columns.Add("Name", typeof(string));
        data.Columns.Add("Age", typeof(int));
        data.Columns.Add("Location", typeof(string));
        data.Rows.Add("John", 30, "New York");
        var file = new H5File()
        {
            ["Walls"] = new H5Group()
            {
                ["metadata"] = data,
                ["string-dataset"] = new string[] { "One", "Two", "Three" },
            }
        };
        file.Write("test.h5");
    }

Apollo3zehn commented 5 months ago

PureHDF accepts arbitrary types and serializes the properties and/or fields of that type to the file. I have never worked with the DataTable type, but it is a good idea to add support for it in the future. Support here means serializing the actual table content and not just the properties of the DataTable type itself. When a DataTable is passed as-is, there seems to be a circular reference, which causes a stack overflow.

As a workaround you could do it like this:

using PureHDF;

var metadata = new Row[]
{
    new Row("John", 30, "New York"),
    new Row("Maria", 28, "Colorado")
};

var file = new H5File()
{
    ["Walls"] = new H5Group()
    {
        ["metadata"] = metadata,
        ["string-dataset"] = new string[] { "One", "Two", "Three" },
    }
};
file.Write("/home/vincent/Downloads/abc/test.h5");

record Row(string Name, int Age, string Location);


chuongmep commented 5 months ago

@Apollo3zehn, which library are you using to build Row?

Apollo3zehn commented 5 months ago

It is a simple record definition (records are a C# 9 feature, record structs a C# 10 feature):

record struct Row(string Name, int Age, string Location);

If you are stuck on an older C# version, you can alternatively define a struct:

struct Row
{
    public string Name;
    public int Age;
    public string Location;
}

Alternatively, you can define a class with properties, but performance is best with a struct or record struct.

chuongmep commented 5 months ago

@Apollo3zehn I tested your code but ran into an issue:

System.Reflection.TargetInvocationException : Exception has been thrown by the target of an invocation.
  ----> System.Exception : The compound data type needs at least one member
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
   at PureHDF.H5NativeWriter.EncodeDataset(Object dataset) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 162
   at PureHDF.H5NativeWriter.EncodeGroup(H5Group group) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 106
   at PureHDF.H5NativeWriter.EncodeGroup(H5Group group) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 88
   at PureHDF.H5NativeWriter.Write() in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 52
   at PureHDF.H5File.Write(String filePath, H5WriteOptions options) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/API.Writing/H5File.cs:line 15
   at HDF5UnitTest.Tests.TestSaveHdf() in C:\Users\chuongho\Downloads\github\ExploreHDF5\HDF5UnitTest\ExportDataTest.cs:line 36
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void* arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr args, BindingFlags invokeAttr)
--Exception
   at PureHDF.VOL.Native.DatatypeMessage.GetTypeInfoForReferenceLikeType(NativeWriteContext context, Type type) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/FileFormat/Level2/ObjectHeaderMessages/Datatype/DatatypeMessage.Writing.cs:line 368
   at PureHDF.VOL.Native.DatatypeMessage.GetTypeInfoForScalar(NativeWriteContext context, Type type, Int32 stringLength) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/FileFormat/Level2/ObjectHeaderMessages/Datatype/DatatypeMessage.Writing.cs:line 135
   at PureHDF.VOL.Native.DatatypeMessage.GetTypeInfoForTopLevelMemory[T](NativeWriteContext context) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/FileFormat/Level2/ObjectHeaderMessages/Datatype/DatatypeMessage.Writing.cs:line 915
   at PureHDF.VOL.Native.DatatypeMessage.Create[T](NativeWriteContext context, Memory`1 topLevelData, Boolean isScalar) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/FileFormat/Level2/ObjectHeaderMessages/Datatype/DatatypeMessage.Writing.cs:line 44
   at PureHDF.H5NativeWriter.InternalEncodeDataset[T,TElement](H5Dataset dataset, T data, Boolean isScalar) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 173
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void* arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr args, BindingFlags invokeAttr)


One or more child tests had errors Exception doesn't have a stacktrace

    [Test]
    public void TestSaveHdf()
    {
        var metadata = new Row[]
        {
            new Row("John", 30, "New York"),
            new Row("Maria", 28, "Colorado")
        };

        var file = new H5File()
        {
            ["Walls"] = new H5Group()
            {
                ["metadata"] = metadata,
                ["string-dataset"] = new string[] { "One", "Two", "Three" },
            }
        };
        file.Write("test.h5");
    }

Apollo3zehn commented 5 months ago

How exactly did you define Row in your setup? By default classes must use properties and structs must use fields. This can be changed if you need classes with fields or structs with properties.

chuongmep commented 5 months ago

@Apollo3zehn I followed your example. Is there anything I need to change?

record struct Row(string Name, int Age, string Location);

Apollo3zehn commented 5 months ago

Oh yes, my bad. This way you get a struct with properties. I will send you an updated example in a few minutes.
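
The difference can be checked with plain reflection, independent of PureHDF. The following is a minimal sketch; the type names RecordRow and FieldRow are made up for illustration, and it needs C# 10 or later. It counts the public instance fields and properties of a positional record struct and of a plain struct:

```csharp
using System;
using System.Reflection;

const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;

// A positional record struct exposes its members as public properties
// (the compiler-generated backing fields are private).
Console.WriteLine(
    $"RecordRow: {typeof(RecordRow).GetFields(flags).Length} fields, " +
    $"{typeof(RecordRow).GetProperties(flags).Length} properties");
// prints "RecordRow: 0 fields, 3 properties"

// A plain struct declared with public fields has no properties at all.
Console.WriteLine(
    $"FieldRow: {typeof(FieldRow).GetFields(flags).Length} fields, " +
    $"{typeof(FieldRow).GetProperties(flags).Length} properties");
// prints "FieldRow: 3 fields, 0 properties"

// Hypothetical types, named only for this illustration.
record struct RecordRow(string Name, int Age, string Location);

struct FieldRow
{
    public string Name;
    public int Age;
    public string Location;
}
```

A writer that looks only at the fields of a struct therefore finds no members on the record struct, which is consistent with the "compound data type needs at least one member" error above.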

chuongmep commented 5 months ago

Thank you! I just followed your code 🤧

Apollo3zehn commented 5 months ago

  1. For best performance, use a plain struct with public fields:

struct Row
{
    public string Name;
    public int Age;
    public string Location;
}

h5file.Write("test.h5");

  2. For simplicity (a little bit slower), use a record class (or just record):

record Row(string Name, int Age, string Location);

h5file.Write("test.h5");

  3. If you need to use a record struct, do the following:

record struct Row(string Name, int Age, string Location);

h5file.Write("test.h5", new H5WriteOptions
{
    IncludeStructProperties = true
});

Choose whatever you like :-)

chuongmep commented 5 months ago

Thank you, it is working now!

public struct Row
{
    public string Name;
    public int Value;
    public string Unit;

    public Row(string name, int value, string unit)
    {
        Name = name;
        Value = value;
        Unit = unit;
    }
}

    [Test]
    public void TestSaveHdf()
    {
        Row[] metadata = new Row[]
        {
            new Row("Length", 100, "mm"),
            new Row("Volume", 2, "m^3")
        };

        var file = new H5File()
        {
            ["Walls"] = new H5Group()
            {
                ["metadata"] = metadata,
                ["string-dataset"] = new string[] { "One", "Two", "Three" },
            }
        };
        file.Write("test.h5");
    }

chuongmep commented 5 months ago

@Apollo3zehn, I can't write a dataset with 100 rows. Do you know why? The most I can write is 85 rows.

    [Test]
    public void TestSaveHdf()
    {
        // Row[] metadata = new Row[]
        // {
        //     new Row("Length", 100, "mm"),
        //     new Row("Volume", 2, "m^3")
        // };
        // create 100 dummy rows
        Row[] metadata = new Row[100];
        for (int i = 0; i < metadata.Length; i++)
        {
            metadata[i] = new Row("Length", i, "mm");
        }

        var file = new H5File()
        {
            ["Walls"] = new H5Group()
            {
                ["metadata"] = metadata,
                ["location"] = new List<double> {200,300,500},
                ["size"] = metadata.Length,
            }
        };
        file.Write("test.h5");
    }

Output:

Specified argument was out of the range of valid values.
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Span`1& arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
   at PureHDF.H5NativeWriter.EncodeDataset(Object dataset) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 162
   at PureHDF.H5NativeWriter.EncodeGroup(H5Group group) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 61
   at PureHDF.H5NativeWriter.EncodeGroup(H5Group group) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 88
   at PureHDF.H5NativeWriter.Write() in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 52
   at PureHDF.H5File.Write(String filePath, H5WriteOptions options) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/API.Writing/H5File.cs:line 15
   at HDF5UnitTest.Tests.TestSaveHdf() in D:\API\BigData\ExploreHDF5\HDF5UnitTest\ExportDataTest.cs:line 43
--ArgumentOutOfRangeException
   at PureHDF.VOL.Native.GlobalHeapManager.AddObject(Int32 size) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/GlobalHeapManager.cs:line 54
   at PureHDF.VOL.Native.DatatypeMessage.<>c__DisplayClass55_0.<GetTypeInfoForVariableLengthString>g__encode|0(Object source, IH5WriteStream target) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/FileFormat/Level2/ObjectHeaderMessages/Datatype/DatatypeMessage.Writing.cs:line 574
   at PureHDF.VOL.Native.DatatypeMessage.<>c__DisplayClass53_0.<GetTypeInfoForReferenceLikeType>g__encode|1(Object source, IH5WriteStream target) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/FileFormat/Level2/ObjectHeaderMessages/Datatype/DatatypeMessage.Writing.cs:line 438
   at PureHDF.VOL.Native.DatatypeMessage.<>c__DisplayClass63_0`1.g__encode|0(Memory`1 source, IH5WriteStream target) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/FileFormat/Level2/ObjectHeaderMessages/Datatype/DatatypeMessage.Writing.cs:line 923
   at PureHDF.Selections.SelectionHelper.EncodeStream[TResult](IEnumerator`1 sourceWalker, IEnumerator`1 targetWalker, EncodeInfo`1 encodeInfo) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/Selections/SelectionHelper.cs:line 124
   at PureHDF.Selections.SelectionHelper.Encode[TSource](Int32 sourceRank, Int32 targetRank, EncodeInfo`1 encodeInfo) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/Selections/SelectionHelper.cs:line 115
   at PureHDF.H5NativeWriter.WriteData[TElement](H5D_Base h5d, EncodeDelegate`1 encode, Memory`1 memoryData, Selection fileSelection, Selection memorySelection, UInt64[] memoryDims) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 437
   at PureHDF.H5NativeWriter.InternalEncodeDataset[T,TElement](H5Dataset dataset, T data, Boolean isScalar) in /home/runner/work/PureHDF/PureHDF/src/PureHDF/VOL/Native/Core.Writing/H5NativeWriter.cs:line 318

Apollo3zehn commented 5 months ago

I found a small bug. I'll publish a new version soon and get back to you then.

Apollo3zehn commented 5 months ago

v1.0.0-beta.4 should fix that problem: https://github.com/Apollo3zehn/PureHDF/releases/tag/v1.0.0-beta.4

Apollo3zehn commented 5 months ago

And here: https://www.nuget.org/packages/PureHDF/1.0.0-beta.4

chuongmep commented 5 months ago

Thank you, it works well now, @Apollo3zehn. The only small problem is that my data is in a DataTable.

veltrupdev commented 5 months ago

Great :-)

What do you mean by the small problem of your data being in a DataTable?

Apollo3zehn commented 5 months ago

Sorry .. I used the wrong account in the previous message

chuongmep commented 5 months ago

Yes, it is working fine now for more than 1000 rows.

chuongmep commented 5 months ago

My data is currently spread across multiple DataTables, and I want to save it as multiple groups in HDF5. I use DataTable because the column names and data of each table are different.

Apollo3zehn commented 5 months ago

You can dynamically define the name of a compound member in the HDF5 file by creating a FieldNameMapper as shown below. The mapper receives a FieldInfo instance, and its return value is the compound member name to use in the HDF5 file. The FieldNameMapper is invoked individually for every dataset written to the file. Maybe this helps with your problem.

var options = new H5WriteOptions(
    FieldNameMapper: fieldInfo => "my-hdf5-compound-member-name"
);

file.Write(filePath, options);

chuongmep commented 5 months ago

@Apollo3zehn, thank you for your help. This is really hard for me right now. Support for object[][] or DataTable would also solve my problem. I don't know how to get started with your recommendation, but I will look into it. Thank you so much.
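
One possible starting point, sketched under assumptions: since each table has different columns, a DataTable can be split into one typed array per column using only System.Data. ToColumnArrays is a hypothetical helper, not part of PureHDF. Whether PureHDF accepts these arrays directly as datasets (for example, one dataset per column inside an H5Group per table) has not been verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Build a small example table.
var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Age", typeof(int));
table.Rows.Add("John", 30);
table.Rows.Add("Maria", 28);

Dictionary<string, Array> columns = ToColumnArrays(table);

foreach (var (name, values) in columns)
    Console.WriteLine($"{name}: {values.GetType().Name} with {values.Length} elements");

// Hypothetical helper: copies every column of the table into an array
// whose element type matches the column's DataType (string[], int[], ...).
static Dictionary<string, Array> ToColumnArrays(DataTable table)
{
    var result = new Dictionary<string, Array>();

    foreach (DataColumn column in table.Columns)
    {
        Array values = Array.CreateInstance(column.DataType, table.Rows.Count);

        for (int i = 0; i < table.Rows.Count; i++)
        {
            object cell = table.Rows[i][column];

            // DBNull cannot be stored in a typed array; the element keeps
            // its default value (null, 0, ...) in that case.
            if (cell != DBNull.Value)
                values.SetValue(cell, i);
        }

        result[column.ColumnName] = values;
    }

    return result;
}
```

The column-wise layout avoids defining a separate Row type for every table, at the cost of storing each column as its own array instead of a single compound dataset.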