Apollo3zehn / PureHDF

A pure .NET library that makes reading and writing of HDF5 files (groups, datasets, attributes, ...) very easy.
MIT License

Adding writing support? #11

Closed ErroneousFatality closed 1 year ago

ErroneousFatality commented 1 year ago

Would this be feasible in the near future? I love this package, but now my project requires creation too instead of just reading of nwb and hdf5 files.

Apollo3zehn commented 1 year ago

I am actually planning to add simple write support because I need it as well. This would mean support for writing new files only, with no support for editing existing files, as that is more complicated. It would also mean that only the newest internal structures of HDF5 are supported, to simplify things.

But I cannot promise when it will be ready, as I first need to finish read support. So please do not rely on this library for write support yet. I think https://github.com/LiorBanai/HDF5-CSharp would be the easiest way to write files in the meantime.

ErroneousFatality commented 1 year ago

I actually need it for that exact use case! I have an API that parses HDF5 files into data using your library, and now I want to add endpoints for getting the data back as a generated HDF5 file. Thanks for replying so fast and for the suggestion. I checked out the HDF5-CSharp package, but it seems to only support .NET 6 on Windows, so I've accepted my fate and have started writing the implementation myself using https://github.com/HDFGroup/HDF.PInvoke.1.10 . But once you add the use case to your library, I'll probably switch back to it to keep the dependencies clean.

PhilPJL commented 1 year ago

@ErroneousFatality I've created another H5 library. It's enough for my requirements, but certainly less complete than HDF5-CSharp. You may find it useful.

Apollo3zehn commented 1 year ago

I think https://github.com/LiorBanai/HDF5-CSharp works cross platform because it depends only on this library and HDF.PInvoke, which both are running on Linux, Mac and Windows.

@PhilPJL, thanks for pointing to another alternative. I would be interested to know more about the differences to HDF5-CSharp. Thanks :-)

PhilPJL commented 1 year ago

It was something I created to satisfy the specific requirements I had, which were exporting database data as custom types to H5 data sets. There isn't currently an example of that usage in the repo. It would be used something like this:

public sealed class IntervalRecordAdapter : H5TypeAdapter<IntervalRecord, IntervalRecordAdapter.SIntervalRecord>
{
    private IntervalRecordAdapter() { }

    protected override SIntervalRecord Convert(IntervalRecord source)
    {
        return new SIntervalRecord
        {
            Id = source.Id,
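            // DateTime is stored as an OLE Automation date (a double).
            Timestamp = source.Timestamp.ToOADate(),
            // Missing (null) thickness values are stored as NaN.
            AverageThickness = source.AverageThickness ?? double.NaN,
            MinimumThickness = source.MinimumThickness ?? double.NaN,
            MaximumThickness = source.MaximumThickness ?? double.NaN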
        };
    }

    public override H5Type GetH5Type()
    {
        return H5Type
            .CreateCompoundType<SIntervalRecord>()
            .Insert<SIntervalRecord>(nameof(SIntervalRecord.Id), H5T.NATIVE_INT64)
            .Insert<SIntervalRecord>(nameof(SIntervalRecord.Timestamp), H5T.NATIVE_DOUBLE)
            .Insert<SIntervalRecord>(nameof(SIntervalRecord.AverageThickness), H5T.NATIVE_DOUBLE)
            .Insert<SIntervalRecord>(nameof(SIntervalRecord.MinimumThickness), H5T.NATIVE_DOUBLE)
            .Insert<SIntervalRecord>(nameof(SIntervalRecord.MaximumThickness), H5T.NATIVE_DOUBLE);
    }

    [StructLayout(LayoutKind.Sequential)]
    public struct SIntervalRecord
    {
        public long Id;
        public double Timestamp;
        public double AverageThickness;
        public double MinimumThickness;
        public double MaximumThickness;
    }

    public static IH5TypeAdapter<IntervalRecord> Default { get; } = new IntervalRecordAdapter();
}

    public void CreateIntervalRecordsDataSet(IH5Location location)
    {
        //////////////////////////////////////
        // Interval records
        using var altContext = new TvlAltContext(ConnectionStringAlt);
        using var scope = GetTransactionScope();
        using var intervalRecordWriter = H5DataSetWriter
            .CreateOneDimensionalDataSetWriter(location, "IntervalRecords", IntervalRecordAdapter.Default, ChunkSize,
                CompressionLevel);
        NotifyProgress(CtsExportStage.IntervalRecords, 0, 0);

        int numIntervalRecords = altContext
            .IntervalRecords
            .AsNoTracking()
            .Where(r => r.RawRecords.Any(rr => rr.MeasurementId == MeasurementId))
            .Where(r => r.Timestamp >= StartDateTime)
            .Where(r => r.Timestamp <= EndDateTime)
            .Count();

        NotifyProgress(CtsExportStage.IntervalRecords, 0, numIntervalRecords);

        altContext
            .IntervalRecords
            .AsNoTracking()
            .Where(r => r.RawRecords.Any(rr => rr.MeasurementId == MeasurementId))
            .Where(r => r.Timestamp >= StartDateTime)
            .Where(r => r.Timestamp <= EndDateTime)
            .OrderBy(r => r.Id)
            .Buffer(ChunkSize)
            .ForEach(rg =>
            {
                intervalRecordWriter.Write(rg);

                NotifyProgress(CtsExportStage.IntervalRecords, intervalRecordWriter.RowsWritten, numIntervalRecords);
            });

        scope.Complete();
    }

I guess the main difference is that I wrap all H5 handles (and memory handling) in disposable objects, so there's no need for 'try/catch/finally'; you use the dispose pattern instead.
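To illustrate the dispose pattern described above, here is a minimal sketch of a handle wrapper. It is not taken from any of the libraries in this thread: the `H5Handle` class is hypothetical, and the close function is passed in as a delegate on the assumption that the native close call takes a 64-bit handle and returns an integer status (as `H5F.close` in HDF.PInvoke.1.10 is believed to do).

```csharp
using System;

// Hypothetical wrapper for illustration only; not part of any library discussed here.
// It owns one native handle and makes sure the close function runs exactly once.
public sealed class H5Handle : IDisposable
{
    // Assumed shape of a native close function: takes the handle, returns a status code.
    private readonly Func<long, int> _close;
    private bool _disposed;

    public long Id { get; }

    public H5Handle(long id, Func<long, int> close)
    {
        // HDF5 signals failure with a negative identifier.
        if (id < 0)
            throw new InvalidOperationException("The native call returned an invalid handle.");

        Id = id;
        _close = close ?? throw new ArgumentNullException(nameof(close));
    }

    public void Dispose()
    {
        // Guard against double disposal so the native handle is never closed twice.
        if (_disposed)
            return;

        _disposed = true;
        _close(Id);
    }
}
```

With a wrapper like this, a caller can write `using var file = new H5Handle(fileId, closeFunction);` and the handle is released when the scope ends, even if an exception is thrown, which is what removes the need for explicit `try/finally` blocks.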

ErroneousFatality commented 1 year ago

> I think https://github.com/LiorBanai/HDF5-CSharp works cross platform because it depends only on this library and HDF.PInvoke, which both are running on Linux, Mac and Windows.

I thought the same, but then this information on its NuGet package page confused me: [image]

I'll trust you and try out whether it works when included in .NET 6 class libraries running on Linux.

Apollo3zehn commented 1 year ago

@PhilPJL thanks for the introduction. The dispose pattern is definitely an improvement over manually closing H5 handles.

@ErroneousFatality I understand it is confusing, but your screenshot also shows netstandard 2.0, so the managed code runs everywhere, and the native code runs on Linux, Mac and Windows because HDF.PInvoke.1.10 contains the corresponding native files. But in the end, choose whatever lib suits you best :-)

@LiorBanai do you know why there is a net6.0-windows7.0 dependency in your NuGet package? In your repo I see only net6.0 (no windows part).

LiorBanai commented 1 year ago

Hi @ErroneousFatality and @Apollo3zehn , the latest version that I released was 1.15.3, which was indeed Windows only: <TargetFrameworks>net6.0-windows;netstandard2.0</TargetFrameworks>

I changed it to <TargetFrameworks>netstandard20;netstandard21;net60</TargetFrameworks> some days ago but hadn't released a NuGet package for it.

I just did that now. You can use it here: https://www.nuget.org/packages/HDF5-CSharp/1.15.4

But to tell you the truth, I have zero experience consuming NuGet packages on Linux systems, so I don't know how it will behave. Don't Linux libraries use the .so extension, so will a .dll even work?

I think I made it Windows only because I never tested it on a Linux system and so cannot guarantee it will work :)

LiorBanai commented 1 year ago

Also, I had a few memory leaks, fixed in the latest releases, that were caused by handles not being closed. @PhilPJL's dispose pattern seems very useful, so I think I'll add it in the future (I have limited time right now).

Apollo3zehn commented 1 year ago

Thanks for the explanation! .NET DLLs will work on Linux when they target .NET Standard. Native libs must be .so files.

.NET on Linux uses a different bootstrapping mechanism that involves Linux-specific libs, but the rest of the code is the same on all platforms.

I am pretty sure your code works on Linux. If not, there is a bug in HDF.PInvoke.1.10 which we should solve then.
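The split between portable managed DLLs and platform-specific native libraries can be sketched as follows. This is only an illustration of the naming conventions involved: the `NativeLibraryNames` helper and the base name `hdf5` used with it are hypothetical and do not describe how HDF.PInvoke.1.10 actually locates its native files.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only.
// The same managed assembly runs on every platform; only the native
// library it loads differs, and its file name follows OS conventions.
public static class NativeLibraryNames
{
    // Returns the conventional file name of a native library on the current OS.
    public static string ForCurrentPlatform(string baseName)
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            return baseName + ".dll";

        if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX))
            return "lib" + baseName + ".dylib";

        // Linux and other Unix-like systems use shared objects.
        return "lib" + baseName + ".so";
    }
}
```

Calling `NativeLibraryNames.ForCurrentPlatform("hdf5")` from the same compiled assembly yields a different file name on each operating system, which is why a package has to ship one native binary per platform next to its single managed DLL.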

PhilPJL commented 1 year ago

Hmmm, for some reason I didn't add a link to my repo https://github.com/PhilPJL/HDF5Test

LiorBanai commented 1 year ago

@Apollo3zehn when your write support is ready please update me. I would like to add to my readme a link to your library so users will have more options/libraries to choose from (not only mine).

Apollo3zehn commented 1 year ago

@PhilPJL no problem, I have found it in your repository list.

@LiorBanai Thanks, I will do so, but it will be a while until there is write support.

LiorBanai commented 1 year ago

@Apollo3zehn no problem. it is for the future anyway :)

LiorBanai commented 1 year ago

@ErroneousFatality and @Apollo3zehn I did a small test and it seems to work on Linux: [image]

and the file: [image]

Apollo3zehn commented 1 year ago

Thanks @LiorBanai , that is good to know! It seems the HDF.PInvoke.1.10 package is working as it should.