Apollo3zehn / PureHDF

A pure .NET library that makes reading and writing of HDF5 files (groups, datasets, attributes, ...) very easy.
MIT License

❓ File Extension #46

Closed: ansible42 closed this issue 11 months ago

ansible42 commented 11 months ago

Is it possible to extend a file, such that writer.Dispose() can be called, the data is written to the file, and then we can continue writing to the file using chunking?

Apollo3zehn commented 11 months ago

I am not sure what you mean; especially the part

and the data written to file and the continue to write to file using the chunking?

confuses me. Do you want to first call writer.Dispose() and then writer.Write(dataset, data, fileSelection: fileSelection);? That will probably not work.

ansible42 commented 11 months ago

Ah, so the situation is that we are continuously writing data to file for a data acquisition application. The chunk rate at which data comes off the DAS should be between 5 and 10 Hz, with an acquisition rate of 100 kHz (x 14 channels). Ideally, we would like to write to the file as we go so we can clear the local buffer. We would continue writing until the file reaches a specified size, then roll over and create a new file.

This might not be the tool for us; we are still in the research stage for this project. Great library in any case, thanks for the work.

ansible42 commented 11 months ago

OK, never mind, I think I got it figured out.

    var writer = file.BeginWrite(filePath);

    for (var i = 0; i < numChunks; i++)
    {
        stopwatch.Restart();

        // Select the i-th chunk-sized slice (same selection is used for
        // both the memory buffer and the file dataset).
        var selectionToWrite = new HyperslabSelection((ulong)(i * chunkSize), (ulong)chunkSize);

        // Write each signal's slice into its corresponding dataset.
        foreach (var key in signals.Keys)
        {
            writer.Write(dataSetList[key], signals[key], selectionToWrite, selectionToWrite);
        }

        writer.Write(timeseriesDataset, timeSeries, selectionToWrite, selectionToWrite);

        chunkWriteTimes.Add(stopwatch.ElapsedMilliseconds);
    }

    writer.Dispose();

Apollo3zehn commented 11 months ago

Yes, this is how the API should be used. I hope it works well for you this way.

ansible42 commented 11 months ago

Vielen Dank (thank you very much)

msft-takend commented 11 months ago

I never figured out a way to use this as an [extendible dataset](https://portal.hdfgroup.org/display/HDF5/Extendible+Datasets) for data of unknown length. It always seemed that the H5Dataset required a length and pre-allocated a file of that size. Is this correct, or am I missing something?

Apollo3zehn commented 11 months ago

PureHDF writes every HDF5 data structure to the file only once. A consequence of this is that the file layout must be known when PureHDF begins writing to disk, which means extendible datasets are not supported, since they require the chunk index to be modified after it has been written.

A reason for this limitation is that a design which allows modifying the chunk index and other structures after the fact would require the library to track and possibly reallocate free space within the file, and this is something I do not yet want to support due to the unknown complexity it might involve.
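
For readers landing here with the same question: given this limitation, the pattern the thread converges on is to pre-declare a fixed dataset extent per file, fill it chunk by chunk with HyperslabSelection, and roll over to a new file when it is full. Below is a rough sketch of that idea, not code from the thread. The constructor parameters (fileDims), the writer type, and helper names like CreateWriter and AcquireChunk are assumptions based on the PureHDF API shown above and may not match the library exactly; treat this as an illustration, not a verified implementation.

```csharp
using PureHDF;
using PureHDF.Selections;

const int chunkSize = 10_000;   // samples per write (illustrative)
const int chunksPerFile = 500;  // fixed capacity => file layout known up front
const int totalChunks = 1200;   // pretend acquisition length for this sketch

// Hypothetical helper: declare the full (fixed) extent before writing begins,
// as PureHDF requires, and open a writer for one rollover file.
static H5NativeWriter CreateWriter(string path, out H5Dataset<float[]> dataset)
{
    dataset = new H5Dataset<float[]>(fileDims: new ulong[] { (ulong)(chunkSize * chunksPerFile) });
    var file = new H5File { ["signal"] = dataset };
    return file.BeginWrite(path);
}

// Stub standing in for the real DAQ read.
static float[] AcquireChunk() => new float[chunkSize];

var fileIndex = 0;
var chunkInFile = 0;
var writer = CreateWriter($"data_{fileIndex}.h5", out var dataset);

for (var n = 0; n < totalChunks; n++)
{
    if (chunkInFile == chunksPerFile)
    {
        // File is full: finish it and roll over to a fresh one.
        writer.Dispose();
        fileIndex++;
        chunkInFile = 0;
        writer = CreateWriter($"data_{fileIndex}.h5", out dataset);
    }

    var buffer = AcquireChunk();
    var fileSelection = new HyperslabSelection((ulong)(chunkInFile * chunkSize), (ulong)chunkSize);
    writer.Write(dataset, buffer, fileSelection: fileSelection);
    chunkInFile++;
}

writer.Dispose();
```

The trade-off is that the final file of a run is usually only partially filled, so a reader either needs to know the valid length out-of-band (e.g. from an attribute written before the final Dispose) or tolerate trailing fill values.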