Issue: Scalar/String values always get written to the dataset as arrays of length 1

JakubOrlinski97 commented 11 months ago

It seems that when using, for example, WriteAsciiString, values that are inherently scalar (i.e. a single string) get written to the HDF5 file as an array of length 1.

When using the Python wrapper, h5py, such values are always written as scalars, so there is no issue when validating the file. But when I use HDF5-CSharp, an integer, double or string always gets saved as an array.

For me this is an issue because I am working with the SNIRF data format for fNIRS data, and there is a library for it, pysnirf2, which can check the validity of an HDF5 file to make sure it conforms to the specification. This validation check fails because of the predefined error: 'INVALID_DATASET_SHAPE': 'An HDF5 Dataset is not stored in the specified shape. Strings and scalars should never be stored as arrays of length 1.'

Also, in general it seems that single values should be saved as scalars and not as single-element arrays.

Simple comparison:

Python:

import h5py
f = h5py.File("example-python.h5", 'a')
f['formatVersion'] = "1.0"
f.close()

CSharp:

long _fileId = Hdf5.CreateFile("example-csharp.h5");
Hdf5.WriteAsciiString(_fileId, "formatVersion", "1.0");
Hdf5.CloseFile(_fileId);
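
To make the difference concrete, here is a minimal sketch in Python that stores the same string once as a scalar and once as a length-1 array, then scans the file for datasets of shape (1,). The helper name `length_one_datasets`, the dataset name `formatVersionAsArray` and the file name are made up for illustration, and the scan only approximates the kind of shape check a validator might perform; it is not pysnirf2's actual implementation.

```python
import h5py


def length_one_datasets(h5file):
    """Return the names of all datasets stored with shape (1,)."""
    found = []

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file.
        if isinstance(obj, h5py.Dataset) and obj.shape == (1,):
            found.append(name)

    h5file.visititems(visit)
    return found


# In-memory file (core driver, no backing store), so nothing touches the disk.
with h5py.File("shape-check.h5", "w", driver="core", backing_store=False) as f:
    # h5py writes a plain str as a scalar dataset.
    f["formatVersion"] = "1.0"

    # The same value stored as a one-element array, for comparison.
    as_array = f.create_dataset(
        "formatVersionAsArray", shape=(1,), dtype=h5py.string_dtype()
    )
    as_array[0] = "1.0"

    scalar_shape = f["formatVersion"].shape
    array_shape = as_array.shape
    flagged = length_one_datasets(f)

print(scalar_shape, array_shape, flagged)
```

Only the array variant is flagged; the scalar dataset has the empty shape `()`.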

JakubOrlinski97 commented 11 months ago

A related issue can be found in the way WriteAsciiString is implemented. There, we can see that a SCALAR space is opened first, but it is then not used when creating the dataset. Instead, a new simple space is opened (simple meaning an array, I believe) and that one gets used to initialise the dataset.
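
The difference between the two kinds of dataspace can be inspected from Python too. The following is a small sketch using h5py's low-level `h5s` module, not the HDF5-CSharp code; the helper `summarize` is a name chosen here for illustration. A scalar space has rank 0, a simple space with dimensions (1,) has rank 1, and both hold exactly one element.

```python
from h5py import h5s

# A scalar dataspace: a single element with no dimensions.
scalar_space = h5s.create(h5s.SCALAR)

# A simple dataspace: an N-dimensional array, here 1-D with one element.
simple_space = h5s.create_simple((1,))


def summarize(space):
    """Return (is_scalar, rank, number_of_elements) for a dataspace."""
    return (
        space.get_simple_extent_type() == h5s.SCALAR,
        space.get_simple_extent_ndims(),
        space.get_simple_extent_npoints(),
    )


print(summarize(scalar_space))
print(summarize(simple_space))
```

Both spaces describe one value, which is why the data round-trips either way, but a reader that checks the rank or shape will see them as different.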

I tried changing it to use the scalar space, and it does help! But changing it for the WriteObject method is a bit more challenging for me. There, what I imagine would be necessary is that the WriteOneValue method should call something different than WriteArray, since it is known that there is only one value as the name suggests. It seems that the only real change needed is the switch to SCALAR spaces, but I could be wrong!
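
As an illustration of what switching to a SCALAR space looks like at the level of the underlying HDF5 calls, here is a sketch in Python using h5py's low-level bindings. It is not the HDF5-CSharp implementation: the file name is made up, the file lives only in memory, and a fixed-length 3-byte string type is assumed for the value.

```python
import numpy as np
import h5py
from h5py import h5d, h5s, h5t

# A 0-d NumPy array holding one fixed-length byte string.
value = np.array(b"1.0", dtype="S3")

with h5py.File("scalar-write-demo.h5", "w", driver="core", backing_store=False) as f:
    # HDF5 datatype matching the NumPy dtype (fixed-length string, 3 bytes).
    type_id = h5t.py_create(value.dtype)

    # The key step: create the dataset with a SCALAR dataspace
    # rather than a simple (array) dataspace of length 1.
    space_id = h5s.create(h5s.SCALAR)
    dset_id = h5d.create(f.id, b"formatVersion", type_id, space_id)

    # Write the single value; h5s.ALL selects the whole memory and file space.
    dset_id.write(h5s.ALL, h5s.ALL, value)

    # Read back through the high-level API to confirm the layout.
    stored_shape = f["formatVersion"].shape
    stored_value = f["formatVersion"][()]

print(stored_shape, stored_value)
```

The dataset comes back with the empty shape `()`, which is the layout h5py produces for a plain string.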

I would appreciate help with this if possible! :)

JakubOrlinski97 commented 11 months ago

I decided to implement it myself. The code is mostly copied from WriteFromArray, and the adjusted methods are WriteOneValue and WriteAsciiString. The changes can be found in this pull request (#337). Please review it and let me know if everything is in order!

LiorBanai commented 11 months ago

@JakubOrlinski97 Thanks for your contribution. I'll review it next week (busy with real life right now..)

LiorBanai commented 11 months ago

I think you meant to reference this issue in the pr and not 17 :)

JakubOrlinski97 commented 11 months ago

> I think you meant to reference this issue in the pr and not 17 :)

Ahh you're right, I changed the name but I can't manually link it to this issue unfortunately :/ But I think you can as an Owner!

And no worries, for now I have a local patch for this. But do let me know when you've reviewed it, this is my first PR so for sure I expect something to be wrong :)

JakubOrlinski97 commented 11 months ago

I made some revisions to the code, as I finally understood some of the calls that were being made to allocate memory for the variables, and now the string writing should actually be correct :)

LiorBanai / HDF5-CSharp

Issue: Scalar/String values always get written to the dataset as arrays of length 1 #333