NeuroJSON / easyh5

EasyH5 Toolbox - An easy-to-use HDF5 data interface (loadh5 and saveh5) for MATLAB
BSD 3-Clause "New" or "Revised" License

Advantage over Matlab HDF5 support? #7

Closed ftadel closed 4 years ago

ftadel commented 4 years ago

Hello, since this works only with Matlab and not Octave, what is the advantage of this library over using the readily available HDF5 functions in Matlab? https://www.mathworks.com/help/matlab/hdf5-files.html

I'm sorry if this is a silly question, I haven't looked at your code or examples at all... I was just wondering about the use of this code in jsnirf, and questioning the necessity of adding many dependencies to Brainstorm if we want to reuse some of your code. Related to this PR: https://github.com/brainstorm-tools/brainstorm3/pull/283

Thanks!

fangq commented 4 years ago

@ftadel, I was about to start a conversation with your team on file formats, glad you mentioned this.

> what is the advantage of this library over using the readily available HDF5 functions in Matlab?

The main appeal of easyh5 is its simplicity and usability. It is super compact and extremely easy to use.

The corresponding functions to loadh5/saveh5 in matlab's high-level H5 interface are h5read/h5write, but the latter have a lot of limitations.

For starters, they can't directly save a struct, a struct array, or a cell array. They only accept numerical or string arrays as input. They are also unable to directly handle other advanced data structures, such as tables, graphs, containers.Map, etc., unless you use the low-level interfaces and serialize those manually.

In comparison, easyh5 accepts literally all matlab data structures, serializing them and storing them in a single hdf5 file. It also reads any hdf5 file and returns a convenient struct/containers.Map object holding the complex hierarchical data.

Secondly, even if one calls the low-level interface, matlab's functions do not save data fields in their creation order unless special settings are used. This can be quite annoying, as some data records require a specific order of appearance. This was fixed in easyh5: https://github.com/fangq/easyh5/issues/1
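As a rough sketch of the difference (the variable and file names below are made up for illustration, and the second half assumes easyh5 is on the matlab path), the built-in high-level functions need one h5create/h5write pair per numeric field, while saveh5/loadh5 take the whole struct in a single call:

```matlab
% Hypothetical example data: a nested struct (names are illustrative only)
data = struct('id', 7, 'samples', rand(3, 4));
data.meta = struct('rate', 10.5);

% Built-in high-level interface: every numeric array is created and written
% one dataset at a time; the struct itself cannot be passed to h5write.
plainFile = [tempname '.h5'];
h5create(plainFile, '/samples', size(data.samples));
h5write(plainFile, '/samples', data.samples);
h5create(plainFile, '/meta/rate', size(data.meta.rate));
h5write(plainFile, '/meta/rate', data.meta.rate);
samplesBack = h5read(plainFile, '/samples');

% easyh5 (only runs if saveh5/loadh5 are found on the path):
% one call each way for the whole struct
if exist('saveh5', 'file') && exist('loadh5', 'file')
    easyFile = [tempname '.h5'];
    saveh5(data, easyFile);
    loaded = loadh5(easyFile);
    disp(loaded);   % inspect the restored hierarchy
end
```

The exact layout of the struct returned by loadh5 is best checked by inspecting it, so the sketch only displays it rather than assuming particular field names.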

Last but not least, hdf5 is a general data storage format, like JSON and MessagePack, and it does not have internal data-structure vocabularies for serializing complex data structures. People have to serialize those themselves, using non-standardized fields. For example, hdf5 does not have built-in complex numbers, so people have been storing those in composite fields named r/i, Re/Im, Real/Imag, etc. This makes the files difficult to share and parse.

My solution to the above issue is to adopt the JData specification I defined for JSON-based data storage, which aims to standardize the vocabularies for serializing complex data structures.

You can see my other thoughts related to hdf5 in this BIDS thread:
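To make the complex-number example concrete, here is a minimal sketch using only the built-in h5create/h5write/h5read. It is a simplification: it stores the two parts as separate datasets under a group rather than as a composite field, and the names /z/Real and /z/Imag are just one arbitrary choice among the conventions listed above.

```matlab
% Hypothetical complex array
z = [1+2i, 3-4i; 5+0.5i, -1i];

% One ad-hoc convention: split into two real-valued datasets.
% h5create also creates the intermediate group /z.
fname = [tempname '.h5'];
h5create(fname, '/z/Real', size(z));
h5write(fname, '/z/Real', real(z));
h5create(fname, '/z/Imag', size(z));
h5write(fname, '/z/Imag', imag(z));

% A reader has to know (or guess) the names chosen by the writer
% in order to reassemble the complex array.
zBack = complex(h5read(fname, '/z/Real'), h5read(fname, '/z/Imag'));
```

A file written with r/i or Re/Im instead would need different reading code, which is the interoperability problem described above.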

https://github.com/bids-standard/bids-specification/issues/197#issuecomment-541107503

ftadel commented 4 years ago

This is all very clear, thanks!

Which types of data does the SNIRF data format need to store in HDF5 that the Matlab functions can't store easily? Only cell-arrays, or more than this?

fangq commented 4 years ago

The SNIRF data is basically a struct if mapped to a matlab data structure.

Unfortunately, matlab's h5read/h5write can only deal with numerical arrays. To support hierarchical data, you will have to loop over the subfields and call the low-level functions, which in the end is exactly what easyh5's loadh5/saveh5 do.

Take a look at the loadh5/saveh5 code and you can see they are super compact (yet very general).
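For illustration only, looping over subfields could look like the sketch below. This is not the easyh5 implementation: write_struct_h5 is a hypothetical helper, it uses the high-level h5create/h5write rather than the low-level interface, and it handles only real, non-empty numeric arrays and nested scalar structs.

```matlab
function write_struct_h5(fname, prefix, s)
% WRITE_STRUCT_H5  Hypothetical helper (not part of easyh5 or matlab).
%   Recursively writes the numeric fields of scalar struct S into the HDF5
%   file FNAME, mapping nested structs to groups below the path PREFIX.
names = fieldnames(s);
for i = 1:numel(names)
    dset = [prefix '/' names{i}];
    val = s.(names{i});
    if isstruct(val) && isscalar(val)
        % Nested struct: recurse, using the field name as a group name
        write_struct_h5(fname, dset, val);
    elseif isnumeric(val) && isreal(val) && ~isempty(val)
        % Real numeric array: create the dataset, then write it
        h5create(fname, dset, size(val), 'Datatype', class(val));
        h5write(fname, dset, val);
    else
        % Strings, cells, complex values, struct arrays, etc. are skipped
        warning('write_struct_h5:skipped', ...
            'Skipping %s (class %s)', dset, class(val));
    end
end
end
```

Calling write_struct_h5(fname, '', s) on a new file writes each top-level field of s under the root group. Everything the else branch skips is what a general-purpose tool still has to serialize.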