NeuroJSON / easyh5

EasyH5 Toolbox - An easy-to-use HDF5 data interface (loadh5 and saveh5) for MATLAB
BSD 3-Clause "New" or "Revised" License

Complex numbers storage format #4

Closed matteosecli closed 4 years ago

matteosecli commented 4 years ago

Hello,

first of all, thank you very much for this fantastic library, it's awesome! 😃 🎉

I'm using it in conjunction with Armadillo, a C++ library for linear algebra. Armadillo can export to HDF5, and the idea is to then load the exported data into MATLAB.

I work with lots of complex matrices, which Armadillo stores exactly the way you do; the only difference is that 'Real' and 'Imag' are lowercase in Armadillo's storage convention (see e.g. this thread on Matlab Answers). This prevents EasyH5 from automatically importing the data as complex data. It's not a huge issue, since you can manipulate the imported struct afterwards and merge the real and imaginary parts, but it would be nice to be able to do it automatically. Plus, if you want to export data the other way around, i.e. from MATLAB to Armadillo, I see no easy way to do it.
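The manual merge mentioned above could look roughly like the following. This is only a sketch: the struct `raw` is a hand-built stand-in for what an import might return, and its layout (one struct with lowercase `real` and `imag` fields) is an assumption, not something confirmed for any particular file.

```matlab
% Stand-in for imported data (assumed layout: lowercase 'real'/'imag' fields).
raw = struct('real', [1 2; 3 4], 'imag', [5 6; 7 8]);

% Merge the two parts into a single complex matrix when both fields exist;
% otherwise leave the data untouched.
if isstruct(raw) && all(isfield(raw, {'real', 'imag'}))
    z = complex(raw.real, raw.imag);
else
    z = raw;
end
```

For a file with several complex datasets, the same check would have to be repeated for each field of the imported struct.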

Of course, for my own purposes I could replace all the 'Real' and 'Imag' strings in your code with their lowercase versions, but I'd like to keep it general enough.

So, in order to solve the two problems at once and possibly improve interoperability with other software that uses a different convention (e.g. 're' and 'im'), I kindly ask whether it's possible to implement a sort of "complex number format specification" option for both the saveh5 and the loadh5 functions, instead of hardcoding 'Real' and 'Imag'. Something along the lines of:

saveh5(myComplexMatrix,'myTestFile.h5','complexformat',{'real','imag'});

and

myComplexMatrix = loadh5('myTestFile.h5','complexformat',{'real','imag'});

which would make it possible to customise the complex number storage format (e.g. 'complexformat',{'re','im'} or whatever one prefers). The default with no options would be the current format, i.e. equivalent to 'complexformat',{'Real','Imag'}.
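To make the idea concrete, here is a minimal sketch of how configurable field names could work in both directions. The helpers `splitcomplex` and `mergecomplex` are hypothetical and not part of easyh5; they only illustrate replacing hardcoded names with a two-element list. The names are passed as a cell array, since square brackets in MATLAB would concatenate the two strings into one.

```matlab
% Hypothetical helpers (not part of easyh5): the field names come from a
% two-element cell array instead of being hardcoded.
defaultfmt   = {'Real', 'Imag'};   % the current convention, used as default
splitcomplex = @(z, fmt) struct(fmt{1}, real(z), fmt{2}, imag(z));
mergecomplex = @(s, fmt) complex(s.(fmt{1}), s.(fmt{2}));

z  = [1+2i, 3-4i];
s0 = splitcomplex(z, defaultfmt);      % struct with fields 'Real' and 'Imag'
s  = splitcomplex(z, {'re', 'im'});    % struct with fields 're' and 'im'
z2 = mergecomplex(s, {'re', 'im'});    % back to a complex array
```

A real implementation would also need to validate the option value and apply it wherever complex data is written or detected; that part is left out here.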

Thanks a lot in advance! 😄

fangq commented 4 years ago

@matteosecli, feel free to submit a patch that adds this option, so that data can be saved with lowercase real/imag when the option is used (with Real/Imag remaining the default).

However, every HDF5 library handles this differently (including the r/i format used in the h5load.m function that loadh5.m was adapted from), because HDF5 has no standardized way to store complex numbers, as I mentioned in https://github.com/bids-standard/bids-specification/issues/197#issuecomment-541107503

My plan for this problem, which is already implemented in the nightly build of easyh5, is to adopt the JData Specification (http://github.com/fangq/jdata , also at http://openjdata.org) that I have developed to encode advanced data structures, such as complex-valued arrays (see this wiki page). The hope for the JData spec is to define a standardized way to represent scientific data across different programming languages and software. You can read more about the rationale here:

http://openjdata.org/wiki/index.cgi?JData/Basics

Encoding a data structure with the JData annotation format before saving, and automatically decoding it after loading, has been implemented in the master branch. You can use this feature by adding 'jdata', 1 to the option list; see https://github.com/fangq/easyh5/commit/1d45bc3d7f3fc3c1627089e43bb10b949589dd60

matteosecli commented 4 years ago

@fangq I agree that adopting a clear standard (the JData specification, in this case) would benefit everyone in the long term. I could explore this possibility (i.e. the adoption of the JData format) with the developers of Armadillo as well, but in the meantime I'll just create a quick PR with my suggestions as a temporary workaround.