NeuroJSON / easyh5

EasyH5 Toolbox - An easy-to-use HDF5 data interface (loadh5 and saveh5) for MATLAB
BSD 3-Clause "New" or "Revised" License

Array dimensions are transposed in MATLAB and in HDF5 file #10

Closed: fangq closed this issue 4 years ago

fangq commented 4 years ago

Similar to this issue in snirf_homer3 (https://github.com/fNIRS/snirf_homer3/issues/4), easyh5 stores arrays with their dimensions transposed: an array saved from MATLAB appears with its dimension order reversed in the HDF5 dataset on disk.

This can be reproduced by running the following in MATLAB:

data2hdf=reshape(1:(2*4*6),[2,4,6]);
size(data2hdf)               % 2 4 6
saveh5(data2hdf,'test.h5')   % stored as dataset /data2hdf

Then, in a terminal, start Python (after installing h5py and NumPy, e.g. sudo apt-get install python-h5py python-numpy) and run:

>>> import h5py
>>> import numpy as np
>>> dat=h5py.File('test.h5','r')
>>> d1=np.array(dat.get('/data2hdf'))
>>> d1.shape
(6, 4, 2)

Alternatively, inspect the file with h5dump test.h5:

fangq@mars:~/space/solar/Gitroot/Project/github/easyh5$ h5dump test.h5
HDF5 "test.h5" {
GROUP "/" {
   DATASET "data2hdf" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 6, 4, 2 ) / ( 6, 4, 2 ) }
      DATA {
      (0,0,0): 1, 2,
      (0,1,0): 3, 4,
      (0,2,0): 5, 6,
      ...
      (5,3,0): 47, 48
      }
   }
}
}

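As we can see, the dataset dimensions seen from Python (and h5dump) are reversed compared to the MATLAB array. This is expected and well known: MATLAB stores arrays in column-major order while HDF5 uses row-major order, so MATLAB writes its data buffer unchanged and reverses the list of dimensions. See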

https://www.mathworks.com/matlabcentral/answers/308303-why-does-matlab-transpose-hdf5-data
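The reversal can be reproduced with NumPy alone, without MATLAB or an HDF5 file. This is a minimal sketch, assuming (as the MathWorks answer describes) that MATLAB hands its column-major buffer to HDF5 unchanged and only reverses the dimension list; the variable names are illustrative.

```python
import numpy as np

# MATLAB: data2hdf = reshape(1:(2*4*6), [2,4,6]), filled in column-major (Fortran) order
matlab_view = np.arange(1, 2 * 4 * 6 + 1, dtype=np.float64).reshape((2, 4, 6), order="F")

# Assumed MATLAB behavior: write the raw column-major memory and reverse the dims list
raw_buffer = matlab_view.ravel(order="F")
hdf5_dims = matlab_view.shape[::-1]           # (6, 4, 2)

# Row-major (C-order) readers such as h5py/NumPy interpret the same buffer
python_view = raw_buffer.reshape(hdf5_dims)   # shape (6, 4, 2)

# Every element ends up with its indices reversed: python_view[k, j, i] == matlab_view[i, j, k]
assert np.array_equal(python_view, matlab_view.transpose())

print(python_view.shape)      # (6, 4, 2), as reported by h5py and h5dump
print(python_view[0, 0, :])   # [1. 2.], the first row printed by h5dump
print(python_view[5, 3, :])   # [47. 48.], the last row printed by h5dump
```

Note that no data is lost or reordered in memory; only the interpretation of the dimension order differs between the two conventions.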

Because the SNIRF specification defines its arrays as HDF5 datasets, this reversal is considered a non-compliance and should be fixed.

Note that this is not an issue when JSNIRF is used, because the JSONLab/JSNIRFY toolboxes transpose arrays before saving them to the JData _ArrayData_ construct (which is also row-major) and transpose them back when decoding.
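The transpose-before-encoding idea can be sketched in Python as follows. This is a hypothetical illustration, not JSONLab's or JSNIRFY's actual code: the helper names encode_matlab_array and decode_to_matlab are invented, a MATLAB array is modeled as its column-major value list plus its size, and only the JData keywords _ArraySize_ and _ArrayData_ are used.

```python
import json
import numpy as np

def encode_matlab_array(col_major_values, size):
    """Hypothetical encoder: MATLAB-ordered data -> JData-style row-major construct."""
    # Rebuild the array with MATLAB indexing semantics (column-major fill)
    arr = np.asarray(col_major_values, dtype=np.float64).reshape(size, order="F")
    # Serialize in row-major (C) order, the convention used by _ArrayData_
    return {"_ArraySize_": list(size), "_ArrayData_": arr.ravel(order="C").tolist()}

def decode_to_matlab(obj):
    """Hypothetical decoder: JData-style construct -> MATLAB column-major value list."""
    arr = np.asarray(obj["_ArrayData_"], dtype=np.float64).reshape(obj["_ArraySize_"], order="C")
    return arr.ravel(order="F").tolist()

values = list(range(1, 2 * 4 * 6 + 1))        # MATLAB: 1:(2*4*6)
encoded = encode_matlab_array(values, (2, 4, 6))
decoded = decode_to_matlab(json.loads(json.dumps(encoded)))

print(encoded["_ArraySize_"])       # [2, 4, 6]: size kept in MATLAB order
print(encoded["_ArrayData_"][:3])   # [1.0, 9.0, 17.0]: last index varies fastest
assert decoded == [float(v) for v in values]  # round trip restores MATLAB order
```

Because the size is stored in MATLAB order and the data in row-major order, a row-major reader and the MATLAB decoder agree on every element.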

fangq commented 4 years ago

This issue is now fixed with the above patch. Both loadh5 and saveh5 now have a new option, Transpose, which is set to 1 by default. When saving, MATLAB (column-major) arrays are therefore converted to C/HDF5 (row-major) arrays, and when loading, C/HDF5 arrays are converted back to MATLAB arrays. Arrays loaded in Python now show consistent dimensions and values, for example:

data2hdf=reshape(1:(2*4*6),[2,4,6]);
size(data2hdf)              % 2 4 6
data2hdf(1,3,2)             % 13
data2hdf(19)                % 19
saveh5(data2hdf,'test.h5')  % Transpose=1 by default
newdata=loadh5('test.h5');
size(newdata.data2hdf)      % 2 4 6
newdata.data2hdf(1,3,2)     % 13
newdata.data2hdf(19)        % 19
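One way to picture what Transpose=1 changes is the NumPy model below. It is a sketch under stated assumptions, not easyh5's code: it assumes saveh5 reverses the array's dimension order (the analogue of MATLAB's permute(data, ndims(data):-1:1)) before MATLAB's HDF5 layer writes the column-major buffer with reversed dims. The helper names are hypothetical.

```python
import numpy as np

def matlab_h5write(a):
    """Hypothetical model of MATLAB's HDF5 write: raw column-major buffer, reversed dims."""
    return a.ravel(order="F"), a.shape[::-1]

def h5py_read(buf, dims):
    """Model of a row-major (C-order) reader such as h5py/NumPy."""
    return buf.reshape(dims)

def saveh5_model(a, transpose=True):
    """Hypothetical model of saveh5: optionally reverse all axes first, then write."""
    if transpose:
        a = a.transpose()   # reverse all axes, like permute(a, ndims(a):-1:1)
    return matlab_h5write(a)

data2hdf = np.arange(1, 2 * 4 * 6 + 1, dtype=np.float64).reshape((2, 4, 6), order="F")

old = h5py_read(*saveh5_model(data2hdf, transpose=False))
new = h5py_read(*saveh5_model(data2hdf, transpose=True))

print(old.shape)      # (6, 4, 2): the behavior reported in this issue
print(new.shape)      # (2, 4, 6): matches size(data2hdf) in MATLAB
print(new[0, 2, 1])   # 13.0, the same element as data2hdf(1,3,2) in MATLAB
```

The two reversals (one explicit, one done by the writer) cancel, so the row-major reader sees the array exactly as MATLAB indexes it.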

In Python, both the dimensions and the indexing are now consistent (Python indices start from 0 instead of 1, and order='F' reproduces MATLAB's linear indexing):

>>> dat=h5py.File('test.h5','r')
>>> d1=np.array(dat.get('/data2hdf'))
>>> d1.shape
(2, 4, 6)
>>> d1[0,2,1]
13.0
>>> np.ravel(d1,order='F')[18]
19.0
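The index correspondence in the session above can be checked without the file. This sketch assumes the same [2,4,6] array and uses NumPy's ravel_multi_index with order='F' as a stand-in for MATLAB's sub2ind; the variable names are illustrative.

```python
import numpy as np

shape = (2, 4, 6)
d1 = np.arange(1, 49, dtype=np.float64).reshape(shape, order="F")  # same values as data2hdf

# MATLAB subscript (1,3,2) becomes the 0-based NumPy index (0,2,1)
matlab_sub = (1, 3, 2)
numpy_idx = tuple(s - 1 for s in matlab_sub)

# 0-based column-major linear index; adding 1 gives MATLAB's sub2ind([2,4,6],1,3,2)
linear0 = np.ravel_multi_index(numpy_idx, shape, order="F")
print(linear0 + 1)                  # 13

print(d1[numpy_idx])                # 13.0, same as data2hdf(1,3,2)
print(np.ravel(d1, order="F")[18])  # 19.0, same as data2hdf(19)
```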