frejanordsiek / hdf5storage

Python package to read and write a wide range of Python types to/from HDF5 formatted files. Can read and write HDF5-based Matlab v7.3 MAT files.
BSD 2-Clause "Simplified" License

savemat shape inconsistency for empty arrays #114

Closed (allenleetc closed this issue 3 years ago)

allenleetc commented 3 years ago

Hi, when using savemat() we are seeing inconsistent shapes when empty arrays are saved: a (4, 2, 0) array loads in MATLAB as 0×2×4, while a non-empty (4, 2, 10) array keeps its shape. Any info appreciated. Thanks, Allen

# in py
import numpy as np
import hdf5storage

print(hdf5storage.__version__)
# 0.2

x = np.zeros((4, 2, 10))
y = np.zeros((4, 2, 0))
print(x.shape, y.shape)
# (4, 2, 10) (4, 2, 0)

dat = {'dat': [x, y]}
hdf5storage.savemat('/dat0/apt/test2.mat', dat)

# in MATLAB
>> dat=load('/dat0/apt/test2.mat')
dat = 
  struct with fields:

    dat: {[4×2×10 double]  [0×2×4 double]}

frejanordsiek commented 3 years ago

Matlab uses Fortran (column-major) dimension ordering, whereas numpy defaults to C (row-major) ordering, though numpy can also use Fortran ordering. So that numpy arrays are stored to and read from MAT v7.3 files with rows as rows, columns as columns, and so on in both Python and Matlab, hdf5storage reverses the dimension order whenever Matlab compatibility is enabled (see Options.reverse_dimension_order).
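To make the dimension reversal concrete, here is a small plain-numpy sketch (not hdf5storage's internal code). Transposing a C-ordered array without an axes argument reverses its shape and yields a Fortran-contiguous view of the same buffer, so the same bytes can be read under either convention as long as the shape is reversed too. The array sizes are arbitrary choices for illustration.

```python
import numpy as np

# A C-ordered (row-major) array, numpy's default layout.
a = np.arange(24, dtype=np.float64).reshape((4, 2, 3))

# .T with no axes argument reverses the dimension order. The result is a
# view over the same buffer, which is now Fortran-contiguous (column-major).
b = a.T
print(a.shape, b.shape)                                   # (4, 2, 3) (3, 2, 4)
print(a.flags['C_CONTIGUOUS'], b.flags['F_CONTIGUOUS'])   # True True

# Element (i, j, k) of a is element (k, j, i) of b: the data is unchanged,
# only the indexing convention differs.
assert a[1, 0, 2] == b[2, 0, 1]
```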

In the v7.3 MAT format, when an array is empty, the HDF5 Dataset for it becomes a 1D uint64 array containing the array shape.

In hdf5storage, I implemented it so that the stored shape is taken after the dimension order reversal. In this case, the shape (4, 2, 0) gets stored as [0, 2, 4], which Matlab then reads as 0×2×4.
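The effect on empty arrays can be modeled in a few lines. This is a sketch, not hdf5storage code: `empty_shape_vector` and `matlab_size_string` are hypothetical helpers, and the model assumes Matlab takes the stored vector's entries, in order, as its dimensions, which is what the 0×2×4 result in the report suggests.

```python
import numpy as np

def empty_shape_vector(numpy_shape, reverse_first):
    """Hypothetical helper: build the 1D uint64 vector that stands in for an empty array.

    reverse_first=True models the behavior described above (shape taken after
    the dimension-order reversal); reverse_first=False keeps the numpy shape as-is.
    """
    dims = tuple(reversed(numpy_shape)) if reverse_first else tuple(numpy_shape)
    return np.array(dims, dtype=np.uint64)

def matlab_size_string(shape_vector):
    # Assumption: Matlab uses the vector's entries, in order, as its dimensions.
    return "x".join(str(int(d)) for d in shape_vector)

y = np.zeros((4, 2, 0))
old = empty_shape_vector(y.shape, reverse_first=True)
new = empty_shape_vector(y.shape, reverse_first=False)
print(old, matlab_size_string(old))   # [0 2 4] 0x2x4
print(new, matlab_size_string(new))   # [4 2 0] 4x2x0
```

Under this model, storing the shape before the reversal is what would make Matlab report 4×2×0.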

But this appears to be a mistake as far as Matlab compatibility goes. Either I believed Matlab stored the shape in this order, or I implemented the storage incorrectly. Either way, it is a bug and needs to be fixed.

Unfortunately, this cannot be changed in the 0.1.x branch, because it would break compatibility: data written by some 0.1.x versions would be read by other 0.1.x versions with the dimension order reversed.

So it can only be fixed in the main branch, which creates a major compatibility issue between the two branches.
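The compatibility concern can be sketched the same way. `write_marker` and `read_marker` are hypothetical stand-ins for a writer and a reader that may disagree about whether the stored shape of an empty array was reversed; they are not functions in hdf5storage.

```python
def write_marker(numpy_shape, reverse_first):
    # Hypothetical writer: store the shape of an empty array as a tuple.
    return tuple(reversed(numpy_shape)) if reverse_first else tuple(numpy_shape)

def read_marker(marker, reverse_first):
    # Hypothetical reader: undo whatever it assumes the writer did.
    return tuple(reversed(marker)) if reverse_first else tuple(marker)

shape = (4, 2, 0)

# Same convention on both sides: the round trip preserves the shape.
assert read_marker(write_marker(shape, True), True) == shape
assert read_marker(write_marker(shape, False), False) == shape

# Mixed conventions: a file from the old writer read by a new reader comes back reversed.
print(read_marker(write_marker(shape, True), False))   # (0, 2, 4)
```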

frejanordsiek commented 3 years ago

Fixed in the main branch in commit 2ef435249d24b26a2d12d8cac2bea3ca14a93edd