frejanordsiek / hdf5storage

Python package to read and write a wide range of Python types to/from HDF5 formatted files. Can read/write data to the HDF5 based Matlab v7.3 MAT files.
BSD 2-Clause "Simplified" License

Overwriting file fails #127

Open konradbender opened 1 year ago

konradbender commented 1 year ago

Hi there,

This is difficult to replicate, as it may depend on my specific file (which I can share if necessary). When I call hdf5storage.savemat, it works the first time but fails the second time, when it overwrites the existing file. The following code triggers the error (note that it reads the local file samples.mat):

import hdf5storage
import numpy as np
import sys

print("hdf5 version", hdf5storage.__version__)
print("np version", np.__version__)
print(sys.version_info)

data = hdf5storage.loadmat('samples.mat')

hdf5storage.savemat('test-5.mat', data, format='7.3', store_python_metadata=True)
print('successfully saved matrix')
hdf5storage.savemat('test-5.mat', data, format='7.3', store_python_metadata=True)

Output:

hdf5 version 0.1.19
np version 1.25.2
sys.version_info(major=3, minor=11, micro=4, releaselevel='final', serial=0)
successfully saved matrix
Traceback (most recent call last):
  File "/Users/konrad/code/amci/src/amci/scratch.py", line 13, in <module>
    hdf5storage.savemat('test-5.mat', data, format='7.3', store_python_metadata=True)
  File "/Users/konrad/opt/anaconda3/envs/amci-new/lib/python3.11/site-packages/hdf5storage/__init__.py", line 1681, in savemat
    writes(mdict=mdict, filename=file_name,
  File "/Users/konrad/opt/anaconda3/envs/amci-new/lib/python3.11/site-packages/hdf5storage/__init__.py", line 1321, in writes
    lowlevel.write_data(f, grp, targetname, data,
  File "/Users/konrad/opt/anaconda3/envs/amci-new/lib/python3.11/site-packages/hdf5storage/lowlevel.py", line 114, in write_data
    m.write(f, grp, name, data, type_string, options)
  File "/Users/konrad/opt/anaconda3/envs/amci-new/lib/python3.11/site-packages/hdf5storage/Marshallers.py", line 1628, in write
    self.write_metadata(f, grp, name, data, type_string, options)
  File "/Users/konrad/opt/anaconda3/envs/amci-new/lib/python3.11/site-packages/hdf5storage/Marshallers.py", line 1685, in write_metadata
    set_attribute(grp2, 'MATLAB_fields', fs)
  File "/Users/konrad/opt/anaconda3/envs/amci-new/lib/python3.11/site-packages/hdf5storage/utilities.py", line 955, in set_attribute
    if not np.array_equal(value, read_matlab_fields_attribute(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/konrad/opt/anaconda3/envs/amci-new/lib/python3.11/site-packages/numpy/core/numeric.py", line 2439, in array_equal
    return bool(asarray(a1 == a2).all())
                        ^^^^^^^^
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

My workaround for now is simply not to overwrite existing files. I would still like to learn what is going wrong, or how I can contribute a fix if this is a bug. Let me know if I should make the local file available so you can reproduce it.

Many thanks
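
The traceback ends inside np.array_equal, which suggests that the comparison itself is what fails. Below is a minimal sketch of that failure mode. It assumes (this is not confirmed in the thread) that the MATLAB_fields attribute is a 1-D object array whose elements are themselves arrays of single characters. make_fields and fields_equal are hypothetical helpers written for illustration, not part of hdf5storage.

```python
import numpy as np


def make_fields(names):
    """Build a 1-D object array holding one 'S1' character array per name.

    Hypothetical helper: mimics the layout assumed for MATLAB_fields.
    """
    out = np.empty(len(names), dtype=object)
    for i, name in enumerate(names):
        out[i] = np.array(list(name), dtype="S1")
    return out


def fields_equal(a, b):
    """Compare two such arrays element by element (hypothetical helper)."""
    if a.shape != b.shape:
        return False
    return all(np.array_equal(x, y) for x, y in zip(a, b))


first = make_fields(["labels", "scores"])
second = make_fields(["labels", "scores"])

# Same outer shape: array_equal evaluates first == second, which needs one
# truth value per element, but each element comparison yields an array.
# On NumPy 1.25.2 (the version in the report) this is expected to raise
# ValueError; other NumPy versions may behave differently.
try:
    np.array_equal(first, second)
    raised = False
except ValueError:
    raised = True

# Different outer shape: array_equal returns False before comparing elements.
different_count = np.array_equal(first, make_fields(["labels"]))

# Comparing element by element avoids the ambiguous truth value.
same = fields_equal(first, second)
changed = fields_equal(first, make_fields(["labels", "values"]))
```

If the assumption holds, the early return for differing shapes might also explain why the error only appears when the same set of fields is written a second time.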

kaare-mikkelsen commented 12 months ago

I experience the same issue, though I am not sure it is directly related to whether the file exists or not. Any change to the file will make the error go away. For instance, this sequence will not generate an error:

import hdf5storage

hdf5storage.write({'labels': 123},
                  path='./', filename='hep.mat',
                  store_python_metadata=True, matlab_compatible=True)
hdf5storage.write({'labels': 123, 'r': 2},
                  path='./', filename='hep.mat',
                  store_python_metadata=True, matlab_compatible=True)

S-Dafarra commented 5 months ago

Hi there, I encountered the same issue. I am using version 0.1.19, which at the time of writing is the version available via conda.

Another workaround seems to be to pass the option truncate_existing=True, which erases the existing content of the file.
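
The workarounds mentioned in this thread both amount to never writing into a file that already has content. A minimal sketch of that idea using only the standard library: it deletes any existing file and then calls a writer. save_fresh and demo_writer are hypothetical names; demo_writer is only a stand-in for the real save call so that the sketch runs without any .mat data.

```python
import os
import tempfile


def save_fresh(path, writer):
    """Delete any existing file at `path`, then call `writer(path)`.

    `writer` is a hypothetical callable standing in for the real save call.
    """
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # nothing to remove on the first save
    writer(path)


def demo_writer(path):
    # Stand-in writer used only to exercise the helper: appends one line.
    with open(path, "a") as handle:
        handle.write("saved\n")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "test-5.mat")
    save_fresh(target, demo_writer)
    save_fresh(target, demo_writer)  # second save starts from a clean file
    with open(target) as handle:
        contents = handle.read()
```

In real use the writer could be something like lambda p: hdf5storage.savemat(p, data, format='7.3', store_python_metadata=True). Passing truncate_existing=True, as suggested above, is likely the simpler option when it is available.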