deshaw / versioned-hdf5

Versioned HDF5 provides a versioned abstraction on top of h5py
https://deshaw.github.io/versioned-hdf5/

Slicetools platform-specific error `could not get virtual filename` #356

Open peytondmurray opened 1 month ago

peytondmurray commented 1 month ago

Likely related to #352. The following fails on Linux (but not on macOS) with the latest commit on master:

import h5py
from versioned_hdf5 import VersionedHDF5File
import numpy as np

d = './testdata.h5'

with h5py.File(d, mode="w") as f:
    vf = VersionedHDF5File(f)
    with vf.stage_version("r0") as sv:
        sv.create_dataset('values', data=np.arange(100), chunks=(10,), maxshape=(None,))

with h5py.File(d, mode="r+") as f:
    vf = VersionedHDF5File(f)
    with vf.stage_version("r1") as sv:
        values = sv['values']
        values.resize((110,))  # <-- Exception raised here

With h5py==3.11.0 I get the following:

HDF5-DIAG: Error detected in HDF5 (1.14.4-3) thread 0:
  #000: H5Pdcpl.c line 2406 in H5Pget_virtual_filename(): can't find object for ID
    major: Object ID
    minor: Unable to find ID information (already closed?)
  #001: H5Pint.c line 4102 in H5P_object_verify(): property list is not a member of the class
    major: Property lists
    minor: Can't compare objects
  #002: H5Pint.c line 4053 in H5P_isa_class(): not a property list
    major: Invalid arguments to routine
    minor: Inappropriate type
Traceback (most recent call last):
  File "/home/pdmurray/Desktop/workspace/sandbox/vhdf/run.py", line 20, in <module>
    values.resize((110,))  # <-- Exception raised here
    ^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdmurray/Desktop/workspace/versioned-hdf5/versioned_hdf5/wrappers.py", line 777, in resize
    data_dict = self.id.data_dict
                ^^^^^^^^^^^^^^^^^
  File "/home/pdmurray/Desktop/workspace/versioned-hdf5/versioned_hdf5/wrappers.py", line 1451, in data_dict
    self._data_dict = build_data_dict(dcpl, self.raw_data.name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "slicetools.pyx", line 128, in versioned_hdf5.slicetools.build_data_dict
  File "slicetools.pyx", line 144, in versioned_hdf5.slicetools.build_data_dict
  File "slicetools.pyx", line 168, in versioned_hdf5.slicetools.build_data_dict
ValueError: Could not get virtual filename

peytondmurray commented 1 month ago

As I mentioned above, I can reproduce this with h5py 3.11.0, but as of h5py commit 2c80981022e741a04e02d4afd6ae78bac1bc770f the error is gone and everything works as intended. I can't yet tell whether this is again some kind of Cython/typing issue. One thing to note, though, is that it is probably not necessary to call into the underlying HDF5 library to get the virtual filename: the dcpl object passed into build_data_dict, where the failure occurs, is a PropDCID with a pure-Cython .get_virtual_filename method, and the only call into Python code is the allocation of a bytes object. So I doubt we would lose much by calling that method instead.
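
For illustration, here is a rough sketch of reading the virtual mappings through h5py's own PropDCID methods rather than through a direct HDF5 call. This is not the actual build_data_dict code. It assumes the low-level h5py methods get_virtual_count, get_virtual_filename and get_virtual_dsetname behave as documented, and the helper virtual_sources and the dataset names /raw_a and /raw_b are made up for the example.

```python
from h5py import h5p, h5s


def virtual_sources(dcpl):
    """Return (source filename, source dataset name) for each virtual mapping.

    Hypothetical helper: uses only h5py's PropDCID methods, so no direct
    call to H5Pget_virtual_filename is needed in our own code.
    """
    def as_str(value):
        # Normalize defensively in case a bytes object is returned.
        return value.decode("utf-8") if isinstance(value, bytes) else value

    return [
        (as_str(dcpl.get_virtual_filename(i)), as_str(dcpl.get_virtual_dsetname(i)))
        for i in range(dcpl.get_virtual_count())
    ]


# Build a dataset creation property list with two virtual mappings.
# Nothing is written to disk; the names below are placeholders.
dcpl = h5p.create(h5p.DATASET_CREATE)
vspace = h5s.create_simple((20,))
src_space = h5s.create_simple((10,))

vspace.select_hyperslab((0,), (10,))   # elements 0..9 of the virtual dataset
dcpl.set_virtual(vspace, b".", b"/raw_a", src_space)

vspace.select_hyperslab((10,), (10,))  # elements 10..19 of the virtual dataset
dcpl.set_virtual(vspace, b".", b"/raw_b", src_space)

sources = virtual_sources(dcpl)
print(sources)
```

The filename "." is what HDF5 uses for a source dataset that lives in the same file as the virtual dataset, which I assume is the case being handled here.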