HDFGroup / h5pyd

h5py distributed - Python client library for HDF Rest API

Variable length array #85

Open seb5g opened 4 years ago

seb5g commented 4 years ago

I'm creating a backend that transparently uses pytables, h5py and h5pyd. I ran my test suite on h5pyd and ran into a problem with variable-length arrays. h5py handles these with a "special" dtype: h5py.string_dtype or h5py.vlen_dtype. After some digging, I found the function special_dtype in h5pyd, whose docstring looks promising:

    vlen = basetype
        Base type for HDF5 variable-length datatype. This can be Python
        str type or instance of np.dtype.
        Example: special_dtype( vlen=str )

However, after trying it out, it seems to work only when vlen=str and not with a numpy dtype. Using a special dtype with np.uint32 as the base type, I could create a dataset, but when trying to access an element I got this traceback:

      File "<ipython-input-114-66578725db5f>", line 1, in <module>
        dset[0]
      File "C:\Miniconda3\envs\pymodaq_dev\lib\site-packages\h5pyd\_hl\dataset.py", line 802, in __getitem__
        arr1d = bytesToArray(rsp, mtype, page_mshape)
      File "C:\Miniconda3\envs\pymodaq_dev\lib\site-packages\h5pyd\_hl\base.py", line 503, in bytesToArray
        offset = readElement(data, offset, arr, index, dt)
      File "C:\Miniconda3\envs\pymodaq_dev\lib\site-packages\h5pyd\_hl\base.py", line 467, in readElement
        arr[index] = vlen(0)
    TypeError: 'numpy.dtype' object is not callable
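The last frame of that traceback can be reproduced with NumPy alone. The sketch below assumes that `vlen` in `readElement` holds the `np.dtype` instance that was passed as the base type, which is what the error message suggests; `make_zero` is a hypothetical helper for illustration, not part of h5pyd.

```python
import numpy as np

# Stands in for the vlen base type from the traceback (assumption).
base = np.dtype("uint32")


def make_zero(dt):
    """Build a zero scalar of dtype `dt` via its scalar type."""
    # dt.type is the NumPy scalar class (np.uint32 here), which is callable.
    return dt.type(0)


try:
    base(0)  # np.dtype instances are not callable, unlike `str`
    call_failed = False
except TypeError:
    call_failed = True

zero = make_zero(base)
print(call_failed, type(zero).__name__)
```

With `vlen=str` the same call pattern works because `str(0)` is valid, which would explain why only the string case succeeded.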

Then, a bit further in the code, I found:

    def check_dtype(**kwds):
        """ Check a dtype for h5py special type "hint" information.  Only one
        keyword may be given.

        vlen = dtype
            If the dtype represents an HDF5 vlen, returns the Python base class.
            Currently only builting string vlens (str) are supported.  Returns
            None if the dtype does not represent an HDF5 vlen.

So the question is: is it or will it be possible to use any numpy dtype for variable length arrays in h5pyd?

Thx

seb5g commented 4 years ago

After some more reading, I see that your special_dtype function is the same as in the older h5py API (that is, before h5py 2.9). So these are just different names for the same functionality, except that in h5pyd, numpy special types are not working... yet?

jreadey commented 1 year ago

Hey - sorry somehow I missed this issue till now...

You can use h5pyd.special_dtype with numpy types like this example: https://github.com/HDFGroup/h5pyd/blob/master/test/hl/test_vlentype.py#L50.

There's also support for the new API, vlen_dtype, as described here: https://github.com/h5py/h5py/pull/1132.
E.g.: https://github.com/HDFGroup/h5pyd/blob/master/test/hl/test_dataset.py#L1640.
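For readers who want to see the new-style API in action without an HDF REST server, here is a minimal sketch using h5py with an in-memory file. It assumes that h5pyd exposes `vlen_dtype` and `check_vlen_dtype` under the same names, as the linked tests suggest; `roundtrip_vlen` and the dataset name `ragged` are made up for illustration.

```python
import numpy as np
import h5py


def roundtrip_vlen(rows, base=np.uint32):
    """Write ragged rows to an in-memory HDF5 file and read them back."""
    dt = h5py.vlen_dtype(np.dtype(base))
    # driver="core" with backing_store=False keeps everything in memory.
    with h5py.File("vlen_sketch.h5", "w", driver="core",
                   backing_store=False) as f:
        dset = f.create_dataset("ragged", shape=(len(rows),), dtype=dt)
        for i, row in enumerate(rows):
            dset[i] = np.asarray(row, dtype=base)
        # check_vlen_dtype returns the base dtype, or None for non-vlen types.
        stored_base = h5py.check_vlen_dtype(dset.dtype)
        result = [dset[i].tolist() for i in range(len(rows))]
    return stored_base, result


base, rows = roundtrip_vlen([[1, 2, 3], [4], [5, 6]])
print(base, rows)
```

Against h5pyd, the same calls would presumably be made on `h5pyd.File` with a domain path instead of a local file name, but that variant is untested here.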

The only special type missing is for regionrefs - which hopefully will get added soon.

I'll leave this issue open as a reminder to remove the old-style check_dtype, special_dtype functions since they are not in h5py anymore.