HDFGroup / h5pyd

h5py distributed - Python client library for HDF Rest API

Error pulling a 'column' directly from a table with h5pyd #106

Open MRossol opened 2 years ago

MRossol commented 2 years ago

h5pyd is unable to pull a "column" (a single field) from a recarray/table directly: indexing the dataset by field name raises a ValueError.

Example code using h5py:

In [14]: with h5py.File(path, mode='r') as f:
    ...:     sector = f['enumerations']['sector']['id']
    ...:
    ...:     print(sector)
    ...:
[b'com' b'res' b'trans' b'ind']

Same attempt in h5pyd:

In [12]: with h5pyd.File(hsds_path, mode='r') as f:
    ...:     sector = f['enumerations']['sector']['id']
    ...:
    ...:     print(sector)
    ...:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-ecf8d2f2a8ec> in <module>
      1 with h5pyd.File(hsds_path, mode='r') as f:
----> 2     sector = f['enumerations']['sector']['id']
      3
      4

~/miniconda3/lib/python3.9/site-packages/h5pyd/_hl/dataset.py in __getitem__(self, args)
    862                         self.log.info("binary response, {} bytes".format(len(rsp)))
    863                         #arr1d = numpy.frombuffer(rsp, dtype=mtype)
--> 864                         arr1d = bytesToArray(rsp, mtype, page_mshape)
    865                         page_arr = numpy.reshape(arr1d, page_mshape)
    866                     else:

~/miniconda3/lib/python3.9/site-packages/h5pyd/_hl/base.py in bytesToArray(data, dt, shape)
    497         for index in range(nelements):
    498             offset = readElement(data, offset, arr, index, dt)
--> 499     arr = arr.reshape(shape)
    500     return arr
    501

ValueError: cannot reshape array of size 12 into shape (4,)

For reference, the source .h5 file is here: s3://oedi-data-lake/dsgrid-2018-efs/state_hourly_residuals/eia_annual_energy_by_sector.dsg
The HSDS domain is in the s3://nrel-pds-hsds/ bucket here: '/nrel/dsgrid-2018-efs/state_hourly_residuals/eia_annual_energy_by_sector.dsg'

jreadey commented 2 years ago

That feature is not yet supported in h5pyd. As a workaround, you can read the desired selection into a NumPy array and then extract the column from that.
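
A minimal sketch of that workaround, for illustration only. The helper name `read_field` is hypothetical and not part of h5pyd, and the stand-in table below is an assumption: the thread only shows the `id` values, so the `name` field and its contents are made up so the example runs without an HSDS server. The sketch assumes that indexing the dataset with a selection returns a NumPy structured array, as it does in h5py.

```python
import numpy as np


def read_field(dataset, field, selection=Ellipsis):
    """Read `selection` as full records, then extract one field locally.

    `dataset` can be any object that returns a NumPy structured array
    when indexed (assumed here for an h5pyd compound dataset).
    """
    records = dataset[selection]  # whole records, every field included
    return records[field]         # field extraction happens in NumPy


# Stand-in for f['enumerations']['sector']. Only the 'id' values come from
# the thread; the 'name' field and the dtype widths are invented.
sector = np.array(
    [(b'com', b'Commercial'), (b'res', b'Residential'),
     (b'trans', b'Transportation'), (b'ind', b'Industrial')],
    dtype=[('id', 'S8'), ('name', 'S16')],
)

ids = read_field(sector, 'id')
print(ids)
```

Against a live domain, the equivalent call inside the `with h5pyd.File(hsds_path, mode='r') as f:` block would presumably be `read_field(f['enumerations']['sector'], 'id')`, or simply `f['enumerations']['sector'][...]['id']`. Note that this reads every field of the selected rows before discarding all but one, so passing a narrower `selection` may be worthwhile for large tables.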

MRossol commented 2 years ago

Thanks @jreadey, that was the workaround I suggested!