HDFGroup / hsds

Cloud-native, service based access to HDF data
https://www.hdfgroup.org/solutions/hdf-kita/
Apache License 2.0

pandas Support #156

Open gheber opened 2 years ago

gheber commented 2 years ago

import h5pyd as h5py -> Happiness
import pandahsds as pandas -> Sadness

jreadey commented 2 years ago

It's pretty easy now to read a dataset into a NumPy array with h5pyd and convert it to a pandas DataFrame. See https://github.com/HDFGroup/hdflab_examples/blob/master/Tutorial/09-Queries.ipynb for an example.
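For readers who don't want to open the notebook, here is a minimal sketch of that read-and-convert step. It is not taken from the linked tutorial. The helper names dataset_to_dataframe and read_hsds_table are made up for illustration, and the domain and dataset path you pass in are whatever exists on your own HSDS server. It assumes the dataset fits in memory and is either a compound (table-like) dataset or a plain 1-D/2-D array.

```python
import numpy as np
import pandas as pd


def dataset_to_dataframe(dset):
    """Load an h5py-style dataset into memory and return a pandas DataFrame.

    Hypothetical helper: works with anything that returns a NumPy array
    when indexed with [...], such as an h5py or h5pyd dataset.
    """
    # Indexing with [...] reads the whole dataset into a NumPy array.
    arr = np.asarray(dset[...])
    if arr.dtype.names:
        # Compound (structured) dtype: each field becomes a column.
        return pd.DataFrame.from_records(arr)
    # Plain 1-D or 2-D array: pandas assigns integer column labels.
    return pd.DataFrame(arr)


def read_hsds_table(domain, path, endpoint=None):
    """Open an HSDS domain read-only and convert one dataset to a DataFrame.

    Hypothetical helper: domain, path and endpoint are supplied by the caller.
    """
    import h5pyd  # imported lazily so the helper above works without h5pyd

    with h5pyd.File(domain, "r", endpoint=endpoint) as f:
        return dataset_to_dataframe(f[path])
```

A call would look something like read_hsds_table("/home/myuser/table.h5", "/mytable"), where both arguments are placeholders. Fixed-length string fields will typically come back as bytes and may need decoding. This only covers plain datasets and does not handle the layout written by DataFrame.to_hdf.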

Using HSDS as the basis for a distributed table package would be interesting. This idea is explored a bit in: https://github.com/h5py/h5py/issues/2095.

gheber commented 2 years ago

Right, but I want to read an HDF5 file created via DataFrame.to_hdf.

gheber commented 2 years ago

Or DataFrame.to_hsds :smile:

ajelenak commented 2 years ago

Perhaps this could be done by enabling pandas HDF-related methods to accept an h5py.File object? Then this could also be an h5pyd.File object.

jreadey commented 2 years ago

Pandas is designed to work with in-memory data, which has led to several other projects that offer a Pandas-like API but can handle data sets larger than Pandas can. One of them, vaex (https://github.com/vaexio/vaex), already supports HDF5. Could it be extended to support h5pyd?