Open gheber opened 2 years ago
It's pretty easy now to read a NumPy array with h5pyd and convert it to a Pandas DataFrame. See https://github.com/HDFGroup/hdflab_examples/blob/master/Tutorial/09-Queries.ipynb for an example.
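A minimal sketch of that conversion, assuming the dataset has a compound dtype so that reading it yields a NumPy structured array. The helper name `dataset_to_frame`, the field names, and the domain path in the trailing comment are all made up for illustration. A plain NumPy array stands in for the remote dataset so the snippet runs without an HSDS server.

```python
import numpy as np
import pandas as pd


def dataset_to_frame(dataset):
    """Read a whole compound (table-like) dataset and return a DataFrame.

    `dataset` is anything that returns a NumPy structured array when indexed
    with `[...]`, such as an h5py or h5pyd Dataset with a compound dtype.
    """
    records = dataset[...]  # pulls every row into memory
    return pd.DataFrame.from_records(records)


# Stand-in for a remote dataset so the sketch runs without an HSDS server.
# The field names and values are made up for illustration.
records = np.array(
    [(1, 20.5), (2, 21.0), (3, 19.8)],
    dtype=[("station", "i4"), ("temp", "f8")],
)
df = dataset_to_frame(records)
print(df)

# With a server available, the same helper would be called like this
# (the domain path and dataset name are hypothetical):
#
#     import h5pyd
#     with h5pyd.File("/home/myuser/weather.h5", "r") as f:
#         df = dataset_to_frame(f["readings"])
```

Note that this reads the full dataset into memory, so it only suits tables that fit in RAM.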
Using HSDS as the basis for a distributed table package would be interesting. This idea is explored a bit in: https://github.com/h5py/h5py/issues/2095.
Right, but I want to read an HDF5 file created via `DataFrame.to_hdf`. Or `DataFrame.to_hsds` :smile:
Perhaps this could be done by enabling pandas HDF-related methods to accept an h5py.File object? Then this could also be an h5pyd.File object.
Pandas is designed to work with in-memory data, which has led to several other projects that offer a Pandas-like API but work with larger data sets than Pandas can support. Something like https://github.com/vaexio/vaex already supports HDF5. Could it be extended to support h5pyd?
import h5pyd as h5py -> Happiness
import pandahsds as pandas -> Sadness