rsignell-usgs closed this issue 6 years ago.
What do you think the pythonic way would be to switch the dependent module? E.g., something like this:
import os

# Use h5pyd under the h5py name when USE_H5PYD is set to a non-empty value.
if os.environ.get("USE_H5PYD"):
    import h5pyd as h5py
else:
    import h5py
Would that mess up packaging?
We could create an entirely new module (h5netcdfd?), but we would then need to regularly sync changes from h5netcdf.
The h5py module is actually only explicitly used a handful of times inside h5netcdf: https://github.com/shoyer/h5netcdf/blob/master/h5netcdf/core.py
Most of the time, we use method calls on existing h5py.File objects.
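Because most of the work happens through method calls on an already-open file object, that code does not need to know which library created the object. A minimal sketch of the idea, where dataset_names is a hypothetical helper (not part of h5netcdf) and it is assumed that h5pyd's File and Group objects mirror h5py's items() and shape behaviour:

```python
from typing import Any, List


def dataset_names(group: Any) -> List[str]:
    """Return the sorted names of dataset-like members of an open HDF5 group.

    Hypothetical helper: it only calls group.items() and checks for a
    ``shape`` attribute, so it accepts any object that behaves like an
    h5py Group/File (assumed to include h5pyd's objects as well).
    """
    # Datasets carry a shape; subgroups do not.
    return sorted(name for name, obj in group.items() if hasattr(obj, "shape"))
```

Nothing in this function refers to the h5py module itself, which is why only the few places that open files would need to change.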
What do you think the pythonic way would be to switch the dependent module?
Depending on how much the interface needs to be changed to accommodate h5pyd, the cleanest way to do this is probably to add constructor arguments to the handful of h5netcdf functions/classes that open a file.
For example, we might change h5netcdf.File like so:
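import h5py

class File(Group):  # Group is h5netcdf's existing base class
    def __init__(self, path, mode='a', backend=h5py, **kwargs):
        # Delegate opening to whichever h5py-compatible module was passed in.
        self._h5file = backend.File(path, mode, **kwargs)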
Then using h5pyd behind the scenes as a user is as simple as h5netcdf.File(..., backend=h5pyd). If the two modules are not complete drop-in equivalents, then at least we could accept string names like backend='h5pyd'.
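A slightly fuller sketch of this proposal, covering both the module argument and the string-name variant, might look like the following. It assumes that h5py and h5pyd both provide a File(path, mode, **kwargs) callable; resolve_backend is a hypothetical helper name, and using the string "h5py" as the default is a design choice so that nothing is imported until a file is actually opened.

```python
import importlib
from typing import Any


def resolve_backend(backend: Any) -> Any:
    """Return a backend module given either the module itself or its import name."""
    if isinstance(backend, str):
        # e.g. "h5py" or "h5pyd"; raises ModuleNotFoundError if not installed
        return importlib.import_module(backend)
    return backend


class File:
    """Hypothetical sketch of a backend-aware constructor (not h5netcdf's code).

    Only the backend selection is shown; the real h5netcdf.File subclasses
    Group and also sets up dimensions, variables and attributes.
    """

    def __init__(self, path: str, mode: str = "a", backend: Any = "h5py", **kwargs: Any) -> None:
        self._backend = resolve_backend(backend)
        # Any object exposing an h5py-style File(path, mode, **kwargs) callable works here.
        self._h5file = self._backend.File(path, mode, **kwargs)
```

With this in place, one script could call File("local.nc", "r") for a local file and File(..., backend="h5pyd") for data served through h5pyd, which is the kind of mixed use a global environment-variable switch would rule out.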
I certainly would be very happy to accept patches to add this flexibility in h5netcdf.
Generally, checking environment variables for this sort of thing is discouraged, since it makes it hard to switch between options in user code (and there are certainly legitimate cases for using both h5py and h5pyd at the same time).
@shoyer, this sounds great. I'd submit a PR right now, except we should first wait for shared dimensions to be working (https://github.com/HDFGroup/h5pyd/issues/32), right @jreadey?
@ajelenak-thg and @jreadey,
Just to record this somewhere, here's what I did to get a custom environment with xarray working with h5pyd for the ESIP Winter Meeting (Jan 9-11, 2017):
With this h5pyd_env.yml:
name: h5pyd
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.6
  - h5py
  - nb_conda_kernels
  - pytz
  - requests
  - matplotlib
  - pip:
      - git+https://github.com/HDFGroup/h5pyd.git@master
I did:
conda env create -f h5pyd_env.yml
source activate h5pyd
conda install xarray
conda remove h5netcdf
pip install --no-deps --upgrade git+https://github.com/ajelenak-thg/h5netcdf.git@h5pyd
conda install --no-deps xarray
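After activating the environment, a quick way to confirm that the relevant packages are importable is a check like the one below. This is a minimal sketch: check_modules is a hypothetical helper, and it only tests that each package can be located, not its version or that h5netcdf came from the h5pyd branch.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str] = ("h5py", "h5pyd", "h5netcdf", "xarray")) -> Dict[str, bool]:
    """Report whether each top-level package can be found in this environment.

    Hypothetical helper: it only locates packages via importlib and does not
    import them or check versions.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```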
BTW, thanks to @ocefpaf for helping me figure this out!
Xarray is now working nicely with HSDS: https://gist.github.com/rsignell-usgs/cc2d2d4fe1930bd949119e543b56bce1
Closing this issue; the remaining Dask tasks are tracked at https://github.com/pangeo-data/pangeo/issues/75#issuecomment-357734564
h5netcdf is a pythonic interface to netcdf4 files using h5py. It would be super cool to try h5netcdf on top of h5pyd instead. If that worked, we could try xarray with dask on top of h5pyd. And if that worked, it would be amazing...