HDFGroup / h5pyd

h5py distributed - Python client library for HDF Rest API

Get dimension scales working for netcdf4 access #32

Closed rsignell-usgs closed 6 years ago

rsignell-usgs commented 6 years ago

To use h5netcdf as a netcdf4 interface on top of h5pyd, we first need dimension scales working, since they are what enable shared dimensions in netcdf4.

rsignell-usgs commented 6 years ago

@jreadey, I realize this is more of an HSDS issue, right?
Any ETA on supporting dimension scales there?

jreadey commented 6 years ago

Right, the create_scale method in h5py just turns around and calls the HDF5 library.

Looking at the dimension scale spec: https://support.hdfgroup.org/HDF5/doc/HL/H5DS_Spec.pdf, I think I just need to implement object references in HSDS to get this working.

That's on schedule for August so I'd hope we can get it in h5pyd/h5netcdf soon.
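
For readers unfamiliar with the spec, here is a rough sketch of why object references are the missing piece. It is a toy in-memory model, not h5py, h5pyd, or HSDS code: plain dicts stand in for HDF5 objects, a path string stands in for an object reference, and the helper names (new_dataset, make_scale, attach_scale) are made up for illustration. The attribute names CLASS, NAME, DIMENSION_LIST, and REFERENCE_LIST are the ones the dimension scale spec defines.

```python
# Toy model of the HDF5 dimension scale convention (H5DS spec).
# Assumption: dicts stand in for HDF5 objects and a path string stands in
# for an object reference. None of these helpers exist in h5py or h5pyd.

def new_dataset(path, shape):
    """Create a stand-in for an HDF5 dataset."""
    return {"path": path, "shape": shape, "attrs": {}}


def make_scale(dset, name):
    """Mark an ordinary dataset as a dimension scale."""
    dset["attrs"]["CLASS"] = "DIMENSION_SCALE"
    dset["attrs"]["NAME"] = name


def attach_scale(dset, dim, scale):
    """Attach a scale to one dimension of a dataset."""
    # DIMENSION_LIST lives on the dataset: one list of scale references
    # per dimension.
    dim_list = dset["attrs"].setdefault(
        "DIMENSION_LIST", [[] for _ in dset["shape"]]
    )
    dim_list[dim].append(scale["path"])
    # REFERENCE_LIST lives on the scale: back-references of the form
    # (dataset reference, dimension index).
    scale["attrs"].setdefault("REFERENCE_LIST", []).append((dset["path"], dim))


temp = new_dataset("/temp", (4, 3))
time = new_dataset("/time", (4,))
make_scale(time, "time")
attach_scale(temp, 0, time)

print(temp["attrs"]["DIMENSION_LIST"])
print(time["attrs"]["REFERENCE_LIST"])
```

Both attributes point at other objects, so a server that cannot store object references has no way to record which scale belongs to which dimension.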

rsignell-usgs commented 6 years ago

@jreadey, awesome, thanks!
cc @rabernat

rsignell-usgs commented 6 years ago

@jreadey , just curious: any update here?
Would love to see this working with xarray/h5netcdf...

jreadey commented 6 years ago

@rsignell-usgs - I got pulled into some other work last week - promise to get to work on it this week.

rsignell-usgs commented 6 years ago

@jreadey , I'm at the Unidata Users Committee Meeting on Monday. Any progress/roadblocks to report here?

jreadey commented 6 years ago

@rsignell-usgs - sorry, we've had some other items come up. Will target having dimension scales working by end of month.

rsignell-usgs commented 6 years ago

@jreadey , any updates?

jreadey commented 6 years ago

@rsignell-usgs - we have dimscales implemented in h5pyd thanks to @ajelenak-thg. It is still fairly "fresh" so please try it out and let us know if you run into any issues. I haven't created a release yet, so you'll need to do a "git pull" and "python setup.py install" to get the latest.

Also, I've been tinkering with the hsload util to do the right thing with files that include dimension scales. I'll try this out with your sandy NetCDF file.

rsignell-usgs commented 6 years ago

@jreadey , okay, I'll check that out when I get a chance. I'm at a joint USGS/NOAA meeting this week at the National Water Center in Tuscaloosa. We're talking about the NWS National Water Model, which produces 1 TB of forecast data in NetCDF each day.
http://water.noaa.gov/about/nwm

Folks I've spoken to here seem very interested in HSDS/h5pyd.

jreadey commented 6 years ago

Dimension scales are implemented in the v0.2.7 release and in HSDS. I'll close this issue and ask users to create new issues for any specific problems with the dimscale implementation.

rsignell-usgs commented 6 years ago

I was able to get this going using tips from @ajelenak-thg.

Here's the h5pyd_env.yml environment file I used:

name: h5pyd
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.6
  - h5py
  - jupyter
  - pytz
  - requests
  - matplotlib
  - xarray
  - pip:
      - git+https://github.com/HDFGroup/h5pyd.git@master
      - git+https://github.com/ajelenak-thg/h5netcdf.git@h5pyd

and then in this h5pyd environment:

import xarray as xr
import h5pyd

I wasn't able to use @ajelenak-thg's HSDS endpoint because I didn't have permission, so this didn't work:

f = xr.open_dataset('http://35.166.11.248:5101/home/ajelenak/MERRA2_400.statD_2d_slv_Nx.20171031.nc4',
                    engine='h5netcdf')

but I was able to open one of my data endpoints on @jreadey's HSDS on XSEDE:

f = xr.open_dataset('http://149.165.156.174:5101/home/rsignell/Sandy_ocean_his_nc4.nc', engine='h5netcdf')

It took 90 seconds to open, which @ajelenak-thg explained as follows:

Expect a delay when h5netcdf/xarray are opening HSDS files. HSDS is still not optimized for metadata-type requests and these happen a lot on file open. I counted 184 requests for the above MERRA-2 file. The number of requests is directly related to the number of variables and global/variable attributes in a file.
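
To make that scaling concrete, here is a back-of-the-envelope sketch. The cost model is purely an assumption for illustration: one request for the root group, one per global attribute, one per variable, and one per variable attribute. It is not a measurement of HSDS or h5netcdf and will not reproduce the 184 figure above. The function name estimate_open_requests is hypothetical.

```python
# Hypothetical cost model for metadata requests on file open.
# Assumption (illustrative only): 1 request for the root group, 1 per
# global attribute, 1 per variable, and 1 per variable attribute.
# Real request counts depend on how the client and server behave.

def estimate_open_requests(global_attrs, variables):
    """Estimate metadata requests under the assumed cost model.

    global_attrs: number of file-level attributes
    variables: dict mapping variable name -> number of attributes on it
    """
    per_variable = sum(1 + n_attrs for n_attrs in variables.values())
    return 1 + global_attrs + per_variable


small = estimate_open_requests(2, {"temp": 3})
large = estimate_open_requests(2, {"temp": 3, "salt": 3, "u": 4, "v": 4})
print(small, large)
```

Under this assumed model the estimate grows linearly with the number of variables and attributes, which matches the relationship described above.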