Closed rsignell-usgs closed 6 years ago
@jreadey, I realize this is more of an HSDS issue, right?
Any ETA on supporting dimension scales there?
Right, the create_scale method in h5py just turns around and calls the HDF5 library.
Looking at the dimension scale spec: https://support.hdfgroup.org/HDF5/doc/HL/H5DS_Spec.pdf, I think I just need to implement object references in HSDS to get this working.
That's on schedule for August so I'd hope we can get it in h5pyd/h5netcdf soon.
@jreadey, awesome, thanks!
cc @rabernat
@jreadey , just curious: any update here?
Would love to see this working with xarray/h5netcdf...
@rsignell-usgs - I got pulled into some other work last week - promise to get to work on it this week.
@jreadey , I'm at the Unidata Users Committee Meeting on Monday. Any progress/roadblocks to report here?
@rsignell-usgs - sorry, we've had some other items come up. Will target having dimension scales working by end of month.
@jreadey , any updates?
@rsignell-usgs - we have dimscales implemented in h5pyd thanks to @ajelenak-thg. It is still fairly "fresh" so please try it out and let us know if you run into any issues. I haven't created a release yet, so you'll need to do a "git pull" and "python setup.py install" to get the latest.
Also, I've been tinkering with the hsload util to do the right thing with files that include dimension scales. I'll try this out with your sandy NetCDF file.
@jreadey , okay, I'll check that out when I get a chance. I'm at a joint USGS/NOAA meeting this week at the National Water Center in Tuscaloosa. We're talking about the NWS National Water Model, which produces 1 TB of forecast data in NetCDF each day.
http://water.noaa.gov/about/nwm
Folks I've spoken to here seem very interested in HSDS/h5pyd.
Dimension scales are implemented in the v0.2.7 release and in HSDS. I'll close this issue and ask users to create new issues for any specific problems with the dimscale implementation.
I was able to get this going using tips from @ajelenak-thg
Here's the hypyd_env.yml environment file I used:
name: h5pyd
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.6
  - h5py
  - jupyter
  - pytz
  - requests
  - matplotlib
  - xarray
  - pip:
    - git+https://github.com/HDFGroup/h5pyd.git@master
    - git+https://github.com/ajelenak-thg/h5netcdf.git@h5pyd
and then in this h5pyd environment:
import xarray as xr
import h5pyd
I wasn't able to use @ajelenak-thg's HSDS endpoint because I didn't have permission, so this didn't work:
f = xr.open_dataset('http://35.166.11.248:5101/home/ajelenak/MERRA2_400.statD_2d_slv_Nx.20171031.nc4',
engine='h5netcdf')
but I was able to open one of my data endpoints on @jreadey's HSDS on XSEDE:
f = xr.open_dataset('http://149.165.156.174:5101/home/rsignell/Sandy_ocean_his_nc4.nc', engine='h5netcdf')
It took 90 seconds to open, which @ajelenak-thg explained as follows:
Expect a delay when h5netcdf/xarray are opening HSDS files. HSDS is still not optimized for metadata-type requests and these happen a lot on file open. I counted 184 requests for the above MERRA-2 file. The number of requests is directly related to the number of variables and global/variable attributes in a file.
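To measure the open time on your own endpoint, a small timing wrapper is enough. This is only a sketch: `timed` is a hypothetical helper name, and the workload below is a stand-in so the snippet runs without an HSDS server.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; replace with the real open call when a server is available.
value, seconds = timed(sum, range(1000))
print(f"result={value}, took {seconds:.4f} s")
```

With xarray, the same wrapper would be called as `timed(xr.open_dataset, url, engine='h5netcdf')`, where `url` is the HSDS path you are opening.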
In order to use h5netcdf as a netcdf4 interface on top of h5pyd, we first need dimension scales working, which will enable the shared dimensions in netcdf4.