rsignell-usgs opened this issue 7 years ago
There are some updates in v0.2.7 that enable files with dimension scales to be uploaded correctly to the HSDS service. There is still a problem with downloading the files, which will require some HSDS updates to resolve.
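For anyone following along, a quick way to see which datasets in a file are HDF5 dimension scales is to look for the `CLASS` attribute that the netCDF-4 library sets on them. A minimal sketch with plain h5py (the filename is just an example from this thread):

```python
import h5py

def report_scales(name, obj):
    # netCDF-4 marks dimension-scale datasets with CLASS = "DIMENSION_SCALE".
    if isinstance(obj, h5py.Dataset) and obj.attrs.get('CLASS') in (
            b'DIMENSION_SCALE', 'DIMENSION_SCALE'):
        print(name, 'is a dimension scale')

with h5py.File('Sandy_ocean_his_nc4c.nc', 'r') as f:
    f.visititems(report_scales)
```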
Also, I noticed that there are some attributes in the `sand.nc` file that can't be read with h5py. These appear to be related to this issue: https://github.com/h5py/h5py/issues/719.
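To pinpoint which attributes trigger the failure, one can attempt to read every attribute and report the ones that raise. A minimal sketch, assuming the problem surfaces as an `OSError` (the exact exception may differ by h5py version):

```python
import h5py

def check_attrs(name, obj):
    # Iterating attrs only touches names; reading the value forces the error.
    for attr in obj.attrs:
        try:
            obj.attrs[attr]
        except (OSError, TypeError) as e:
            print(f'cannot read attribute {attr!r} on {name!r}: {e}')

with h5py.File('sand.nc', 'r') as f:
    check_attrs('/', f)        # root group attributes
    f.visititems(check_attrs)  # everything else in the hierarchy
```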
Looks like this was fixed in netcdf-c on September 1 (https://github.com/Unidata/netcdf-c/commit/4dd8e380c183a016a5edec5f5fd945b1e0954a5f) and released in version 4.5.0 on Oct 20 (https://github.com/Unidata/netcdf-c/releases/tag/v4.5.0).
I will try converting those files to netCDF-4 again and see if that fixes the problem.
Ok, thanks. For cases where a netCDF file with the bug is used, I've added a check so that `hsload` just prints a warning message and continues on with the other attributes.
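Something like this pattern, i.e. a sketch of the behavior described, not the actual utillib.py code:

```python
def copy_attrs(src, dest):
    """Copy attributes one at a time so that a single unreadable or
    unsupported attribute produces a warning instead of aborting the
    whole upload."""
    for name in src.attrs:
        try:
            dest.attrs[name] = src.attrs[name]
        except (OSError, TypeError) as e:
            print(f'WARNING: failed to create attribute {name}: {e}')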
I used `nccopy` from NetCDF 4.5.0 to recreate my Sandy netCDF-4 files from the original netCDF-3 files (`-7` writes netCDF-4 classic model format and `-d 7` sets deflate compression level 7):

```
nccopy -7 -d 7 Sandy_ocean_his.nc Sandy_ocean_his_nc4c.nc
```

and then used `hsload` to write to HSDS. The only error I got was:
```
$ hsload Sandy_ocean_his_nc4c.nc /home/rsignell/sandy2.nc
2017-12-03 19:42:48,871 utillib.py:266 ERROR: failed to create attribute script_file of object / -- unknown object type
ERROR: failed to create attribute script_file of object / -- unknown object type
```
When I try to load the HSDS dataset using `xarray` with the `h5netcdf` engine:

```python
import xarray as xr
ds = xr.open_dataset('Sandy_ocean_his.nc')
ds = xr.open_dataset('/home/rsignell/sandy2.nc', engine='h5netcdf')
```
I get the following error:
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-8b828d9bcc43> in <module>()
----> 1 ds = xr.open_dataset('/home/rsignell/sandy2.nc', engine='h5netcdf')

~/.conda/envs/hsds/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables)
    292     elif engine == 'h5netcdf':
    293         store = backends.H5NetCDFStore(filename_or_obj, group=group,
--> 294                                        autoclose=autoclose)
    295     elif engine == 'pynio':
    296         store = backends.NioDataStore(filename_or_obj,

~/.conda/envs/hsds/lib/python3.6/site-packages/xarray/backends/h5netcdf_.py in __init__(self, filename, mode, format, group, writer, autoclose)
     62         opener = functools.partial(_open_h5netcdf_group, filename, mode=mode,
     63                                    group=group)
---> 64         self.ds = opener()
     65         if autoclose:
     66             raise NotImplementedError('autoclose=True is not implemented '

~/.conda/envs/hsds/lib/python3.6/site-packages/xarray/backends/h5netcdf_.py in _open_h5netcdf_group(filename, mode, group)
     48 def _open_h5netcdf_group(filename, mode, group):
     49     import h5netcdf.legacyapi
---> 50     ds = h5netcdf.legacyapi.Dataset(filename, mode=mode)
     51     with close_on_error(ds):
     52         return _nc4_group(ds, group, mode)

/notebooks/rsignell/github/h5netcdf/h5netcdf/core.py in __init__(self, path, mode, invalid_netcdf, **kwargs)
    584         # if we actually use invalid NetCDF features.
    585         self._write_ncproperties = (invalid_netcdf is not True)
--> 586         super(File, self).__init__(self, self._h5path)
    587
    588     def _check_valid_netcdf_dtype(self, dtype, stacklevel=3):

/notebooks/rsignell/github/h5netcdf/h5netcdf/core.py in __init__(self, parent, name)
    241         # variables.
    242         self._current_dim_sizes[k] = \
--> 243             self._determine_current_dimension_size(k, current_size)
    244
    245         if dim_id is None:

/notebooks/rsignell/github/h5netcdf/h5netcdf/core.py in _determine_current_dimension_size(self, dim_name, max_size)
    286
    287         for i, var_d in enumerate(var.dims):
--> 288             name = _name_from_dimension(var_d)
    289             if name == dim_name:
    290                 max_size = max(var.shape[i], max_size)

/notebooks/rsignell/github/h5netcdf/h5netcdf/core.py in _name_from_dimension(dim)
     34     # First value in a dimension is the actual dimension scale
     35     # which we'll use to extract the name.
---> 36     return dim[0].name.split('/')[-1]
     37
     38

AttributeError: 'NoneType' object has no attribute 'split'
```
This was after changing `import h5py` to `import h5pyd as h5py` in h5netcdf.
@rsignell-usgs We are aware of this problem with h5netcdf and h5pyd. h5pyd currently cannot return the HDF5 path name for HDF5 objects that are not accessed by following the file's hierarchy, and returning HDF5 dimension scale datasets as `h5py.Dataset` objects is one such type of access.
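To illustrate the difference (a hypothetical snippet; the variable names are made up, and it assumes a running HSDS endpoint):

```python
import h5pyd

f = h5pyd.File('/home/rsignell/sandy2.nc', 'r')

# Reached through the file hierarchy: h5pyd can report the path name.
print(f['ocean_time'].name)   # '/ocean_time'

# Reached as a dimension scale attached to another variable: h5pyd cannot
# currently recover the path, so .name comes back as None -- the None that
# h5netcdf's _name_from_dimension() trips over in the traceback above.
scale = f['zeta'].dims[0][0]
print(scale.name)             # None
```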
Are you working on enabling h5netcdf to work with h5pyd? I'm asking because I just started working on this in the last couple of days. No need for us to duplicate the effort.
@ajelenak-thg, no, I'm not working on it. I just forked h5netcdf and replaced `import h5py` with `import h5pyd as h5py`, and then observed that it didn't work.
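For what it's worth, the same experiment can be run without forking by aliasing the module before h5netcdf is imported; a sketch that relies on h5netcdf doing a plain `import h5py` internally:

```python
import sys
import h5pyd

# Make any subsequent "import h5py" resolve to h5pyd instead.
sys.modules['h5py'] = h5pyd

import h5netcdf.legacyapi  # now backed by h5pyd; fails the same way
```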
@rsignell-usgs That's how far I was able to progress, too. 😃 I think @jreadey is working on a fix.
@jreadey, you used `hsload` to put our Hurricane Sandy netCDF-4 file on HSDS. If I try to use `hsget` to get that dataset back, I get errors. And although I do end up with a `sandy.nc` file, if I try to `ncdump` it, it doesn't work (see below). I guess that is not too surprising in light of #32, right? But do you think one day we will be able to round-trip a dataset using `hsload` and `hsget`?
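For concreteness, this is the round-trip I mean, driven from Python (paths reused from earlier in this thread, and I'm assuming `hsget` takes the source domain followed by the target filename):

```python
import subprocess

# Upload the netCDF-4 file to HSDS, pull it back down, and dump the header.
subprocess.run(['hsload', 'Sandy_ocean_his_nc4c.nc', '/home/rsignell/sandy2.nc'], check=True)
subprocess.run(['hsget', '/home/rsignell/sandy2.nc', 'sandy.nc'], check=True)
subprocess.run(['ncdump', '-h', 'sandy.nc'], check=True)
```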