DeltaRCM / pyDeltaRCM

Delta model with a reduced-complexity approach
https://deltarcm.org/pyDeltaRCM/
MIT License
18 stars 11 forks source link

Metadata has lost track of time #239

Open elbeejay opened 2 years ago

elbeejay commented 2 years ago

I think the re-formatting of the netCDF dimensions / coordinates for v2.1.0 inadvertently resulted in the metadata losing track of the actual values of time.

>>> cube.meta['H_SL'].time
<xarray.DataArray 'time' (time: 455)>
array([  0,   1,   2, ..., 452, 453, 454])
Dimensions without coordinates: time

Not a huge deal since time can still be known from a grid variable, but we should bugfix this so dimensions are tracked everwhere.

amoodie commented 2 years ago

interesting. I just checked and confirm this is the case for my runs as well. I'll look at this bug today.

amoodie commented 2 years ago

On investigation, I think this is more of an xarray issue on the DeltaMetrics side. See below.

What we do now

When writing the pyDeltaRCM netcdf file, we place the coordinate arrays time x and y at the top level of the netcdf file. We then create a group within the netcdf file and place metadata there, which is given the dimension reference correctly. See lines here that include setting the fourth argument as the string 'time'.

Then, in DeltaMetrics, we open this netcdf file with xarray as a dataset, and importantly set_coords of the dataset here. xarray does not have support for hierarchical netcdf i/o, as best as I understand the current discussions (https://github.com/pydata/xarray/issues/4118). So, we just create a second xarray dataset called meta that is just the lower metadata group, and doesn't know anything about the top level. This means we can't set_coords here, so xarray just fills with a range on the coords.

Options I've thought of so far

  1. From the pyDeltaRCM end, we could write any needed coordinates into the meta group. This would increase file size, but only marginally. Then in DeltaMetrics, we could set_coords on the meta group.
  2. We can augment the behavior of DeltaMetrics to return a "fixed" dataarray when slicing the meta object. This works, I've tested it. See code below.
    @property
    def meta(self):
        metadata = self._dataio.meta
        # if coordinates are not set, try to augment the returned var with coords
        if len(metadata.coords) == 0:
            # loop through the dimensions of the sliced metadata
            for i, dim in enumerate(metadata.dims):
                if dim in self.coords:
                    # assign if found
                    metadata = metadata.assign_coords(dim=self[dim])
        return metadata
  3. Tell DeltaMetrics users to work around it. Use the indices auto-returned by xarray on time to slice the cube['time'] array and get times.

I'm not sure yet what my preferred solution here is.