Deltares / hydromt

HydroMT: Automated and reproducible model building and analysis
https://deltares.github.io/hydromt/
MIT License
74 stars 30 forks source link

bug after updating to xugrid v0.1.4 #184

Closed DirkEilander closed 2 years ago

DirkEilander commented 2 years ago

error message: *** AttributeError: 'UgridDatasetAccessor' object has no attribute 'grid'

We started using the .set_crs() method and .bounds attribute of grid which work in v0.1.3, but this now yields the above error. @Huite Do you keep a changelog or can you point us to where these methods are found now?

Huite commented 2 years ago

Unfortunately, I hadn't found the time to update the changelog yet.

In a nutshell, 0.1.4 supports multiple grid topologies in a single UgridDataset whereas before you had to split them by topology. However, this means that anything involving a grid for a dataset has become a little more involved, and .grid has become .grids for a dataset.

However, there should be a higher level way working with this instead of directly operating on the grids when dealing with Ugrid DataArray or Datasets.

Huite commented 2 years ago

I've included a changelog and added methods for dealing with CRS: https://github.com/Deltares/xugrid/commit/fd65cd836036b1be6ae49df464cea8d41cd8dba8

Would it be possible for you to try out with local clone + development install to see if this addresses your use case? If not, I can make some changes before making a new pypi & conda release.

hboisgon commented 2 years ago

Thanks @Huite I'll try it out

hboisgon commented 2 years ago

@Huite: crs now works fine indeed!

I'm a bit confused with the use of .grid for a DataArray and .grids for a Dataset... In our case we switch to Dataset to support multiple variables (on one or maybe then several grids) so when trying grid operations in hydromt that means we would always have to check object instance to use uda.grid for DataArray or uds.grids for Dataset.

Also, .grids always returns a list so any accessor is broken unless specifying a list element eg uds.grids[0].bounds. Maybe for some properties we could define fast accessors like uds.grids.total_bounds that would internally loop over bounds of all grids? Do you already know if grids list will have 1 grid for all Ugrid2d and 1 grid for all Ugrid1d ? Or do you foresee several Ugrid2d in grids? If first you could redefine a uds.grid1d and uds.grid2d accessors? Else would also be nice for UgridDataset with multiple variables but single grid to still get uds.grid working? Anyhow feel free to contact me if you want to discuss further

Huite commented 2 years ago

I agree that having .grid on a DataArray versus .grids on a Dataset is rather inconvenient, and having multiple grids in a Dataset makes a lot of things more a lot more complicated for me. However, based on the UGRID conventions, it's a very natural thing to do, and include many topologies in a single (netCDF) dataset. In terms of concepts, I think these grids are most comparable with Indexes in xarray, and a dataset can contain many Indexes. E.g. if you have two rasters in a Dataset with different cell sizes, xarray supports this (with different dimensions, e.g. x1, y1 and x2, y2). This isn't true for a DataArray: a single variable can only be defined on one grid (you can't have multiple dimensions fulfilling the same role).

We don't have to support this per se, you could try and separate a Dataset and split them based on the mesh topology. But you have to decide for non-UGRID data as well. So you have to return a dict of Datasets, with {"network1d": ..., "mesh2d": ..., "other": ...}. In the end, I expect that a Dataset can multiple Ugrid1d, multiple Ugrid2, and perhaps even Ugrid3d instances associated with it. I'm still open to this approach, as it makes the UgridDataset both simpler and more like the UgridDataArray.

Ideally, however, all of this works for convience:

ds = xr.open_dataset("multi-topology.nc")
uds = xugrid.UgridDataset(ds)

uds = xugrid.open_dataset("multi-topology.nc")

uds.to_netcdf("multi-topology.nc")

Without having to intermediate splitting and merges.

I think we're generally used to having a single x and y axis for all variables in our Datasets, since when you're working with model data, it tends to have a single computational grid. But CF-conventions (and xarray) fundamentally don't have this limitation! (I realize I don't know what rioxarray does in this case). Maybe this is a bit academic: most of the time we do work with a single computational grid, and we can assume there's just a .grid for a dataset as well, until the end when we maybe merge everything into one dataset and then write it.

But we can have our cake and eat it! I can define a .grid property on the UgridDataset as follows:

@property
def grid(self):
    if len(self.grids) == 1:
        return self.grids[0]
    else:
        raise ValueError("Multiple grids")

This seems like a very pragmatic solution to me, but I'm not entirely sure. I worry a little that on the long term maintenance will get more tricky if we need to always check for both one or many grid instances. In terms of maintenance, having only a single grid in a dataset is definitely the easiest!

We might also have to discuss what operations require you to operate on the grids directly while they're attached to an xarray object. In my expectation, this shouldn't be relatively rare; if this isn't that case for you, I'd hope that we can provide a higher level method which does the work.

hboisgon commented 2 years ago

After discussions, two issues are created on xugrid for a .grid and .total_bounds properties on Dataset: https://github.com/Deltares/xugrid/issues/20 https://github.com/Deltares/xugrid/issues/21