no `location` variable attribute present in netcdf file written by xugrid

veenstrajelmer commented 6 months ago

in D-EcoImpact, the location attribute is required to map all variables to mesh/node/face locations. This attribute is also described in the conventions (search for Mesh2_depth:location = "node"). However, I cannot judge whether this attribute is required.

I expect xugrid just checks which dimensions are present in the variable. Since there can be only one topology dimension per variable and xugrid is aware which these are, it should be quite simple to map these.

There are two possible solutions:

xugrid automatically writes the location attribute to all data_vars. This should also be adapted after interpolating variables from nodes to faces (when implementing something like this for instance: https://github.com/Deltares/xugrid/issues/127). So actually, upon reading it should be removed, and upon writing it should be added.
D-EcoImpact also uses automatic topology dimension matching like xugrid. In that case please point us to the code that does this.

@hrajagers and @Huite could you also state whether this location attribute is required or not?

Huite commented 6 months ago

I must admit I'm a little confused by the introduction of these attributes: they seem fully redundant, because the dimension already associates data with a specific aspect (node, edge, face) of the topology as you mention.

From glancing the UGRID conventions docs, they are only mentioned for 3D layered mesh topology and onwards.

The UGRID convention pages specifically link to this section: https://ugrid-conventions.github.io/ugrid-conventions/#data-defined-on-unstructured-meshes

I think this gives a bit of background:

The coordinates attribute refers to the variables that contain the latitude and longitude coordinates. For a curvilinear grid these variables will share two spatial dimensions, here nmax and mmax: lat(nmax,mmax) and lon(nmax,mmax). In numerical models the various quantities are often computed at different locations of the mesh: staggered data. The standard CF-conventions don’t offer specific support for this functionality and thus for every stagger location coordinates need to be provided separately: cell center coordinates, corner point coordinates, u-flux point coordinates, and v-flux point coordinates. The underlying topology of the mesh, i.e. how these coordinates (variable definition locations) relate to each other isn’t stored in the file. This shortcoming is to some degree solved by the gridspec proposal by Venkatramani Balaji and Zhi Liang. We introduce here attributes that link to the topological data defined above.

So if you have e.g. staggered data on a structured grid, you need some additional attribute to indicate the location. For UGRID conventions, this isn't required: nodes, edges, and faces are explicitly represented. However, if you want to be "consistent" in some sense, you might argue that if you are specifying the location for a structured topology, that you can also do for it an unstructured topology -- even though the UGRID conventions do not require it at all.

Anyway, my understanding is possibly a bit limited, but this does not really seem part of the UGRID conventions to me. In particular, the redundancy also creates openings for conflicts. Let's say your dimension is mesh2d_nFace, but the location attribute states "node", what are you going to do? The dimension counts heavier here, because netCDF will not allow you to associate dimensions if the array shapes don't match.

I would strongly suggest a single source of truth provided by the dimensions, not these additional attributes for D-Ecoimpact.

veenstrajelmer commented 6 months ago

Thanks for this elaborate explanation. Could you share a short snippet of xugrid code on how to quickly match a particular variable to node/face/edge? I looked around a bit but could not find what I was looking for real quick. If it is too much effort, I will dive a bit deeper.

Huite commented 6 months ago

Regarding this point:

D-EcoImpact also uses automatic topology dimension matching like xugrid. In that case please point us to the code that does this.

Xugrid infers the dimension based on the connectivity variables and the coordinates (so e.g. edge_node_connectivity should have dimensions (edge_dim, 2); edge_x should have (edge_dim,). Worth nothing that there's already room for conflicts here: somebody can specify edge_dimA on their edge_node_connectivity and edge_dimB on their edge_x and edge_y coordinates.

This function mostly takes care of it: https://github.com/Deltares/xugrid/blob/ca17eb0d1cf39f36b9c7b50f7b7bd3e4e4e74f7f/xugrid/ugrid/conventions.py#L250-L301

The upper cased "constants" define the expectations in terms of roles:

_COORD_DIMS = {
    "node_coordinates": "node_dimension",
    "edge_coordinates": "edge_dimension",
    "face_coordinates": "face_dimension",
}
...
_CONNECTIVITY_DIMS = {
    "face_node_connectivity": ("face_dimension", None),
    "edge_node_connectivity": ("edge_dimension", 2),
    "face_edge_connectivity": ("face_dimension", None),
    "face_face_connectivity": ("face_dimension", None),
    "edge_face_connectivity": ("edge_dimension", 2),
    "boundary_node_connectivity": ("boundary_edge_dimension", 2),
}

In terms of types:

string means the expected UGRID role with arbitrary size
None means: no requirements, arbitrary name or size
int means: arbitrary name, just an "auxiliary" dimension, but size should be fixed (in the UGRID conventions examples, e.g. a dimension name of "Two" is used). I like checking the size anyway if possible.

(You can argue that mixing names and sizes is a bit (too) pragmatic and that it should be encoded in a namedtuple or dataclass more explicitly.)

veenstrajelmer commented 6 months ago

Ok, this would then be something like this:

from xugrid.ugrid.conventions import _get_connectivity, _get_topology, _get_coordinates, _get_dimensions
file_nc = dfmt.data.fm_curvedbend_map(return_filepath=True)
ds = xr.open_dataset(file_nc)
# ds = xu.data.adh_san_diego(xarray=True) # 1D topology
topologies = _get_topology(ds)
coordinates = _get_coordinates(ds, topologies=topologies)
connectivities = _get_connectivity(ds, topologies=topologies)
dims = _get_dimensions(ds, topologies=topologies, connectivity=connectivities, coordinates=coordinates)
print(dims)

Gives:

{'mesh2d': {'node_dimension': 'mesh2d_nNodes', 'face_dimension': 'mesh2d_nFaces', 'edge_dimension': 'mesh2d_nEdges'}}

We still need to match the variable to the topology name, but that seems easy from here on. It does seem that adding xugrid as a dependency for this is useful (that would simplify the code a lot), or cherry pick the code a bit.

Huite commented 6 months ago

Ah okay, with your example I get what you're after. I considered something like that in the past, I think you can get all of this in a much nicer form by using the UgridRolesAccessor:

https://deltares.github.io/xugrid/api/xugrid.UgridRolesAccessor.html#xugrid.UgridRolesAccessor

ds = xr.open_dataset(file_nc)
dim = ds.ugrid_roles["mesh2d"]["node_dimension"]

To clarify, this is an xarray accessor that'll work on Datasets.

All of this logic is fully contained in the conventions.py module, so you could just copy paste that. Depends a little on how many additional dependencies xugrid would introduce for D-EcoImpact. If you're using xarray anyway, it might be easier to just rely on xugrid.

Alternatively, we could split the conventions handling into a separate package, but given the number of packages we're already supporting it's just a lot more convenient to keep things together.

veenstrajelmer commented 6 months ago

I knew there should be an easier way to get to this. Thanks for sharing this. This is a simple application of such a topology matching method:

import xarray as xr
import dfm_tools as dfmt

file_nc = dfmt.data.fm_curvedbend_map(return_filepath=True)
ds = xr.open_dataset(file_nc)

def get_topodim(ds, varname):
    ugrid_roles = ds.ugrid_roles["mesh2d"]
    vardims = ds[varname].dims
    if ugrid_roles["node_dimension"] in vardims:
        topodim = "node"
    elif ugrid_roles["face_dimension"] in vardims:
        topodim = "face"
    elif ugrid_roles["edge_dimension"] in vardims:
        topodim = "edge"
    else:
        topodim = None
    return topodim

for varname in ["mesh2d_sa1", "mesh2d_tureps1", "mesh2d_node_z"]:
    topodim = get_topodim(ds, varname=varname)
    print(f"{varname}: {topodim}")

Prints:

mesh2d_sa1: face
mesh2d_tureps1: edge
mesh2d_node_z: node

You could of course also change this function such that it adds the location attribute to each variable and add it to the D-EcoImpact code.

Deltares / xugrid

no `location` variable attribute present in netcdf file written by xugrid #221