Reference UGRID conventions in CF

cf-convention / cf-conventions

Reference UGRID conventions in CF #153

Closed rsignell-usgs closed 1 year ago

rsignell-usgs commented 5 years ago

As discussed in Trac ticket 171 we would like to associate a specific version of UGRID with each version of CF.

We propose to simply add a section 1.5 to the Conventions Document called "Relationship to the UGRID Conventions" which would say:

UGRID is a convention for unstructured (e.g. triangular) grids that supplements the CF Conventions, including specification of grid topology and location of data on grid elements. Each version of CF is associated with a particular version of UGRID through the Conventions attribute in 2.6.1.

Then in Section 2.6.1, modify the beginning to read:

We recommend that netCDF files that follow these conventions indicate this by setting the NUG defined global attribute Conventions to the string value "CF-1.8" which also implies "UGRID-1.0".
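As a rough illustration of what "implies" could mean for software reading the attribute, the sketch below expands a Conventions string into the conventions it puts in effect. The lookup table and the function name are hypothetical; only the pairing of "CF-1.8" with "UGRID-1.0" comes from the proposal above. It assumes the NUG allowance for space-separated or comma-separated convention names.

```python
# Hypothetical lookup table: only this one pairing is taken from the proposal.
IMPLIED_UGRID = {"CF-1.8": "UGRID-1.0"}


def conventions_in_effect(conventions):
    """Split a Conventions attribute and append any implied UGRID version."""
    # Accept either separator by normalising commas to spaces.
    names = conventions.replace(",", " ").split()
    implied = [IMPLIED_UGRID[name] for name in names if name in IMPLIED_UGRID]
    # Do not repeat a UGRID version that the file already names explicitly.
    return names + [u for u in implied if u not in names]


print(conventions_in_effect("CF-1.8"))
print(conventions_in_effect("CF-1.8, UGRID-1.0"))
```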

ChrisBarker-NOAA commented 3 years ago

in what sense is it symmetrical?

Just trying to get my head around this, it seems to me that what is meant is : we provide a connectivity array, which is a square matrix describing the connectivity relation "from cells to cells", i.e. connectivity[i,j] == cell[i] adjoins cell[j] where i and j both range over the same cell dimension. In which case, it is just a symmetric matrix, meaning exactly that .. 'adjoins' is a symmetric relation ; or cell[i] adjoins cell[j] <--> cell[j] adjoins cell[i] (thus, the array == its own transpose, which is a definition of symmetry).

Well, conceptually maybe -- but that's not how anyone encodes the connectivity -- and it would be a huge, very sparse boolean array: number_of_cells X number_of_cells. The connectivity is encoded, in UGRID at least, as a length-number-of-cells array of (cell_number, cell_number, cell_number) triples (for a triangular mesh)

I just looked at the UGRID docs, and oddly, even though the examples are tiny, we didn't seem to put the actual example arrays in, which does make it less clear than it could be :-(

pp-mo commented 3 years ago

@ChrisBarker-NOAA Well, conceptually maybe -- but that's not how anyone encodes the connectivity

Oh I see!
I was really only thinking in abstract terms, about why you would describe the validity constraint as a 'symmetry', and not about the actual 'face-face-connectivity' arrays or similar. So, I assumed that was the viewpoint of the more abstract description in the CF datamodel.

Looking at it again, from the viewpoint of actual UGRID connectivity data, that does make the 'does X connect to X' question seem rather abstruse and unimportant.
However, the basic symmetry rule is still a relevant requirement : and it is still not implicit in the form of the data.

E.g. ... Suppose we have a "face-face-connectivity" array "ffc[i,j]", where i ranges over the faces 1..nF and j over 1..nA, where nA is a maximum adjacency number (i.e. typically 3 or 4). Then a "no-self-connections" rule is something like: for all i in 1..nF ; for all j in 1..nA ; ffc[i,j] != i. But a "symmetry rule" is rather more involved, something like ...

for all i1 in 1..nF ; for all j1 in 1..nA ; 
  EITHER ffc[i1,j1] = missing 
  OR, writing i2 = ffc[i1,j1], exists j2 in 1..nA s.t. ffc[i2,j2] = i1

-- obviously, the 'missing' possibility complicates it

There are also some other possible rules you might want to state here, such as for any i, all the ffc[i,j] should be different (except for missing points).
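The three rules mentioned above (no self-connections, symmetry, no repeated neighbours) can be sketched as a small checker. This is only an illustration under assumptions that are not in the thread: zero-based face indices, -1 as the "missing" value, and a hypothetical function name.

```python
MISSING = -1  # assumed padding value meaning "no neighbour"


def check_face_face_connectivity(ffc):
    """Return a list of rule violations for a padded neighbour table.

    ffc[i] lists the faces that adjoin face i (zero-based), padded
    with MISSING. Every non-missing entry is assumed to be a valid index.
    """
    problems = []
    for i, row in enumerate(ffc):
        neighbours = [n for n in row if n != MISSING]
        # Rule 1: a face must not list itself.
        if i in neighbours:
            problems.append(f"face {i} lists itself")
        # Rule 2: all non-missing entries in a row must differ.
        if len(set(neighbours)) != len(neighbours):
            problems.append(f"face {i} lists a neighbour twice")
        # Rule 3 (symmetry): if i lists n, then n must list i.
        for n in neighbours:
            if i not in ffc[n]:
                problems.append(f"face {i} lists {n}, but not vice versa")
    return problems


good = [[MISSING, MISSING, 1], [MISSING, MISSING, 0]]
bad = [[MISSING, MISSING, 1], [MISSING, MISSING, MISSING]]
print(check_face_face_connectivity(good))  # no violations
print(check_face_face_connectivity(bad))   # one symmetry violation
```

None of these rules is enforced by the shape of the array itself, which is the point being made: they have to be stated separately as constraints.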

I think that this is maybe showing that the more abstract form of a "binary connections map" does provide a more natural and flexible route for describing the mathematical properties. I'm really no expert, but I think that basic graph theory is written in terms like these.

ChrisBarker-NOAA commented 3 years ago

Looking at it again, from the viewpoint of actual UGRID connectivity data, that does make the 'does X connect to X' question seem rather abstruse and unimportant.

?? -- but does X connect to Y is the key concept we are trying to capture here.

From the data model perspective, there needs to be SOME way to define the connectivity. how it's done is a matter of the "encoding", yes?

that basic graph theory is written in terms like these.

I'll take your word for it :-) But in "lay" terms, I might say something like:

""" The connectivity arrays capture how the individual components of the mesh relate to each other. For example, for the face-face connectivity, each face has N neighbors (where N is the order of the cells, e.g. triangles are N=3). So for each face, the indexes of its N connecting faces are provided, resulting in a num_faces X N array. Note that there is duplication where there is symmetry: that is, if face i connects to face j, face j connects to face i. """

For the most trivial example of a two-triangle mesh, you would have, for the face-face connectivity:

[[-1, -1, 1],  # face 0 connects only to face 1
 [-1, -1, 0],  # face 1 connects only to face 0
]

where -1 means "nothing" and the faces are zero indexed.

So those two array entries hold duplicative information, representing the symmetry.

NOTE: I realised as I was writing that, that the encoding does not have to be duplicative and symmetric. We tend to store it that way because it's easy to ask the question: "what are the neighbors of face i?" -- but we could store only the connections, which would make it harder to interpret, but easy to encode without duplication. The above two-face mesh would simply be:

[[1, 0]]

that is: face one connects to face zero (and therefore face zero connects to face one as well).

In fact, in UGRID, the "edges" connectivity is described exactly like that.
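To make the two encodings concrete, here is a minimal sketch that collapses the duplicated neighbour table into a list of unique pairs and expands it again. The helper names are hypothetical, -1 is assumed to be the padding value as in the example above, and each pair is stored low index first, whereas the hand-written example above uses [1, 0]; the order within a pair carries no meaning.

```python
MISSING = -1  # padding value, as in the two-triangle example


def to_pairs(ffc):
    """Collapse a padded neighbour table to sorted, unique (low, high) pairs."""
    pairs = set()
    for i, row in enumerate(ffc):
        for n in row:
            if n != MISSING:
                # Storing (min, max) removes the i->n / n->i duplication.
                pairs.add((min(i, n), max(i, n)))
    return sorted(pairs)


def to_neighbour_table(pairs, n_faces, width):
    """Expand unique pairs back into a padded table with `width` columns."""
    rows = [[] for _ in range(n_faces)]
    for a, b in pairs:
        rows[a].append(b)
        rows[b].append(a)
    # Padding goes at the end here; its position is not significant.
    return [sorted(r) + [MISSING] * (width - len(r)) for r in rows]


two_triangles = [[-1, -1, 1], [-1, -1, 0]]
pairs = to_pairs(two_triangles)
print(pairs)
print(to_neighbour_table(pairs, n_faces=2, width=3))
```

The round trip recovers the same connections, though not necessarily the same column order within each row.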

pp-mo commented 3 years ago

From the data model perspective, there needs to be SOME way to define the connectivity. how it's done is a matter of the "encoding", yes?

Yes, that is what I was "trying" to say !

?? -- but does X connect to Y is the key concept we are trying to capture here. in "lay" terms, I might say something like: ...

All agreed, absolutely. I was just trying to express that I think the two descriptions are compatible, and in fact basically mean the same thing.

So, I think the key point is that how UGRID actually organises things (what I termed "encoding") is not expressed in the same terminology as the CF datamodel description.
So, we do need to be confident that they are equivalent, but that also means exploring the rules and constraints which are part of the description (on either side of the fence).

If I read correctly what has been said here ... The CF datamodel only states that the connectivity relation must be symmetrical : "If A connects to B, then B connects to A". It also describes such relations only on one "domain axis".

My outstanding concern is that UGRID does define additional information structures (and governing "rules") which the CF 'domain' model has no interest in.

(1) Firstly, a UGRID mesh with multiple locations relates to multiple domain-axes in CF. That can "work", because any data-variable can only reference one of them, as described :

we have no use case for two or more topology constructs, each of which applies to a single unique domain axis, and in fact we have no way of encoding it, so that case should indeed be excluded.

So, the CF 'topology construct' can only be attached to a single domain axis of a domain. That means that UGRID data which maps to a different mesh location is modelled as belonging to a separate, independent "domain". This seems okay for now, but it means of course that the CF description has no concept of the intercoupled nature of the different locations.

(2) likewise, the CF description has no place for the cross-location connectivities (like face-edge-connectivity)

So, I'm not sure where any of that might become practically relevant, but I'm still a little worried that they aren't really describing the same ideas with the same range of possibilities.

ChrisBarker-NOAA commented 3 years ago

So, the CF 'topology construct' can only be attached to a single domain axis of a domain. That means that UGRID data which maps to a different mesh location is modelled as belonging to a separate, independent "domain". This seems okay for now, but it means of course that the CF description has no concept of the intercoupled nature of the different locations.

(2) likewise, the CF description has no place for the cross-location connectivities (like face-edge-connectivity)

So, I'm not sure where any of that might become practically relevant, but I'm still a little worried that they aren't really describing the same ideas with the same range of possibilities.

I'm concerned as well. One thing that's become clear to me working with both unstructured and "staggered" grids in the last few years is that the entire "mesh", or whatever we want to call it, is one "thing". That is, you cannot treat the collection of nodes, the collection of faces, and the collection of edges each as separate meshes. Sure, data may be associated specifically with only one of these, but, for example:

If values on the nodes are treated as their own thing, you can plot them fine -- but if you need to interpolate the values, you need to know how the faces are defined.

Values on the edges, similarly, could be plotted, but an edge is very much an interface between two faces -- it does not live on its own at all.

And boundaries are again, very much properties of the faces.

I'm still not clear what a "domain axis" means in the CF data model, but we really do need to capture the concept that the mesh is not unrelated pieces, it is one thing -- you can't really know what any of the data fully means without the full mesh description.

davidhassell commented 3 years ago

Hi @ChrisBarker-NOAA and @pp-mo,

I have been away and am just catching up with the conversation. Thank you for an interesting read!

From the data model perspective, there needs to be SOME way to define the connectivity. how it's done is a matter of the "encoding", yes?

Absolutely.

Patrick's description of symmetry (https://github.com/cf-convention/cf-conventions/issues/153#issuecomment-882630042) is right - it is symmetric in the square matrix sense. This is indeed not at all how UGRID actually encodes this information, but the CF data model is independent of the encoding, and the symmetric matrix is logically what is going on here. Whilst there is no expectation that anyone should encode it in this manner, it is tempting (to me!) because the square connectivity matrix can be easily updated in subspacing operations.
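The remark about subspacing can be illustrated with a small sketch: with a square matrix, selecting a subset of cells just means selecting the same rows and columns, with no index renumbering. The function name is hypothetical and the matrix is held as nested Python lists of booleans purely for illustration; as noted above, the value type is an encoding choice.

```python
def subspace(matrix, keep):
    """Select the rows and columns listed in `keep` from a square matrix."""
    return [[matrix[i][j] for j in keep] for i in keep]


# Three cells in a chain: 0-1-2. Diagonal entries carry no meaning.
chain = [
    [False, True, False],
    [True, False, True],
    [False, True, False],
]

print(subspace(chain, [1, 2]))  # cells 1 and 2 are still connected
print(subspace(chain, [0, 2]))  # cells 0 and 2 were never directly connected
```

With an index-list encoding, the same operation would also need the stored cell indices to be remapped to the new numbering, and entries pointing at dropped cells replaced by missing values.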

It's a good point about what to do on this matrix's diagonal - I think that the option of these values having 'no meaning regardless of value' is sufficient for the data model. Whether you use booleans, integers, strings, etc. to denote connected/not connected is entirely an encoding choice and has no impact on the data model.

we really do need to capture the concept that the mesh is not unrelated pieces, it is one thing -- you can't really know what any of the data fully means without the full mesh description.

(2) likewise, the CF description has no place for the cross-location connectivities (like face-edge-connectivity)

The CF model does connect (e.g.) faces with edges and nodes, but in a different manner to UGRID. In CF, a "cell" is typically defined as the "space" enclosed by bounds, and the edges of the cell are the connections between adjacent cell bounds. This space may be 0-d, in which case it is just a "node"; or 1-d, in which case it is an "edge" connecting two "nodes"; or 2-d, in which case it is a "face" defined by "edges" and "nodes"; (etc, but we don't generalise to 3-d and beyond, yet). The nodes (i.e. bounds) and edges do not have an independent existence - they are elements of the CF cell definition. The new domain topology construct makes explicit the cell connectivities.

(1) Firstly, a UGRID mesh with multiple locations relates to multiple domain-axes in CF. That can "work", because any data-variable can only reference one of them, as described :
we have no use case for two or more topology constructs, each of which applies to a single unique domain axis, and in fact we have no way of encoding it, so that case should indeed be excluded.
So, the CF 'topology construct' can only be attached to a single domain axis of a domain. That means that UGRID data which maps to a different mesh location is modelled as belonging to a separate, independent "domain". This seems okay for now, but it means of course that the CF description has no concept of the intercoupled nature of the different locations.

Yes. A CF field construct contains a domain that is limited to describe just the parent field's data. Although, be careful not to confuse "domain axis" and "domain". A "domain axis" is essentially a dimension of the domain. We restrict the new domain topology constructs to apply to a single domain axis simply because there is no current way of encoding a domain topology construct that applies to multiple domain axes. UGRID only describes a mesh with a 1-d discrete axis. If this ever changed, it would be described by a simple generalisation of the data model text. The CF data model mustn't provide capabilities that are not allowed by the CF conventions.

If there are no data variables in a file - i.e. just the mesh is stored in a dataset - then the entire mesh definition is captured by a CF domain construct. However, if one or more data variables are defined on the mesh, then their CF domains are only allowed to be those elements of the mesh that are in use. For example, if a temperature variable is stored on faces and a U-wind is stored at nodes, then the domain of the former will include the UGRID faces, edges and nodes, but the domain of the latter will only know about the nodes. In both cases, a connectivity matrix will retain the required connectivities.

I just wrote "If there are no data variables in a file - i.e. just the mesh is stored in a dataset - then the entire mesh definition is captured by a CF domain construct.", however I realise that that's not necessarily the case if there are edge coordinates as part of a mesh with faces. In this case, the edge coordinates can only be represented by the CF data model in a second domain that comprised 1-d cells defining each edge and how it's connected to others. I can't decide right now if this is a problem for the data model (i.e. should edge coordinates be a new feature of coordinate constructs?). The only issue I might have is that the "round trip" of reading a mesh variable into a CF domain construct and then writing it back to disk might not give an exactly comparable result to what you started with, but that isn't a promise of the data model, so perhaps not an issue? @JonathanGregory, it would be interesting to know your thoughts on this.

Neither CF nor UGRID makes the promise that multiple data variables defined by the same mesh variable are guaranteed to be in some way combinable. That is a decision or assumption that has to be made by the user. Therefore, separate domain definitions for each data variable is an appropriate view for the CF data model. This becomes clear when you consider, say, multiple datasets from the same model simulation - each dataset contains the same mesh, but we only know it is the same by inspection or by the promise of non-standardised metadata.

When formal connections between data variables are possible in CF we'll need some carefully thought out extensions to the data model, but that's not something for this discussion (fortunately!).

If values on the nodes are treated as their own thing, you can plot them fine -- but if you need to interpolate the values, you need to know how the faces are defined.

If values are only defined at nodes, then the cells of the domain are defined by single points with a connectivity array that implicitly defines the edges and faces, so the data model is storing everything we need for interpolation.

"In CF-netCDF a domain topology can only be provided for a domain defined by a UGRID mesh topology variable"

Yes indeed - thanks for spotting my mistake.

All the best, David

ChrisBarker-NOAA commented 3 years ago

I think I'm (slowly) getting this, but a few thoughts on language:

"the CF data model is independent of the encoding, and the symmetric matrix is logically what is going on here."

That makes sense but the language I first noticed said a "symmetric array" -- I think we need to be careful about the abstraction of a symmetric matrix and the realization of an "array" in code, or a netcdf file, or ... And it needs to be clear that it's an abstract (sparse) boolean matrix, rather than an actual array (necessarily).

"The nodes (i.e. bounds) and edges do not have an independent existence - they are elements of the CF cell definition. The new domain topology construct makes explicit the cell connectivities."

That makes sense -- that is the whole point after all :-)

" A "domain axis" is essentially a dimension of the domain. We restrict the new domain topology constructs to apply to a single domain axis simply because there is no current way of encoding a domain topology construct that applies to multiple domain axes. UGRID only describes a mesh with a 1-d discrete axis."

Now I'm getting confused again -- I think this is (at least in my head) a result of a fundamental mismatch between orthogonal meshes and unstructured meshes. The whole concept of an "axis" matches orthogonal coordinates of some sort. (that's kind of the definition of orthogonal, yes?). Mapping a Ugrid to the real world, the real world is 2D (Or 3D, but let's not go there yet) -- and that 2D world has 2 orthogonal axes -- X, Y, Lat, Long, whatever. So a mesh represents a 2D space -- but I wouldn't say it has two axes, or a 1-d discrete axis either, it is simply something else. Or "axis" means something different in this context than I think it does.

"separate domain definitions for each data variable is an appropriate view for the CF data model. "

That means there could be separate definitions, or data variables could share the same one, yes? If, for instance, multiple variables in the same file are on the same mesh, they share a single domain definition, yes?

Thanks for taking the time to educate me!

davidhassell commented 3 years ago

Hi @ChrisBarker-NOAA,

The whole concept of an "axis" matches orthogonal coordinates of some sort. (that's kind of the definition of orthogonal, yes?). Mapping a Ugrid to the real world, the real world is 2D (Or 3D, but let's not go there yet) -- and that 2D world has 2 orthogonal axes -- X, Y, Lat, Long, whatever. So a mesh represents a 2D space -- but I wouldn't say it has two axes, or a 1-d discrete axis either, it is simply something else. Or "axis" means something different in this context than I think it does.

I hope I can clear this up a bit ... A "discrete axis" in CF is one which does not correspond to a continuous physical quantity, for example, this is the case for an axis that runs over ocean basins or area types, or for a domain axis that indexes a time series at scattered points. This also applies to the axis that stores the nodes of a UGRID mesh. In many of these cases the discrete axis is sampling a higher dimensional space, as you say. This is where the "discrete sampling geometries" of chapter 9 get their name (I always imagine a ship sailing along a sinuous course and recording SSTs at various time intervals). A domain that has such a discrete domain axis construct can also have other domain axis constructs (such as ones for time, level, etc.) which are indeed orthogonal to each other and to the discrete axis.

That means there could be separate definitions, or data variables could share the same one, yes? If, for instance, multiple variables in the same file are on the same mesh, they share a single domain definition, yes?

Yes. Ish. Say we have two data variables in a file (temperature and precipitation, say) that are both defined on the faces of the same mesh. The CF data model currently views all field constructs (i.e. data variables in this context) as independent entities, so each of the resulting two field constructs would contain its own domain, and those domains can only, in the data model world, be seen to be equal by inspection. This is different to the ISO 19123 coverage model that allows for multiple "features" (i.e. data variables) to be defined at common locations (this is discussed in section 5.1 of the CF data model GMD paper). CF has always had this feature (problem?), even with traditionally defined domains. We're back in SGRID territory here - and one day the CF conventions will formalise domain sharing and linked domains, but until then we're stuck with this theoretical independence. I don't think that this is in any way a blocker for the integration of UGRID into CF, though. Your software can explicitly share domains in the ISO 19123 vein if it wants to.

That makes sense but the language I first noticed said a "symmetric array" -- I think we need to be careful about the abstraction of a symmetric matrix and the realization of an "array" in code, or a netcdf file, or ... And it needs to be clear that it's an abstract (sparse) boolean matrix, rather than an actual array (necessarily).

I hear you, but would like to keep "array", as that is used in the same context throughout the data model description. I think we can improve things by explicitly noting the difference between logical representation and encoding in the text (changes in bold, I've also made a few other changes mentioned above). How's this?

Domain topology construct

A domain topology describes the connectivity of domain cells indexed by a subset of the domain axis constructs. When two cells are connected, operations on the data stored on them may be assumed to be continuous across their common boundary. A domain topology construct describes logically and explicitly the domain topology of cells indexed by a single domain axis construct. A domain topology construct contains a connectivity array that spans a single domain axis construct with the addition of an extra dimension of the same size, such that each dimension indexes the cells. The array is symmetrical, and each element indicates whether the pair of cells to which its indices refer are connected. The connectivity of a cell to itself is undefined, so the diagonal elements of this array are ignored. A domain construct may contain at most one domain topology construct.

For any subset of the domain axis constructs, excluding a domain axis construct for which there is a domain topology construct, there is an implicit domain topology that is defined by a function of the physical contiguousness of the cells, and/or the nature of the real world or simulated processes that produced the data. For example, in a field which contains both land and ocean cells, connections between land and ocean cells might be excluded for some physical processes. The description of such an implicit network topology may require metadata that is external to CF.

In CF-netCDF a domain topology can only be provided for a domain defined by a UGRID mesh topology variable. In this case, the connectivity array is supplied by a UGRID connectivity variable, such as a "face_face_connectivity" variable. Unlike the domain topology construct's connectivity array, a UGRID connectivity variable's data is not stored as a symmetric matrix that indicates the connectivity between any two cells. Instead, the trailing dimension of a UGRID connectivity variable's data records, for each cell, the indices of the other cells to which it is connected (padded with missing data if the cell has fewer connections than some others).

Thanks for taking the time to educate me!

Then it's very much a two-way street!

David

JonathanGregory commented 3 years ago

Dear Chris and David

Thanks for this discussion.

In David's revised text

Unlike the domain topology construct's connectivity array, a UGRID connectivity variable's data is not stored as a symmetric matrix that indicates the connectivity between any two cells. Instead, the trailing dimension of ...

I think that "unlike" implies it's somehow inconsistent. I would suggest something like, "The information represented by the symmetrical connectivity array of the domain topology construct in the CF data model is stored in a different but equivalent way in UGRID. The trailing dimension of ..."

David wrote, "If there are edge coordinates as part of a mesh with faces ... the edge coordinates can only be represented by the CF data model in a second domain that comprised 1-d cells defining each edge and how it's connected to others." What role do the edge coordinates have in the domain which refers to the faces?

Best wishes

Jonathan

davidhassell commented 3 years ago

Dear @JonathanGregory,

Thanks for your comments.

I think that "unlike" implies it's somehow inconsistent ...

Agreed. With your new text, that last paragraph becomes

In CF-netCDF a domain topology can only be provided for a domain defined by a UGRID mesh topology variable. In this case, the connectivity array is supplied by a UGRID connectivity variable, such as a "face_face_connectivity" variable. The information represented by the symmetrical connectivity array of the domain topology construct in the CF data model is stored in a different but equivalent way in UGRID. The trailing dimension of a UGRID connectivity variable's data records, for each cell, the indices of the other cells to which it is connected (padded with missing data if a cell has fewer connections than some others).
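The "different but equivalent" relationship described in this paragraph can be sketched by converting between the two forms. This is an illustration only: the helper names are hypothetical, -1 stands in for missing data, indices are zero-based, and the symmetric array is held as nested lists of booleans.

```python
MISSING = -1  # stand-in for the missing-data padding value


def to_matrix(table):
    """Build the logical square connectivity array from a padded table."""
    n = len(table)
    matrix = [[False] * n for _ in range(n)]
    for i, row in enumerate(table):
        for j in row:
            if j != MISSING:
                matrix[i][j] = True
    return matrix


def is_symmetric(matrix):
    """Check matrix[i][j] == matrix[j][i] for every off-diagonal pair."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i]
               for i in range(n) for j in range(i + 1, n))


def to_table(matrix, width):
    """Recover a padded table; diagonal elements are ignored."""
    table = []
    for i, row in enumerate(matrix):
        neighbours = [j for j, connected in enumerate(row) if connected and j != i]
        table.append(neighbours + [MISSING] * (width - len(neighbours)))
    return table


face_face = [[-1, -1, 1], [-1, -1, 0]]  # the two-triangle mesh
matrix = to_matrix(face_face)
print(matrix, is_symmetric(matrix))
print(to_table(matrix, width=3))
```

The recovered table holds the same connections as the input, although the padding may end up in a different position within each row.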

What role do the edge coordinates have in the domain which refers to the faces?

That sounds like the right question, and I think the answer is "none", which clarifies for me that my initial concern about "round-tripping" is not actually relevant, here. How a software implementation decides, when writing multiple fields to a file, to avoid the duplication and proliferation of domain-related netCDF variables is up to it, as it always has been.

All the best, David

ChrisBarker-NOAA commented 3 years ago

thanks @JonathanGregory: that is better.

Final note on "axis" -- I think I'm being pedantic here, and it's probably bad to introduce new terms, but:

David wrote:

A "discrete axis" in CF is one which does not correspond to a continuous physical quantity, for example, this is the case for an axis that runs over ocean basins or area types, or for a domain axis that indexes a time series at scattered points. This also applies to the axis that stores the nodes of a UGRID mesh.

Does it though?

In many of these cases the discrete axis is sampling a higher dimensional space, as you say. This is where the "discrete sampling geometries" of chapter 9 get their name (I always imagine a ship sailing along a sinuous course and recording SSTs at various time intervals). A domain that has such a discrete domain axis construct can also have other domain axis constructs (such as ones for time, level, etc.) which are indeed orthogonal to each other and to the discrete axis.

In all of these examples, the axis may be "wandering around" in a higher dimensional space, but as noted, they are still orthogonal to others, and more critically, they represent a continuum of some sort. That is, where along the axis they lie is meaningful -- the fact that a value comes before or after another value, or is "next to" a value is meaningful.

But in the case of an unstructured grid's node array, for instance, the order in which the node coordinates are in the array is completely (well, not completely, but ...) arbitrary. It is a mapping between a node index and the coordinates of that node, nothing more -- it could be completely re-ordered without its meaning changing. Which is why I don't think of it as an axis, even if it technically fits the definition.
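A small sketch of the re-ordering argument: if the nodes are permuted and the face-node connectivity is renumbered to match, every face still resolves to the same coordinates. The function names and the four-node mesh are made up for illustration, with zero-based indices assumed.

```python
def renumber_nodes(node_xy, face_nodes, order):
    """Reorder nodes so that new index k holds old node order[k].

    Returns the reordered coordinates and the faces remapped to the
    new numbering.
    """
    new_of_old = {old: new for new, old in enumerate(order)}
    new_xy = [node_xy[old] for old in order]
    new_faces = [[new_of_old[n] for n in face] for face in face_nodes]
    return new_xy, new_faces


def face_coords(node_xy, face_nodes):
    """Resolve each face to the coordinates of its corner nodes."""
    return [[node_xy[n] for n in face] for face in face_nodes]


# Hypothetical unit square split into two triangles.
node_xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
face_nodes = [[0, 1, 2], [1, 3, 2]]

new_xy, new_faces = renumber_nodes(node_xy, face_nodes, [3, 1, 0, 2])
print(new_faces)
print(face_coords(new_xy, new_faces) == face_coords(node_xy, face_nodes))
```

The index arrays change but the geometry they describe does not, which is the sense in which the node order carries no meaning of its own.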

But as I said, probably good enough, and I can't think of a better term.

davidhassell commented 3 years ago

Hi @ChrisBarker-NOAA,

This puzzled me for a bit, until I remembered that in the CF data model, DSGs are not special. This is because what looks like a discrete axis in the encoding of a DSG is really just a form of lossless packing for an orthogonal multidimensional array (with the data and coordinate arrays padded with missing data if required). This is indeed different to the UGRID mesh, as you rightly picked up on.

So bringing DSGs into this was misleading - sorry!

All that the CF data model says about DSGs is:

When a collection of discrete sampling geometry (DSG) features has been combined in a data variable using the incomplete orthogonal or ragged representations to save space, the axis size has to be inferred, but this is an aspect of unpacking the data, rather than its conceptual description. In practice, the unpacked data array may be dominated by missing values (as could occur, for example, if all features in a collection of time series had no common time coordinates), in which case it may be preferable to view the collection as if each DSG feature were a separate variable, each one corresponding to a different field construct.
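The idea of unpacking can be sketched for one simple case, a contiguous ragged layout in which the values of all features are stored end to end alongside a per-feature count. The function name and sample numbers are invented, None stands in for missing data, and padding every feature to the length of the longest one is an assumption made for brevity rather than a statement of how the data model aligns coordinates.

```python
MISSING = None  # stand-in for missing data


def unpack_ragged(values, counts):
    """Unpack end-to-end feature values into a padded 2-d list.

    counts[k] is the number of values belonging to feature k. The
    size of the second axis is inferred as the largest count.
    """
    width = max(counts, default=0)
    rows, start = [], 0
    for count in counts:
        row = values[start:start + count]
        rows.append(row + [MISSING] * (width - count))
        start += count
    return rows


# Three hypothetical time series with 2, 1 and 3 samples.
packed = [10.1, 10.4, 9.8, 11.0, 12.2, 12.0]
print(unpack_ragged(packed, [2, 1, 3]))
```

Nothing is lost in either direction, which is why the packed axis is treated as an aspect of the encoding rather than of the conceptual description.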

In my other examples (such as data stored on ocean basins) there is no assumption of (spatial) continuity, so I reckon that the CF data model discrete axis does apply to the UGRID case, after all.

Thanks, David

JonathanGregory commented 3 years ago

Dear Chris and David

I agree that UGRID uses a discrete axis. There isn't necessarily any ordering implied by such an axis, just as in the case of UGRID. For example, ocean basins or other geographical regions may be arranged along a discrete axis in any order.

Best wishes

Jonathan

ChrisBarker-NOAA commented 3 years ago

Thanks guys, I think I get it now -- the ocean basins example is a good one, I didn't really know what that meant the first time it was mentioned. "axis" it is :-)

davidhassell commented 2 years ago

Hello all,

I have created a couple of pull requests to hopefully finally get UGRID into CF. I've not consulted anyone on the new text yet, so I fully expect some constructive comments! but I thought it a good idea to have something that we can discuss in less abstract terms.

I have copied the agreed procedure from previous comments, and mapped them to new parts of the pull requests (one here and one over on https://github.com/cf-convention/cf-convention.github.io).

Thanks, David

Procedure for incorporating UGRID into CF

UGRID will remain an independent convention, with independent governance.

Agreed.

Comprehensive conformance rules will be written up for UGRID.

These will be maintained alongside UGRID in its repository, and referenced from (not copied into) the CF conformance document.

See PR #353 (conformance). The link to the UGRID conformance is a placeholder, as they are still being written.

The rules governing the evolution of CF will be modified to ensure that the CF conventions remain consistent with UGRID.

Whilst UGRID will not be formally constrained by the CF conventions, it will be in everyone’s interest for UGRID changes to be evaluated for compatibility with CF, and vice versa. If desirable changes in one convention would require changes in the other convention, then such changes should be proposed to the other community and discussed together as part of a shared interest. Different time scales of evolution between the two standards are accounted for, as CF will only formally accept a named version (or versions) of UGRID and so is not necessarily obliged to support the latest version of UGRID (although that would generally be the expectation). In the very unlikely event that changes to UGRID cannot be accepted by CF, that would be the time to merge the last set of accepted UGRID conventions into the actual CF document, from where the CF representation of unstructured grids would evolve independently of UGRID.

See PR https://github.com/cf-convention/cf-convention.github.io/pull/210 (Additional recommendations relating to UGRID)

A subsection will be added to CF section 1: Introduction to introduce UGRID and its purpose, to make clear its special synergy with CF, to remark on the appearance of attributes in CF appendices, and to say that it has its own conformance document which complements the CF conformance document.

See PR #353 (section 1)

The standardised UGRID attributes will be documented in the CF conventions, thereby making them visible to all users, and it will be mentioned in the CF governance rules that they need maintaining.

Only the UGRID attributes which can appear on data variables (currently mesh, location and location_index_set) should be added to CF appendix A: Attributes. All other UGRID attributes (such as those on mesh variables) should go into a new CF appendix specifically about the UGRID mesh topology variable. This would be like the treatment of the attributes of the grid mapping variable, which are in a table in CF appendix F: Grid Mappings, not in appendix A. (It is acknowledged that the geometry variable attributes appear in CF appendix A, but there are only five of them, whereas there are eighteen attributes of the mesh topology variable.)

See PR #353 (appendix A, appendix K)

Some text will be added to CF section 5.8: Domain Variables to explain the UGRID mesh topology variable and how it relates to a CF domain variable. It may be the case that the occasional note relating to UGRID would be useful in other sections.

See PR #353 (section 5). I took the approach of creating a new section (5.9 Mesh Topology Variables) instead, which references the CF domain.

It will be decided whether or not UGRID fits into the existing CF data model (defined in CF appendix I: The CF Data Model), and if not then the CF data model will be extended to accommodate UGRID.

The understanding of interested parties from both the UGRID and CF communities coalesced on the need for a new CF data model construct that can make explicit the notion of a network topology for CF domain constructs.

See PR #353 (appendix I)

JonathanGregory commented 2 years ago

Dear @davidhassell

Thanks for summarising the discussion up to now and writing the pull requests. As far as I know, this covers everything. The proposed changes look fine to me from a CF point of view. I noticed three small things:

In Appendix I (the data model), you describe the new CF-netCDF element as "Domain(s) with cell connectivity" and the new CF construct as "Connectivity of domain cells". I wonder if these should be the same.
In ch01 there is an accidental "accidential".
I note that many of the changes in https://github.com/cf-convention/cf-convention.github.io/pull/210/files are concerned with respelling github as GitHub. It's fine to put that right. In the unlikely event that this PR doesn't get accepted, we should remember to do that anyway.

Cheers

Jonathan

davidhassell commented 2 years ago

Dear Jonathan,

Thank you for your comments.

In Appendix I (the data model), you describe the new CF-netCDF element as "Domain(s) with cell connectivity" and the new CF construct as "Connectivity of domain cells". I wonder if these should be the same.

Good point. In this case I think they should be different, though.

For instance, in the first new example in the new section 5.9 "Mesh Topology Variables" (which is adapted from https://ugrid-conventions.github.io/ugrid-conventions/#2d-triangular-mesh-topology) , the mesh topology variable Mesh2 contains nodes, edges and faces, each of which corresponds to a self-contained CF data model domain construct. I.e. this mesh topology variable represents three domains (in the data model sense of "domain"):

(one domain for the 2 triangular faces, one for the 5 edges, and one for the 4 nodes.)

By contrast, in the data model we still do not formally recognise the inter-domain connectivity (e.g. the relationship between edges and faces; or the relationship between stagger locations on an Arakawa grid), but the existing domain construct does require the new domain topology construct component to represent the intra-domain connectivity required by UGRID.

I feel that this distinction should be better illuminated in the appendix I text ... I'll see how I can work that in.

In ch01 there is an accidental "accidential".

Fixed (I'll push it up with the appendix I changes discussed above).

I note that many of the changes in https://github.com/cf-convention/cf-convention.github.io/pull/210/files are concerned with respelling github as GitHub. It's fine to put that right. In the unlikely event that this PR doesn't get accepted, we should remember to do that anyway.

You're quite right. (I shouldn't really do that, but I got carried away during a spell check :))

By the way, I very recently found out that GitHub now has a nice image comparison feature when you do a "rich diff" on an image file, that includes side by side, and overlaid views (e.g. https://github.com/cf-convention/cf-conventions/pull/353/files?short_path=5a70e4b#diff-5a70e4b347e2935d49b12cfaff78556fb70319f62762e966b1c82042f022d463 and https://github.com/cf-convention/cf-conventions/pull/353/files?short_path=e4e016c#diff-e4e016c9bc64bd9cea792809a49d188836d6050352f0c24da156b896f8a3b6cf).

davidhassell commented 2 years ago

Dear Jonathan,

I have made some commits that hopefully make clearer the relationship between CF-netCDF mesh topology and location index variables, and CF data model Domain and Domain Topology constructs. What do you think?

https://github.com/cf-convention/cf-conventions/commit/3f9b4d53421f7da8c380c7e489e7713140edc60a?short_path=84395ff#diff-84395ffc95113b0c4c0cd22056e9b7461dc81bec9c89422b66c7b14f73d6c788

https://github.com/cf-convention/cf-conventions/commit/c401fae05d15f4aba7df1a2fe69163a8f00d389e?short_path=e4e016c#diff-e4e016c9bc64bd9cea792809a49d188836d6050352f0c24da156b896f8a3b6cf

JonathanGregory commented 2 years ago

Those changes look fine to me, thanks, David. The clarification is useful.

JonathanGregory commented 2 years ago

I believe that pull request #353 in this repo and pull request #210 in the website/governance repo can be merged, which make pointers from CF to UGRID. There was sufficient support according to the rules expressed when they were written early this year by @davidhassell and no remaining objections. I propose to merge them today.

davidhassell commented 2 years ago

Dear Jonathan,

I would very much like to do this, but there are perhaps some loose ends: There has been no blessing, or otherwise, on the text from the UGRID folk (although the approach has been agreed), and the UGRID conformance document has not yet been completed (see https://github.com/ugrid-conventions/ugrid-conventions/issues/52#issuecomment-1158620865).

The first of these is not so much a problem, given that it's just an implementation of an agreed strategy and from the CF end we are happy with what we have. But what about the conformance? Can we proceed without it?

JonathanGregory commented 2 years ago

Dear @davidhassell

Thanks for pointing that out and restraining my zeal. I don't think we can proceed without the UGRID conformance document being decided. @ChrisBarker-NOAA, are you able to help or advise us?

Best wishes

Jonathan

ChrisBarker-NOAA commented 2 years ago

I took (an admittedly very quick) look at the PRs -- and it LGTM.

One comment though: Appendix K, Mesh Topology Attributes. seems to be duplicating some of what's in the UGRID spec, e.g. defining things like "face_face_connectivity". I see this has been explained with:

"However, a summary with examples can be found in <>, and to reduce the chance of ambiguities arising from their accidental re-use, all of the UGRID standardized attributes are specified in <> and <>."

which makes sense -- but if this is intended to be comprehensive, has it been reviewed for completeness?

As for the UGRID conformance document -- I think it's OK to have it as a draft, but we should at least publish it -- I've pinged that PR to see if we can get it done.

davidhassell commented 2 years ago

Hi Chris,

Thanks for taking a look!

which makes sense -- but if this [Appendix K] is intended to be comprehensive, has it been reviewed for completeness?

It is intended to be comprehensive. I don't know if it has been reviewed for completeness, although Jonathan may have done so during his review of the PR and not mentioned it because it was found to be OK :) I will give it another check. Would you like to cast your eyes over it, Chris?

As for keeping it up to date, well https://github.com/cf-convention/cf-convention.github.io/pull/210 should ensure that this happens (although humans are involved!)

As for the UGRID conformance document -- I think it's OK to have it as a draft, but we should at least publish it -- I've pinged that PR to see if we can get it done.

I would be happy proceeding on this basis. Someone tell me if that's too relaxed!

jypeter commented 1 year ago

I have just heard about UGRID this morning, and I have installed uxarray in my shared Python distribution (just in case). I then went to the CF website to check if there was a link to UGRID somewhere, but that's not the case yet

I confess I don't want to read the full discussion here, but it naively seems to me that it would not hurt to have a link to UGRID somewhere. There is currently (I'm looking at version 1.10, 31 August, 2022) no hit when you look for unstructured in the documentation! And you can find structured 4 times, but not in grid related topics.

I have just checked, and there is nothing in the current draft version

ChrisBarker-NOAA commented 1 year ago

@jypeter thanks for the ping -- yes, we really need to get this done!

Note: xarray doesn't "understand" UGRID -- though there is the https://github.com/UXARRAY/uxarray package which is working on that.

And there is gridded (https://github.com/NOAA-ORR-ERD/gridded) which does work with UGRID, but doesn't use xarray (it could and should -- PR's accepted :-) )

trexfeathers commented 1 year ago

And there is gridded (https://github.com/NOAA-ORR-ERD/gridded) which does work with UGRID, but doesn't use xarray (it could and should -- PR's accepted :-) )

Shout out for Iris, while we're on the subject 🙂

ChrisBarker-NOAA commented 1 year ago

Does Iris have support for unstructured grids now?

davidhassell commented 1 year ago

Hi @jypeter, all of the references to UGRID are waiting to be merged into the latest document from PR #353 (TLDR: here and here) . We could merge this today if the draft UGRID conformance document were already published ...

... if this still hasn't happened by the time CF-1.11 is due to be released (which is likely to be August/September 2023), then I will push for including UGRID into CF without the UGRID conformance, which is less than ideal but I think necessary for data creators and software providers.

Whilst we're shouting out, cf-python and cfdm will support UGRID by the time of CF-1.11.

Thanks, David

pp-mo commented 1 year ago

Does Iris have support for unstructured grids now?

Yes, it absolutely does.
We enhanced the internal data model, and added several mesh-specific docs pages: https://scitools-iris.readthedocs.io/en/stable/further_topics/ugrid/index.html#ugrid

Here @MetOffice we put a lot of work into this in 2021, as it's projected to have extensive internal use. But real usage is only just beginning and the API is still experimental -- so all feedback welcome!

davidhassell commented 1 year ago

Hi, I realise that we need a little bit of extra text in the CF data model to account for the case that we have cell bounds but no representative coordinates. This would occur in the following case, for which the data is defined on faces, but the mesh only defines face vertices:

dimensions:
  nMesh2_node = 4 ; // nNodes
  nMesh2_face = 2 ; // nFaces
  Three = 3 ;
variables:
// Mesh topology
integer Mesh2 ;
    Mesh2:cf_role = "mesh_topology" ;
    Mesh2:long_name = "Topology data of 2D unstructured mesh" ;
    Mesh2:topology_dimension = 2 ;
    Mesh2:node_coordinates = "Mesh2_node_x Mesh2_node_y" ;
    Mesh2:face_node_connectivity = "Mesh2_face_nodes" ;
// Mesh node coordinates
double Mesh2_node_x(nMesh2_node) ;
    Mesh2_node_x:standard_name = "longitude" ;
    Mesh2_node_x:long_name = "Longitude of 2D mesh nodes." ;
    Mesh2_node_x:units = "degrees_east" ;
double Mesh2_node_y(nMesh2_node) ;
    Mesh2_node_y:standard_name = "latitude" ;
    Mesh2_node_y:long_name = "Latitude of 2D mesh nodes." ;
    Mesh2_node_y:units = "degrees_north" ;
// Mesh node connectivity
integer Mesh2_face_nodes(nMesh2_face, Three) ;
    Mesh2_face_nodes:cf_role = "face_node_connectivity" ;
    Mesh2_face_nodes:long_name = "Maps every triangular face to its three corner nodes." ;
    Mesh2_face_nodes:start_index = 1 ;
// Data
float tas(nMesh2_face) ;
    tas:mesh = "Mesh2" ;
    tas:location = "face" ;
    tas:units = "K" ;
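
As a minimal sketch of why this case yields cell bounds but no representative coordinates, the face bounds can be gathered from the node coordinates through Mesh2_face_nodes, taking start_index = 1 into account. The coordinate and connectivity values and the helper name `face_bounds` are invented for illustration; only the variable names come from the CDL above.

```python
# Hypothetical values: the CDL above declares these variables but gives no data.
mesh2_node_x = [0.0, 1.0, 0.0, 1.0]      # longitude of the 4 nodes
mesh2_node_y = [0.0, 0.0, 1.0, 1.0]      # latitude of the 4 nodes
mesh2_face_nodes = [[1, 2, 3],           # 2 triangular faces, 1-based node indices
                    [2, 4, 3]]
start_index = 1                          # from Mesh2_face_nodes:start_index


def face_bounds(node_coord, face_nodes, start_index=0):
    """Return one row of vertex coordinate values per face."""
    return [[node_coord[n - start_index] for n in face] for face in face_nodes]


lon_bounds = face_bounds(mesh2_node_x, mesh2_face_nodes, start_index)
lat_bounds = face_bounds(mesh2_node_y, mesh2_face_nodes, start_index)

# Each face now has bounds, but nothing in the mesh supplies a single
# representative (e.g. face-centre) coordinate for it.
print(lon_bounds)
print(lat_bounds)
```

A face-centre coordinate would only exist if the mesh also named face_coordinates, which this example deliberately omits.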

The existing text says that the only way to encode cell bounds without coordinates is with simple geometries:

A CF-netCDF geometry container variable is used to store cell bounds without coordinates for a discrete axis.

This text should be modified to:

A CF-netCDF geometry container variable or mesh topology variable is used to store cell bounds without coordinates for a discrete axis.

I'll update the pull request with this small change, pending further discussion.

Thanks, David

JonathanGregory commented 1 year ago

Dear @davidhassell

Thanks for noting this. In the current text of the data model section (at the link you helpfully provided) I wonder whether we have not distinguished sufficiently between the CF-netCDF encoding and the data model. We say, "A coordinate construct consists of an optional data array of the coordinate values spanning the subset of the domain axis constructs ...", and then, "The data array of the coordinate values is required, execpt for the special cases described below." The special cases are the two (formerly one) you have identified, if I understand correctly i.e. geometry container variable or mesh topology variable.

Those are two cases of encoding in netCDF. We do not logically distinguish simple geometry cells or UGRID cells from other kinds of cell. (UGRID has a domain topology construct in the data model, which contains extra information about cell connectivity, but it still has cells.) So I wonder if that, as far as the data model is concerned, the data array is unconditionally optional. We could replace the "execpt" sentence quoted above with something like: "Although the data array of the coordinate values is optional in the data model, it is mandatory in CF-netCDF, with two exceptions: simple geometries, and UGRID cells described by a mesh topology variable."

Cheers

Jonathan

ChrisBarker-NOAA commented 1 year ago

So I wonder if that, as far as the data model is concerned, the data array is unconditionally optional.

Is it? Or is it optional in the data model? Don't you need SOME way of specifying the location of the data at hand?

davidhassell commented 1 year ago

I think that @ChrisBarker-NOAA is right - we do need some way of specifying the cell locations, and that can be with either coordinates (C), or bounds (B), or both coordinates and bounds. I.e. omitting both coordinates and bounds is disallowed, but any other combination is OK.

Other constructs, such as domain ancillary constructs, insist on having a data array present. So it makes sense that we insist on that for coordinate constructs - but noting that the data can be provided in one of three ways (i.e. C, B, or C & B).

Then also note, as @JonathanGregory said, "Although the data array of the coordinate values is optional in the data model, it is mandatory in CF-netCDF, with two exceptions: simple geometries, and UGRID cells described by a mesh topology variable."
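
The rule described above (C, B, or C & B, but never neither) could be checked along these lines. This is only a sketch: the function name and the use of `None` for an absent array are assumptions, not part of any CF library.

```python
def coordinate_construct_form(coordinates=None, bounds=None):
    """Return "C", "B" or "C & B", or raise if both arrays are absent."""
    if coordinates is None and bounds is None:
        raise ValueError(
            "a coordinate construct needs coordinates, bounds, or both"
        )
    if coordinates is not None and bounds is not None:
        return "C & B"
    return "C" if coordinates is not None else "B"


# Bounds only, as for UGRID faces described solely by their vertices.
print(coordinate_construct_form(bounds=[[0.0, 1.0, 0.0]]))
```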

ChrisBarker-NOAA commented 1 year ago

@JonathanGregory wrote:

So I wonder if that, as far as the data model is concerned, the data array is unconditionally optional.

Maybe getting sidetracked here, but why is it unconditionally optional in the data model? Isn't a variable with no specification of its location kind of useless?

But I may be getting confused about what exactly is being talked about.

JonathanGregory commented 1 year ago

Dear @davidhassell and @ChrisBarker-NOAA

I can't think of an existing case where you would use a coordinate construct with neither coordinates nor bounds, so I agree with David that we must have one or the other although they are individually optional. So I was wrong to suggest that the coordinates are unconditionally optional. The first paragraph of the coordinate constructs in the data model could be something like this:

Coordinate constructs (Figure I.3) provide information which locate the cells of the domain and which depend on a subset of the domain axis constructs. A coordinate construct consists of an ~optional~ data array of the coordinate values spanning the subset of the domain axis constructs, properties to describe the coordinates (in the same sense as for the field construct), an ~optional~ data array of cell bounds recording the extents of each cell, and any extra arrays needed to interpret the cell bounds values. In the data model, all the components of the coordinate construct are optional, but it is mandatory to include either the coordinate array or the bounds array, and both may be included. In CF-netCDF, the coordinate array is mandatory, except for simple geometries and UGRID cells described by mesh topology variables.

Best wishes

Jonathan

davidhassell commented 1 year ago

Hello,

Since the beginning of May there has been a lot of discussion on this topic over at https://github.com/ugrid-conventions/ugrid-conventions/issues/65 and https://github.com/ugrid-conventions/ugrid-conventions/issues/66 and off-line - many thanks to @ChrisBarker-NOAA, @JonathanGregory, @hrajagers, @pp-mo and @drf5n for taking the time to think about this.

The result of all this is that we've had to change the nature of the CF data model Domain Topology construct, and need to make a note about "boundary_node_connectivity". None of this changes the agreed principles and approach of incorporating UGRID into CF.

It is very desirable to get this into CF-1.11 which will be released later this year (potentially as soon as the end of September), not least because a variety of general circulation models that need to archive in CF-netCDF are now starting to use UGRID to store their outputs.

Domain Topology construct description

Here is the new description of the Domain Topology construct (see https://github.com/cf-convention/cf-conventions/issues/153#issuecomment-888189611 for the original description):

Domain topology construct

A domain topology construct describes logically and explicitly the contiguity of domain cells indexed by a single domain axis construct, where two cells are described as contiguous if and only if they share at least one common boundary vertex. A domain construct allows contiguity to be ascertained without comparison of boundary vertices, which may be co-located for non-contiguous cells. A domain construct may contain at most one domain topology construct.

A domain topology construct contains an array that spans a single domain axis construct with the addition of an extra dimension that indexes the cell bounds for the corresponding coordinates. Identical array values indicate that the corresponding cell vertices map to the same node of the domain, but otherwise the array values are arbitrary.

In CF-netCDF a domain topology can only be provided for a domain defined by a UGRID mesh topology variable, supplied by a node connectivity variable, such as is named by a "face_node_connectivity" attribute. The indices contained in a node connectivity variable may be used directly to create a domain topology construct but the CF data model attaches no significance to the values, other than the fact that not all indices are the same.

The old version described a Boolean array which indicated which pairs of cells were contiguous, whilst the new version is an array that has the exact form of a UGRID node connectivity variable.

The old version seemed at the time like a clear abstraction of what is going on, but there were still some subtleties which hadn't been considered (such as co-located but non-contiguous cells); and it turned out that with the old version it was not always possible to make the round trip from UGRID dataset -> CF data model constructs -> UGRID dataset. The first step was OK, but the second step still relied on inspection of cell bounds and examples were found which did not work, even given unambiguous bounds comparisons.
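
To make the difference between the two versions concrete, here is a sketch that derives an old-style Boolean contiguity matrix from a new-style node-identity array, and compares it with what inspection of vertex locations alone would give. The cells, node identities and coordinates are invented, and treating a cell as not contiguous with itself is an assumption made here for simplicity.

```python
# New-style array: one row of node identities per cell (hypothetical values).
cell_nodes = [[1, 2, 3], [2, 4, 3], [5, 6, 7]]

# Hypothetical locations: node 5 is co-located with node 1 but is a different node.
node_xy = {1: (0, 0), 2: (1, 0), 3: (0, 1), 4: (1, 1),
           5: (0, 0), 6: (-1, 0), 7: (0, -1)}


def contiguity_matrix(cells):
    """True where two distinct cells share at least one identical value."""
    sets = [set(c) for c in cells]
    n = len(sets)
    return [[i != j and bool(sets[i] & sets[j]) for j in range(n)]
            for i in range(n)]


by_identity = contiguity_matrix(cell_nodes)
by_location = contiguity_matrix([[node_xy[n] for n in c] for c in cell_nodes])

print(by_identity)   # cells 0 and 2 are not contiguous
print(by_location)   # comparing locations wrongly makes cells 0 and 2 contiguous
```

This shows only the direction from node identities to a Boolean matrix, together with the co-location subtlety mentioned above; it does not attempt the reverse step.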

Edit: sent too soon - carrying on the next post!

davidhassell commented 1 year ago

... carrying on from the last post!

Boundary Node Connectivity

The UGRID boundary_node_connectivity attribute provides metadata towards describing the boundary conditions that constrain the data, and which are distributed over the same sampling domain as the field itself. The boundary conditions provide information on the processes that produced the data, and may inform operations on the continuity of the data across them. Note that such boundary conditions are constrained by the domain definition, but do not contribute to the domain's definition.

I propose to explicitly exclude Boundary Node Connectivity from being recognised by CF, given no current use cases for it, for the following reasons:

Practically, UGRID currently lacks a mechanism for associating the variable containing the nature of the boundary conditions with the variable containing their locations.
The nature of the boundary conditions is part of the model formulation. Whilst the formulation is clearly important, CF transcends this: it says "yes, you could compare/combine these two datasets and get a meaningful result", but that the interpretation of that result is a function of model formulation, experimental design, instrument type, etc., and these are up to the user to determine outside of CF. In this feature, UGRID is geared towards storing information needed to configure a model that uses the grid (https://github.com/ugrid-conventions/ugrid-conventions/issues/65#issuecomment-1520716314). I think that this is beyond CF at this time (but is clearly an area that could be explored later).
The current UGRID specification only allows for interface conditions along (N-1)d cell interfaces of Nd cells (e.g. edges of faces), but it is already being considered to extend this to interfaces at nodes of any cell type (edge/face/volume); at volume edges; and at volume faces. As this feature is under development, CF should consider incorporating it into the data model (for it would require a new data model construct) when it is more complete.

To that end, I suggest adding this new text to the new "Mesh topology" section of chapter 5 (see https://github.com/cf-convention/cf-conventions/pull/353/files#diff-3c189abe47ef902923e4a6126a2fe909ed568bcacae933778144094935c0a9d8 for the existing changes):

_The UGRID conventions \<\<UGRID>> allow for the specification of boundary conditions that applied to the creation of the data, via the boundary_node_connectivity attribute, but this feature is not included in this version of CF._

Also, modifying the mention of boundary_node_connectivity from Appendix K (https://github.com/cf-convention/cf-conventions/pull/353/files#diff-d67e5a9a0f7dc06129dad9631f241b99f4fa8d962d5e81a15c2bb0776149f745).

| **`boundary_node_connectivity`**
| S
| MT
| Specifies an index variable identifying the nodes that define each boundary element
(i.e. the nodes that define each edge of a face, or the nodes that define each face of a volume).
This attribute is not recognized by this version of CF.

davidhassell commented 1 year ago

... and lastly ...

If you could cast an eye over these changes (which are pretty small, once you strip out all of the recorded thinking!) that would be great help. I'll rework the PR (#353) only when the proposed changes are OK with all.

Thank you!

davidhassell commented 1 year ago

@ChrisBarker-NOAA, is it indeed the case that data defined at nodes (which do not have bounds) are considered to be connected to the other nodes defined by whichever [edge|face|volume]_node_connectivity variables are present? If so, then the new Domain Topology construct definition does not work for this case, although the old definition would ....

davidhassell commented 1 year ago

Perhaps "contiguity" and "connectedness" are different concepts for which we need two CF data model constructs? The case I mentioned above maybe feels more "connected" than "contiguous".

JonathanGregory commented 1 year ago

Dear @davidhassell

Thanks for your work on this and the new text. Some aspects of it are unclear to me. They probably are clear given knowledge of the UGRID spec, which unfortunately I have forgotten! But the CF data model text should not depend on knowledge of the UGRID spec.

What does "which may be co-located for non-contiguous cells" mean? If two vertices are co-located - effectively identical - surely the cells which have these vertices are contiguous? Similarly, what are "co-located cells", mentioned later? Does this mean two cells which occupy exactly the same space (of the appropriate dimensionality: space, face, line or vertex)?
"an extra dimension that indexes the cell bounds for the corresponding coordinates". What are the "corresponding coordinates"? Does this mean the cells? Is the index over all the bounds i.e. any cell can refer to any bound (so this dimension is at least twice as large as the dimension of the domain axis), or is it a ragged array in which each cell refers to its own bounds only?
What is a "node of the domain"?

Excluding the boundary node connectivity from CF for the moment seems reasonable to me, since the definition is fluid. This information might later be treated as data in CF, or perhaps ancillary variables, rather than metadata of the domain.

Best wishes

Jonathan

davidhassell commented 1 year ago

Hello,

Since July, Chris Barker, Jonathan Gregory and I have had many off-list discussions on getting UGRID within CF, and the three of us are now happy that we have a data model extension which properly works for UGRID, with no unaccounted for corner cases.

I shall close the original PR and open a new one (#459) that contains the full integration of this new model into the conventions text, but for ease of reference, the new data model Domain topology and Cell connectivity constructs that we have devised are reproduced in this message.

We are considering the 3 week cooling-off period for merging this change as starting from now, but of course welcome feedback of any sort in the usual manner, and if that exposes anything we've missed and the clock is reset, so much the better!

All the best, David

Domain topology construct

A domain topology construct defines the geospatial topology of cells arranged in two or three dimensions in real space but indexed by a single (discrete) domain axis construct, and at most one domain topology construct may be associated with any such domain axis. The topology describes topological relationships between the cells - spatial relationships which do not depend on the cell locations - and is represented by an undirected graph, i.e. a mesh in which pairs of nodes are connected by links. Each node has a unique arbitrary identity that is independent of its spatial location, and different nodes may be spatially co-located.

The topology may only describe cells that have a common spatial dimensionality, one of:

Point: A point is zero-dimensional and has no boundary vertices.
Edge: An edge is one-dimensional and corresponds to a line connecting two boundary vertices.
Face: A face is two-dimensional and corresponds to a surface enclosed by a set of edges.

Each type of cell implies a restricted topology for which only some kinds of mesh are allowed. For point cells, every node corresponds to exactly one cell; and two cells have a topological relationship if and only if their nodes are connected by a mesh link. For edge and face cells, every node corresponds to a boundary vertex of a cell; the same node can represent vertices in multiple cells; every link in the mesh connects two cell boundary vertices; and two cells have a topological relationship if and only if they share at least one node.

Figure I.5: A topology defined by a mesh with five nodes and six links.

For example, the mesh depicted in Figure I.5 may be used with any of three domain topology constructs for domains comprising two face cells (one triangle and one quadrilateral), six edge cells, and five point cells respectively.

A domain topology construct contains an array defining the mesh, and properties to describe it. There must be a property indicating the spatial dimensionality of the cells. The array values comprise the node identities, and all array elements that refer to the same node must contain the same value, which must differ from any other value in the array. The array spans the domain axis construct and also has a ragged dimension, whose function depends on the spatial dimensionality of the cells.

For each point cell, the first element along the ragged dimension contains the node identity of the cell, and the following elements contain in arbitrary order the identities of all the cells to which it is connected by a mesh link.

For each edge or face cell, the elements along the ragged dimension contain the node identities of the boundary vertices of the cell, in the same order that the boundary vertices are stored by the auxiliary coordinate constructs. Each boundary vertex except the last is connected by a mesh link to the next vertex along the ragged dimension, and the last vertex is connected to the first.
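
The array layouts described above can be sketched for a small mesh. The node identities below are hypothetical: they are chosen only to give five nodes and six links (a quadrilateral and a triangle sharing one link), and are not taken from Figure I.5. The helper names are likewise invented.

```python
# Face cells: each row holds the node identities of a face's boundary vertices.
face_topology = [[0, 1, 2, 3], [1, 4, 2]]


def mesh_links(cell_topology):
    """Links implied by edge or face cells: each vertex to the next, last to first."""
    links = set()
    for nodes in cell_topology:
        for a, b in zip(nodes, nodes[1:] + nodes[:1]):
            links.add(frozenset((a, b)))
    return links


def point_topology(links):
    """Rows of [node, connected nodes...]; sorted here, though order is arbitrary."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return [[node] + sorted(neighbours[node]) for node in sorted(neighbours)]


links = mesh_links(face_topology)
edge_topology = sorted(sorted(link) for link in links)   # one row per edge cell
point_rows = point_topology(links)                        # one row per point cell

print(len(links), edge_topology)
print(point_rows)
```

The same mesh therefore serves two face cells, six edge cells and five point cells, with a differently shaped ragged array in each case.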

When a domain topology construct is present it is considered to be definitive and must be used in preference to the topology implied by inspection of any other constructs, which is not guaranteed to be the same.

In CF-netCDF a domain topology construct can only be provided for a UGRID mesh topology variable. The information in the construct array is supplied by the UGRID "edge_node_connectivity" variable (for edge cells) or "face_node_connectivity" variable (for face cells). The topology for point cells may be provided by either of these UGRID variables. The integer indices contained in the UGRID variable may be used as the mesh node identities, although the CF data model attaches no significance to the values other than the fact that some values are the same as others. The spatial dimensionality property is provided by the "location" attribute of a variable that references the UGRID mesh topology variable, i.e. a data variable or a UGRID location index set variable.

A single UGRID mesh topology defines multiple domain constructs and defines how they relate to each other. For instance, when "face_node_connectivity" and "edge_node_connectivity" variables are both present there are three implied domain constructs - one each for face, edge and point cells - all of which have the same mesh and so are explicitly linked (e.g. it is known which edge cells define each face cell). The CF data model has no mechanism for explicitly recording such relationships between multiple domain constructs; however, whether or not two domains have the same mesh may be reliably determined by inspection, thereby allowing the creation of netCDF datasets containing UGRID mesh topology variables.

The restrictions on the type of mesh that may be used with a given cell spatial dimensionality exclude some meshes which can be described by an undirected graph, but are consistent with UGRID encoding within CF-netCDF. UGRID also describes meshes for three-dimensional volume cells that correspond to a volume enclosed by a set of faces, but how the nodes relate to volume boundary vertices is undefined and so volume cells are currently omitted from the CF data model.

Cell connectivity construct

A cell connectivity construct defines explicitly how cells arranged in two or three dimensions in real space but indexed by a single domain (discrete) axis are connected. Connectivity can only be provided when the domain axis construct also has a domain topology construct, and two cells can only be connected if they also have a topological relationship. For instance, the connectivity of two-dimensional face cells could be characterised by whether or not they have shared edges, where the edges are defined by connected nodes of the domain topology construct.

The cell connectivity construct consists of an array recording the connectivity, and properties to describe the data. There must be a property indicating the condition by which the connectivity is derived from the domain topology. The array spans the domain axis construct with the addition of a ragged dimension. For each cell, the first element along the ragged dimension contains the unique identity of the cell, and the following elements contain in arbitrary order the identities of all the other cells to which the cell is connected. Note that the connectivity array for point cells is, by definition, equivalent to the array of the domain topology construct.
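
As a sketch of the array layout described above, the following derives a connectivity array for face cells using "shares an edge" as the connectivity condition. The function name and the three-triangle mesh are hypothetical, and each face is assumed to list its nodes in order around its boundary:

```python
def face_connectivity_by_shared_edge(face_node_connectivity):
    """Return ragged rows [face, connected_face, ...] for edge-sharing faces."""
    # Map each edge (an unordered pair of nodes) to the faces that use it
    edge_to_faces = {}
    for f, face in enumerate(face_node_connectivity):
        nodes = list(face)
        for a, b in zip(nodes, nodes[1:] + nodes[:1]):
            edge_to_faces.setdefault(frozenset((a, b)), []).append(f)

    # Two faces are connected when at least one edge maps to both of them
    connected = [set() for _ in face_node_connectivity]
    for sharing in edge_to_faces.values():
        for f in sharing:
            connected[f].update(g for g in sharing if g != f)

    # First element: the cell's own identity; the rest: connected cells.
    # Their order is arbitrary; sorting only makes the output reproducible.
    return [[f] + sorted(connected[f]) for f in range(len(connected))]


# Hypothetical mesh: a strip of three triangles
faces = [[0, 1, 2], [0, 2, 3], [3, 2, 4]]
connectivity = face_connectivity_by_shared_edge(faces)
```

Rows have different lengths because cells have different numbers of connected cells, which is why the extra dimension is ragged.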

When cell connectivity constructs are present they are considered to be definitive and must be used in preference to the connectivities implied by inspection of any other constructs, apart from the domain topology construct, which are not guaranteed to be the same.

In CF-netCDF a cell connectivity construct can only be provided by a UGRID mesh topology variable. The construct array is supplied either indirectly by any of the UGRID variables that are used to define a domain topology construct, or directly by the UGRID "face_face_connectivity" variable (for face cells). In the direct case, the integer indices contained in the UGRID variable may be used as the cell identities, although the CF data model attaches no significance to the values other than the fact that some values are the same as others.
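
For the direct case, a rough sketch of turning a padded "face_face_connectivity"-style array into ragged rows that start with each cell's own identity. The fill value of -1, the zero-based indexing and the function name are assumptions made for this illustration; a real file declares its own fill value and start index.

```python
FILL = -1  # assumed fill value marking "no neighbour on this side"


def ragged_from_face_face(face_face_connectivity, fill=FILL):
    """Convert padded neighbour rows into rows [face, neighbour, ...].

    Assumption: row f lists the faces adjacent to face f, padded with
    `fill`, and faces are numbered from zero in row order.
    """
    return [
        [f] + [n for n in row if n != fill]
        for f, row in enumerate(face_face_connectivity)
    ]


# Hypothetical strip of three triangles: face 1 touches faces 0 and 2
padded = [
    [1, FILL, FILL],
    [0, 2, FILL],
    [1, FILL, FILL],
]
rows = ragged_from_face_face(padded)
```

The row index is used as the cell identity here only for convenience; any consistent labelling would carry the same information.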

Restricting the types of connectivity to those implied by the geospatial topology of the cells precludes connectivity derived from any other sources, but is consistent with UGRID encoding within CF-netCDF.

davidhassell commented 1 year ago

Hello

We are considering the 3 week cooling-off period for merging this change as starting from now, but of course welcome feedback of any sort in the usual manner, and if that exposes anything we've missed and the clock is reset, so much the better!

Just a reminder that these changes will be merged in a couple of weeks unless any non-editorial questions are raised, so please do have a look at the PR #459 if you are interested.

Thanks, David

JonathanGregory commented 1 year ago

Dear David

Thanks very much. I agree with the contents. I have two small editorial suggestions:

I stumbled on "however whatever". I suggest deleting "however" and starting a new sentence with "Whatever".
In the sentence

When cell connectivity constructs are present they are considered to be definitive and must be used in preference to the connectivities implied by inspection of any other constructs, apart from the domain topology construct, which are not guaranteed to be the same.

it's unclear what "which" refers to. I suggest

When cell connectivity constructs are present they are considered to define the connectivity of the cells. Exactly the same connectivity information could be derived from the domain topology construct. Connectivity information inferred from inspection of any other constructs is not guaranteed to be the same.

Cheers

Jonathan

taylor13 commented 1 year ago

When you edit, you might insert a comma after the opening clause in the first sentence of the above quoted text, i.e., "When cell connectivity constructs are present, they ..."

davidhassell commented 1 year ago

Thanks @JonathanGregory and @taylor13, I have included your suggestions in PR #459.

davidhassell commented 1 year ago

Hello - just a reminder that PR #459 is due to be merged in two days' time, along with the PR that updates the rules: https://github.com/cf-convention/cf-convention.github.io/pull/210/files

sadielbartholomew commented 1 year ago

I'm clearly quite late to the party here, but I have been carefully reading through the Issue here and the connected PR, and I wanted to register my support for this. I've also reviewed the PR #459 from a high-level perspective and it seems very sensible to me.

I particularly think it is good that we are not hard-coding aspects of the UGRID conventions into the CF Conventions, but instead referencing them, to loosen the coupling and make it difficult for the two standards to get out of sync; and that the data models for both have been considered at the forefront for establishing and maintaining a formal association.

I raised a few questions that occurred to me as comments on the PR (https://github.com/cf-convention/cf-conventions/pull/459#pullrequestreview-1671888194), but on further reading here they might already have been answered in the above thread (there are quite a lot of comments to work through!).

davidhassell commented 1 year ago

Many thanks for your review, @sadielbartholomew - there's a lot of stuff here to wade through!

In response to your good point about the Conventions attribute, I have added a line to section 2.6.1:

"The UGRID conventions, which are fully incorporated into the CF conventions, do not need to be included in the Conventions attribute."

This was always accepted as the case, as you also point out, so I don't think we need to reset the clock for this addition.
