Reference UGRID conventions in CF

rsignell-usgs commented 5 years ago

As discussed in Trac ticket 171 we would like to associate a specific version of UGRID with each version of CF.

We propose to simply add a section 1.5 to the Conventions Document called "Relationship to the UGRID Conventions" which would say:

UGRID is a convention for unstructured (e.g. triangular) grids that supplements the CF Conventions, including specification of grid topology and location of data on grid elements. Each version of CF is associated with a particular version of UGRID through the Conventions attribute in 2.6.1.

Then in Section 2.6.1, modify the beginning to read:

We recommend that netCDF files that follow these conventions indicate this by setting the NUG defined global attribute Conventions to the string value "CF-1.8" which also implies "UGRID-1.0".
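As a rough illustration of what "implies" could mean for software reading the attribute, here is a minimal Python sketch. The lookup table `IMPLIED_UGRID` and the function `conventions_in_effect` are hypothetical names, not part of CF or UGRID; only the CF-1.8 to UGRID-1.0 pairing comes from the text above. The sketch assumes the usual NUG rule that convention names are separated by spaces, or by commas if any are present.

```python
# Hypothetical lookup table: the UGRID version implied by each CF version.
# Only the CF-1.8 -> UGRID-1.0 pairing is taken from the proposal text.
IMPLIED_UGRID = {"CF-1.8": "UGRID-1.0"}


def conventions_in_effect(conventions):
    """List the names in a Conventions string, plus any implied UGRID version."""
    # Assumption: commas separate names if any comma is present, else blanks.
    if "," in conventions:
        names = [name.strip() for name in conventions.split(",")]
    else:
        names = conventions.split()
    names = [name for name in names if name]

    # An explicitly listed UGRID version takes precedence over an implied one.
    has_explicit_ugrid = any(name.startswith("UGRID-") for name in names)
    for name in list(names):
        implied = IMPLIED_UGRID.get(name)
        if implied is not None and not has_explicit_ugrid:
            names.append(implied)
    return names
```

A reader that already understands "CF-1.8" would then treat the file as if "UGRID-1.0" had been listed as well, without the writer having to state it.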

davidhassell commented 5 years ago

I support this change. The aim (mentioned on the Trac ticket) of "ensuring they [CF and UGRID] remain consistent and complementary" is a very good goal. A couple of questions:

How will UGRID ensure that it is up to date with CF? I presume that this would not be the responsibility of proposers of changes to CF. And vice versa.

What would be the policy of matching versions of CF and UGRID? CF-1.8 and UGRID-1.0 are proposed for now - when CF-1.9 arises, would the latest UGRID version be specified by default, or would it stay at 1.0?

Thanks, David

rsignell-usgs commented 5 years ago

@davidhassell thanks for the comments. I think this would be worked out on a case-by-case basis with joint approval by the CF and UGRID Governance committees.

davidhassell commented 5 years ago

OK - in that case, I think an explicit addition to the governance rules (https://github.com/cf-convention/cf-convention.github.io/blob/master/rules.md) would be useful.

I've been thinking about the similar issue that arises when considering if a change to CF is compatible with the CF data model, for which I will soon be suggesting these additional rules (just think "UGRID" instead of "CF data model"):

* All new proposals will be assessed to see if the new features defined in the proposal map onto the CF data model.

* The assessment will be carried out by a member of the conventions committee or another suitably qualified person. If no-one volunteers, the chairman of the committee will ask someone to do it.

For the CF data model case, there are more rules than this, about what to do depending on the result of the assessment - which I'm happy to share - but you get the drift.

UGRID also has the issue (unlike the CF data model) that it may change independently of CF, so there needs to be some sort of a symmetry in these rules, too.

These rules may seem a bit over the top, but they're not really more draconian than the existing rules for changes to CF, and only a very few people will ever need to worry about them, i.e. the UGRID (or CF data model) experts, and it gives those people a checklist to make sure that we don't wander off piste.

Thanks, David

rsignell-usgs commented 5 years ago

@davidhassell, that sounds reasonable. It's also a reminder to us at UGRID that we need to add rules similar to https://github.com/cf-convention/cf-convention.github.io/blob/master/rules.md. I've raised the issue here.

davidhassell commented 5 years ago

I think that the biggest issue concerning versioning is that every enhancement to CF needs to be checked for UGRID inconsistencies, and every enhancement to UGRID needs to be checked for CF inconsistencies.

How much a proposer of a change needs to know about this is up for discussion.

For the CF data model, I have suggested (cf-convention/cf-convention.github.io@a170505) that the proposer does not need to know about data model issues - by default someone will look at the proposal on their behalf and decide if any action needs to be taken. Action could mean changing the proposal or changing the data model. This approach could also work for how CF changes might affect UGRID, and vice versa, but there might be alternative approaches that are better suited to the UGRID (and CF data model) case - I would welcome any thoughts on this.

For the most part, I imagine that no action would be required - all 28 tickets that contributed to CF-1.7 had no impact on the CF data model.

That said, there are two issues for CF-1.8 that will require changes to the data model - geometries and UGRID. These changes have been worked out and are backwards compatible (more on that another time), so all is well! Such structurally challenging changes are unusual, though.

Thanks, David

davidhassell commented 4 years ago

Hello,

I have been looking at the Finite Element based CF proposal for Unstructured Grid data model (https://publicwiki.deltares.nl/display/NETCDF/Finite+Element+based+CF+proposal+for+Unstructured+Grid+data+model) which was written up some time ago by Bert. This proposes an encoding for the information required for a consistent spatial interpretation of the values.

Given that UGRID is going to be incorporated into CF [1], I was looking at this to see if backward incompatible changes could occur if this new proposal (or something like it) became part of UGRID at a later date.

[1] it seems like this will indeed happen, once the management side is sorted out ...

My conclusion from a quick read of Bert's document was that adding the "Function Space" capability would not affect files encoded using UGRID 1.0 - which would be good. However, the proposal does suggest renaming certain special attributes (e.g. face_node_connectivity becomes element_vertex_connectivity). This could be bad for CF backwards compatibility, but, I presume, is easily avoided with a little thought at this stage.

Does all this sound like a reasonable assessment?

Thanks, David

dham commented 4 years ago

I think that this is essentially correct. The naming issues are also not insurmountable. Finite element has its own conventions, but they are just naming conventions. If UGRID has finite difference naming then this will merely be a bit confusing for finite element users.

In the specific cases above, there are two relevant differences to note:

* Vertex vs node. Node means something quite different in finite element (a node is a basis function in the dual space to the finite element space), which is why finite element codes usually talk about vertices when discussing the mesh topology.

* Face vs element or cell. node, edge, face, volume are names for mesh entities of a given dimension. Finite element is often more concerned with codimension, which counts downwards from the mesh dimension. A cell (or element) is an entity of codimension 0, i.e. an entity of maximal dimension. On a 3D mesh the cells are volumes and on a 2D mesh they are faces (cell is also defined for 2D or 1D meshes). A facet is an entity of codimension 1. Facets form the boundaries between cells. On a 3D mesh, the facets are faces while on a 2D mesh they are edges, and on a 1D mesh they are nodes. Given that a UGRID mesh knows its dimension, it is possible for software to identify the cells or facets so the difference in naming convention is not so significant.
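The translation between the two naming schemes can be sketched in a few lines of Python. This is only an illustration: `entity_for_codimension` and `cell_and_facet` are hypothetical helper names, and `mesh_dimension` stands for whatever the mesh reports as its dimension.

```python
# UGRID-style names for mesh entities, indexed by topological dimension.
ENTITY_BY_DIMENSION = ["node", "edge", "face", "volume"]


def entity_for_codimension(mesh_dimension, codimension):
    """Return the UGRID-style entity name for a given codimension.

    Codimension counts downwards from the mesh dimension, so codimension 0
    is a cell and codimension 1 is a facet.
    """
    if not 1 <= mesh_dimension <= 3:
        raise ValueError("mesh_dimension must be 1, 2 or 3")
    if not 0 <= codimension <= mesh_dimension:
        raise ValueError("codimension must lie between 0 and mesh_dimension")
    return ENTITY_BY_DIMENSION[mesh_dimension - codimension]


def cell_and_facet(mesh_dimension):
    """Return the (cell, facet) entity names for a mesh of this dimension."""
    return (
        entity_for_codimension(mesh_dimension, 0),
        entity_for_codimension(mesh_dimension, 1),
    )
```

For example, `cell_and_facet(2)` gives `("face", "edge")`, which matches the description above: on a 2D mesh the cells are faces and the facets are edges.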

davidhassell commented 3 years ago

Hello,

Some colleagues were asking after the status of this proposal. As far as I'm aware, there are no outstanding objections other than the requirement to spell out some rules for the co-management of the two conventions: CF and UGRID.

The CF data model has now been accepted, and the rules for its management will be in CF-1.9. I think there are some similarities between the rules for evolving the data model and those for evolving UGRID.

I am happy to draft some rules for UGRID, if that helps to get the ball rolling.

Thanks, David

JonathanGregory commented 3 years ago

Yes, please. That would be most helpful. Jonathan

davidhassell commented 3 years ago

Here are my proposed additions to https://github.com/cf-convention/cf-convention.github.io/blob/master/rules.md. I probably haven't quite got it yet, but it's a start.

A key thing to note is the second line: "The assessment will be carried out by a member of the conventions committee or another suitably qualified person. If no-one volunteers, the chairman of the committee will ask someone to do it."

What this means is that every enhancement proposal must be "signed off" for any UGRID issues. Almost always this will be a trivial task (e.g. the introduction of a new grid mapping attribute parameter - no problem!), but it needs to be done.

For simple cases, as in the example just mentioned, someone who is familiar (rather than expert) with UGRID could take care of this, but for more complicated proposals, the opinion of an expert from the UGRID community must be available.

The first and last sentences cover how to increment the version of UGRID that is acceptable to a given version of CF.

Additional rules relating to the UGRID conventions

All new proposals will be assessed to see if the new features defined in the proposal are compatible with the named version of UGRID that is defined for the current version of the CF conventions.

The assessment will be carried out by a member of the conventions committee or another suitably qualified person. If no-one volunteers, the chairman of the committee will ask someone to do it.

If the proposal is deemed to be not compatible with UGRID in some way, then an attempt must be made to modify the proposal so that its new features are compatible with UGRID, and in such a way that the proposal's intent is not compromised.

If the proposal cannot be acceptably modified to conform to the UGRID conventions, then UGRID will need to be modified to accommodate the new features. If UGRID is extended or generalized in some way that allows the new features but does not affect its existing structure and functionality, the proposal is considered backwards compatible. This is the preferred solution.

Any such changes to UGRID must be defined in general terms, and preferably with a detailed description of the UGRID alterations. However, to facilitate the progress of a proposal that requires UGRID changes, it is sufficient for the general nature of the UGRID modifications to be identified, on the understanding that the UGRID conventions will be updated in detail at a later date, possibly after the proposal has been accepted in all other aspects. Final acceptance will always rely on the completion of changes to the UGRID conventions, which is at the discretion of the UGRID community.

The UGRID conventions exist independently from CF and have their own repository and governance. Therefore the acceptance of a new version of UGRID, whether it arises from a change to CF or from an independent change to UGRID itself, must be raised and discussed in its own enhancement proposal in the usual manner. It follows that a change to CF that requires a change to UGRID will be associated with two GitHub issues - one for the change to CF and one for accepting the new version of UGRID.

erget commented 3 years ago

In general, I'm not a huge fan of tight coupling but I don't have objections to the thrust of these changes - they aren't very onerous and won't force us to increment if we have no reason to.

The specifics make me slightly more nervous. My understanding is that, if proposed changes to CF are not compatible with the currently referenced version of UGRID,

then UGRID will need to be modified to accommodate the new features

via a proposal to UGRID, but

to facilitate the progress of a proposal that requires UGRID changes, it is sufficient for the general nature of the UGRID modifications to be identified, on the understanding that the UGRID conventions will be updated in detail at a later date, possibly after the proposal has been accepted in all other aspects. Final acceptance will always rely on the completion of changes to the UGRID conventions, which is at the discretion of the UGRID community.

Emphasis is my own.

If this is the case, does that mean that we would accept the proposal formally from CF's side when we have verified that a proposal to UGRID has been made, and then simply wait to merge the CF proposal until the UGRID proposal has been accepted? The danger is that this might take a long time, and then the CF baseline might have drifted, so that the original CF proposal would need revisiting. In all cases I think that this would have to be checked again to ensure that we don't introduce inconsistencies into the Conventions.

Alternatively we could merge immediately in the hopes that the UGRID proposal would be accepted - a bit too optimistic for my taste though and the risk would be that the two standards diverge.

davidhassell commented 3 years ago

Hi @erget,

Thanks for highlighting these concerns. I agree with them. I guess I was thinking that the UGRID community is more integrated into the CF community than, say, the CRS-WKT community. In the latter example, we don't claim any influence on the "other" community, don't specify a CRS-WKT version, and have carefully instructed the user of a dataset on what to expect.

I checked back to Trac ticket 171, and there is not really an in-depth discussion there, nor is there one here other than on these governance rules. These concerns were discussed very carefully for the CRS-WKT case, and should be given more thought here, as well.

The CRS-WKT case is simpler, as it is more self contained. There are, however, many aspects of CF which, if changed, could affect UGRID. Perhaps the recent proposal for a domain variable could be one, for example.

It would be great to hear from some folks who work on UGRID.

I would like to see the two conventions evolve simultaneously, but also do not want to see the delay of new CF features that are needed by user communities that have no interest in UGRID.

It should be noted that if UGRID was formally moved into CF (e.g. as a new chapter 10), then all of this governance stuff goes away. I don't know if this has been discussed as an option elsewhere, but it should be stated in this issue why that's not desirable (if that is indeed the case).

Thanks, David

davidhassell commented 3 years ago

It would be great to hear from some folks who work on UGRID.

I see that the UGRID conventions GitHub repository has not been updated for 2 years, and the version being recommended for CF is UGRID 1.0, which was released 4 1/2 years ago and is also the latest version. At this time CF was at CF-1.6. This is absolutely not a criticism (CF has gone for longer periods in the past with no readily available signs of advancement, although progress was always going on), but we do need to be sure that UGRID 1.0 is compatible with the draft of CF-1.9. Has anyone looked at this?

davidhassell commented 3 years ago

The original proposal suggests

We recommend that netCDF files that follow these conventions indicate this by setting the NUG defined global attribute Conventions to the string value "CF-1.8" which also implies "UGRID-1.0".

This says to me that the proposal is functionally equivalent to UGRID 1.0 being incorporated into CF in a new chapter that describes the UGRID that CF recognizes.

It follows, I think, that CF checkers will be expected to check the UGRID conventions (unlike CRS-WKT, which does not need to be checked). Therefore there is a need for conformance rules and recommendations that cover UGRID in relation to CF. Does UGRID already have its own conformance rules?

For example, when a data variable has mesh topology, it would need to be stated unambiguously whether or not the coordinates attribute is mandatory, and when it is present whether or not it should always contain the coordinate variables implied by the values of the mesh, location, and location_index_set data variable attributes.

I do very much support UGRID being a part of CF, but think there are some important structural details that need to be worked out first.

At the current time, having thought about all this again, I wonder if the best two options are:

Incorporate UGRID into CF as a new chapter.
If that is not acceptable, simply drop this issue, and if people want to use UGRID they should say so in the Conventions attribute "CF-1.9 UGRID-1.0". This would be the status quo approach, I think. In this case it would be up to the UGRID developers alone to ensure consistency with CF.

Thanks, David

JonathanGregory commented 3 years ago

Dear @davidhassell

The proposal was made because the UGRID developers at that time decided that they wanted UGRID to become part of CF. Regarding UGRID as part of CF, even though not in the same document, means that new developments in CF are obliged to consider UGRID, as well as the reverse. For both conventions it has both costs and benefits. However it was proposed not to incorporate it as a new chapter in the CF document because that would require reworking and rehosting their document, which would take substantial effort. Could we frame the rules as though UGRID was in effect a chapter of CF? We could even insert a chapter in CF that says "Chapter N, UGRID. See UGRID document at URL".

I agree with you that the conformance document needs rules relating to UGRID. That is necessary to making it part of CF. I also agree that it would be very helpful to hear from the UGRID developers and users about what they now think is the best way to proceed.

Best wishes

Jonathan

hrajagers commented 3 years ago

Dear David, Jonathan,

Thank you for moving this forward. In the very beginning we were thinking to move the definition for unstructured grids into the CF conventions, later the discussion moved more in the direction of a separate but affiliated convention, and in the end the discussion seems to have settled on a separately documented but linked convention. Since documentation for the CF conventions has shifted to GitHub as well I don't know whether the effort of porting the documents over to the CF repository is still a major issue, but there is a bit of a difference in documentation style. Since UGRID has received quite some uptake already, it may be advantageous to keep the separate identity whereas being more integrated into CF may increase the uptake in the wider OGC and GIS domains ... although via MDAL support (closely linked to CF) we have already made a good step forward in that direction. By keeping the conventions separate it helps to lower the threshold for implementing either convention. The coin can still flip either way for me, but the concept of modularity introduces a slight preference for keeping the conventions somewhat separate. Many of the core developers of the UGRID conventions are well embedded in the CF conventions, so I think it would be quite acceptable to set out some formal rules as you describe to assure that the two conventions develop in a mutually compatible way.

The UGRID 1.0 convention has indeed remained quite stable, the main reason for this is that we already had quite some discussion and uptake before we released version 1.0 ... and ... conventions for specific topics can remain stable for much longer periods than the CF conventions that cover a wide range of features. I don't know of any recent changes in CF that would make UGRID less compatible with 1.9 than with 1.6. We use it ourselves in combination with 1.8 on a daily basis. The geometries are the only development closely related to UGRID and at some time it used a very similar definition format, but given the considered use cases it made more sense there to drop the concept of shared nodes in favour of simplicity and consistency with other storage formats for GIS feature sets. At Deltares we have a draft extension to allow the 1D discretisations within UGRID to be defined on a curved 1D space ... for this extension we build on those geometries introduced in CF 1.8 to describe the shape of the space (i.e. river) before discretisation. So, we try to adopt new CF features as much as possible.

There is one element in UGRID that is not fully compatible with CF in any version and that's the use of the attribute "cf_role" to identify the "mesh_topology" and various types of connectivity variables such as the "face_node_connectivity". The name of this attribute was chosen when it was anticipated that UGRID was to move into CF, but if it remains a separate convention we may have to reconsider this attribute name ... or CF would have to formally permit this type of use.

The question about a UGRID checker has popped up twice or so in discussions I had regarding UGRID over the last year. We haven't implemented a formal UGRID checker, but it would indeed be useful to have.

Best regards,

Bert

davidhassell commented 3 years ago

Dear Bert,

Thank you for describing all of the history that has occurred here - it really is very helpful, particularly the interactions you have had on the geometries front.

A summary of my position would be that I would support UGRID being moved into CF ("chapter 10"), but if this is not possible then I think that we would still be able to find a way to make things work satisfactorily.

Governance

If UGRID were incorporated into the CF text, there is no governance issue. It's an issue that only arises if it lives outside.

If UGRID lived outside, I genuinely do not expect any problems in the CF/UGRID working relationship, for the same reasons described by @hrajagers, but part of the point of the governance rules is to prevent problems arising, however unlikely. We would need to cover thorny situations, such as those raised by @erget.

One of my own concerns with UGRID not being inside CF is its general invisibility to users. The proposed introductory text is very short and will surely not result in most users seeking out and reading the full UGRID conventions. For instance, what would happen if a user applied mesh and location attributes to a non-UGRID data variable in good faith, having checked in Appendix A that these attributes are not standardised? The UGRID-aware CF checker might tell them their dataset is broken. OK, they could then rename the attributes with this new-found knowledge, but they need to be able to ascertain this before creating datasets, and I don't think that this is easy enough were UGRID to exist elsewhere.

cf_role

This comes down to variable identification, clearly. From a CF perspective, we know that a data variable employs UGRID because it has location_index_set, or both mesh and location, attributes. One of these attributes identifies the relevant mesh container variable, that in turn identifies other required variables (such as edge node connectivity variables). I don't need cf_role to make any of these connections, so the use of cf_role comprises a redundancy.
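The identification logic described here can be sketched without any reference to cf_role. In this minimal Python sketch, variables are plain dictionaries of attributes and the function names `uses_ugrid` and `mesh_variable_name` are hypothetical. It assumes, from my reading of UGRID 1.0, that a location index set variable carries its own mesh attribute.

```python
def uses_ugrid(data_attrs):
    """Return True if a data variable's attributes mark it as UGRID data."""
    if "location_index_set" in data_attrs:
        return True
    return "mesh" in data_attrs and "location" in data_attrs


def mesh_variable_name(data_attrs, variables):
    """Return the name of the mesh container variable, or None.

    `variables` maps variable names to attribute dictionaries. No cf_role
    attribute is consulted at any point.
    """
    if "mesh" in data_attrs and "location" in data_attrs:
        return data_attrs["mesh"]
    if "location_index_set" in data_attrs:
        # Assumption: the location index set variable names its mesh.
        index_set = variables.get(data_attrs["location_index_set"], {})
        return index_set.get("mesh")
    return None
```

From the mesh container variable, the remaining required variables (such as the edge node connectivity variable) would be reached by following its own attributes in the same way.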

So, if we were starting ab initio, I would argue for dropping the cf_role attributes altogether. But we are not! So I can see there is an argument for keeping it for backwards UGRID-compatibility, i.e. extending the use of cf_role to include UGRID (as well as DSGs). Or perhaps it could be dropped without problems?

What do others think?

Conformance

I presume from what you say that there are no conformance rules (rather than there are rules but no existing software)? Is that right? We only need the rules to get UGRID (in any form) into CF. If it were helpful, I would be happy to draft some CF-style conformance rules.

JonathanGregory commented 3 years ago

Dear all

@davidhassell makes a good point about the attributes. If UGRID is to be regarded as part of CF (whether within the document or as linked document with a consistent version) it would make sense for UGRID's attributes to be included in Appendix A, or listed in a separate Appendix (like the grid-mapping ones are) since they aren't general-purpose attributes. As David also said, its requirements should appear as a section of the CF conformance document. Also any important terms which it needs to define could be added to section 1.2, and throughout the CF document any relevant references to UGRID should be inserted. These things would naturally be done if the main UGRID description were included in the CF document, and would help with visibility and consistency to do them in any case. If UGRID isn't completely moved into the CF document as a new chapter, I think it would be worthwhile adding a subsection to describe it. That could be in section 1, like the subsection we have there about COARDS.

I agree with David that if the presence or function of UGRID variables can be identified by the presence of particular attributes, cf_role isn't needed. It's redundant and therefore could cause inconsistency. A possible approach would be to deprecate it, which means the CF checker (when made UGRID-aware) would emit a warning if cf_role was included in these roles. The checker should also give an error if cf_role is present and wrongly used - that would be a consistency rule that would appear in the conformance document.
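A deprecation check of this kind might look like the following Python sketch. It is not taken from any existing CF checker: `check_cf_role` is a hypothetical function, variables are plain dictionaries of attributes, and the set of connectivity roles is illustrative rather than complete.

```python
# Illustrative (not exhaustive) connectivity attribute names on a mesh variable.
CONNECTIVITY_ROLES = {
    "edge_node_connectivity",
    "face_node_connectivity",
    "face_edge_connectivity",
}


def check_cf_role(mesh_attrs, variables):
    """Return (level, message) findings for cf_role on connectivity variables.

    `mesh_attrs` holds the mesh topology variable's attributes and
    `variables` maps variable names to their attribute dictionaries.
    """
    findings = []
    for role in sorted(CONNECTIVITY_ROLES):
        target = mesh_attrs.get(role)
        if target is None:
            continue
        cf_role = variables.get(target, {}).get("cf_role")
        if cf_role is None:
            continue  # nothing to report: the redundant attribute is absent
        if cf_role == role:
            # Consistent but redundant: deprecated, so warn only.
            findings.append(("WARN", f"{target}: cf_role is deprecated here"))
        else:
            # Inconsistent with the attribute that points at this variable.
            findings.append(
                ("ERROR", f"{target}: cf_role={cf_role!r}, expected {role!r}")
            )
    return findings
```

The warning case corresponds to the deprecation, and the error case to the consistency rule that would go into the conformance document.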

Best wishes

Jonathan

hrajagers commented 3 years ago

Dear @davidhassell and @JonathanGregory,

I'm not afraid of UGRID getting too little traction as it has already received quite some uptake outside CF, but always formulated as closely linked to CF. However, it would definitely be good to discuss the different use cases for pure CF and CF extended with UGRID somewhere clearly in the CF document.

Governance

I agree on the need for governance documents when UGRID wouldn't be integrated. Regarding the comment made by @erget, I do understand the need for the statement "Final acceptance will always rely on the completion of changes to the UGRID conventions, which is at the discretion of the UGRID community." since one community can never decide what another community must do, but I also agree that your progress shouldn't be limited by another slower community. That's exactly the same reason -- but looking from the other side -- why some may prefer to keep UGRID as a lean and mean separate convention instead of moving it into the bigger CF community. The CF community can always decide at any time to take all the UGRID ideas and merge them into the CF docs (the reverse is extremely unlikely), but my expectation is that as long as the main contributors for the two conventions don't change significantly we'll be able to work out a shared path forward. We can ask developers from the other convention to evaluate proposed modifications if we expect that such changes may cause incompatibilities. If such incompatibilities are detected only in hindsight due to misjudgement then it's in our shared interest and collaboration to resolve them as soon as possible -- most likely by the community that caused the incompatibility in the first place, but the two communities may decide otherwise in collaboration. If desirable changes in one convention would require changes in the other convention, then we can propose such changes to the other community and discuss them together as part of the shared interests. The second community may accept those adjustments, or propose an alternative. If none of the alternatives is acceptable for both communities ... then it would be time to merge the last set of accepted UGRID conventions into CF. Someone else may be able to capture this approach in a more formal way.

cf_role

Regarding the usage of cf_role I agree that the face_node_connectivity role and similar connectivity roles could be dropped since their purpose and meaning should be clear from the corresponding attribute names pointing to them, and most UGRID implementations will probably not check them anyway. However, I'm hesitant about dropping the role mesh_topology since I expect that this will break most if not all current UGRID readers. Your reasoning that one could follow the mesh attributes on the data variables to identify the mesh(es) in the file works in many cases, but wouldn't work if there is only a mesh in the file and no data variable (except for the node, edge and face coordinates) which may or may not refer back to the mesh container variable depending on your philosophy (UGRID doesn't specify whether that is allowed, required or optional). In regular CF the mesh is also only implied by the auxiliary coordinates listed on the data variable (or the dimensions used), so that pathway would indeed be consistent but we do have use cases with only mesh and node coordinates on the file. If we must drop the use of cf_role for CF compatibility, I would be in favour of introducing a new attribute called mesh_type or grid_type to replace it ... the only allowed value would initially be ugrid, but ...

sgrid

... it could be extended with sgrid for staggered structured meshes (see description of the SGRID conventions). The style of attribute was copied from the geometry_type attribute introduced for the geometry container variable.
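To make the mesh-only case from the cf_role discussion concrete, here is a small Python sketch comparing the two discovery routes. Variables are plain dictionaries of attributes, the function names are hypothetical, and the example variable names (Mesh2 and so on) are only illustrative.

```python
def meshes_by_cf_role(variables):
    """Find mesh topology variables directly via cf_role."""
    return sorted(
        name
        for name, attrs in variables.items()
        if attrs.get("cf_role") == "mesh_topology"
    )


def meshes_by_data_reference(variables):
    """Find meshes only by following mesh attributes on data variables."""
    return sorted(
        {
            attrs["mesh"]
            for attrs in variables.values()
            if "mesh" in attrs and "location" in attrs
        }
    )


# A file holding only a mesh and its node coordinates, with no data variable.
mesh_only_file = {
    "Mesh2": {
        "cf_role": "mesh_topology",
        "node_coordinates": "Mesh2_node_x Mesh2_node_y",
        "face_node_connectivity": "Mesh2_face_nodes",
    },
    "Mesh2_node_x": {"standard_name": "longitude"},
    "Mesh2_node_y": {"standard_name": "latitude"},
    "Mesh2_face_nodes": {},
}
```

With this file, the cf_role route finds Mesh2 while the data-reference route finds nothing, which is the gap a replacement attribute such as mesh_type would need to fill.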

conformance

Reading through the UGRID conventions document, I realize that we have sometimes used the word "should" when we actually intended "must", but overall we tried to be very explicit about required and optional attributes so writing down the conformance requirements in a form consistent with these CF pages should be fairly straightforward.

Best regards,

Bert

davidhassell commented 3 years ago

Dear Bert,

Keeping UGRID separate clearly has some advantages, as you describe. In addition (and I think that you implied this), CF could always stick to an older version of UGRID, if newer features are not to its liking. The "nuclear" option, of merging the last set of accepted UGRID conventions into CF if resolution can not be reached, is a good backstop.

Assuming that UGRID were to remain separate, I think that the governance framework that you describe would work well - thanks.

So with my CF hat on I would prefer incorporation, but I'm fine with UGRID being separate if we can work out how to deal with any structural items that have been mentioned, such as the issue of how to let the rest of CF know about attributes reserved for UGRID.

Another area of possible friction is highlighted by the use of datasets with meshes but no data. Right now, storing a mesh without a data variable is not allowed - there's no encoding for it, and (more pertinently) it is not allowed by the CF data model. However, help is at hand with the proposed introduction of a new "domain" variable (#301), that will allow domains (meshes) to exist on their own. Would UGRID want to change to adopt the domain variable approach, or could we allow a special case for UGRID meshes?

Thanks, David

davidhassell commented 3 years ago

Dear Bert, Jonathan, and all,

I would like to try to summarize the ideas that have been discussed in the form of some broad proposals that I hope could be acceptable to allow us to conclude this issue. I welcome your feedback.

In no particular order:

(A) The governance is written up along the lines of @hrajagers's ideas: https://github.com/cf-convention/cf-conventions/issues/153#issuecomment-703858946

(B) Comprehensive conformance rules are written up for UGRID. These should be maintained alongside UGRID in its repository, and referenced from (not copied into) the CF conformance document.

(C) Update the aforementioned CF Appendix A to include the relevant UGRID attributes, thereby making them visible to all users. Mention in the governance rules that this table needs maintaining.

(D) Based on @hrajagers's previous comment, drop the standardisation of cf_role on the "connectivity" variables, but retain it on the mesh topology variable. This is related to my previous comments about the use of datasets with meshes but no data, which I now withdraw. A mesh topology variable can actually contain multiple domains in the CF-sense, one of which can be "picked out" by a data variable. This makes it sufficiently different, I realise, to the proposed CF domain variable (#301) that we shouldn't try to unify them at this time.

(E) Add some text to CF 5.8 (Domain Variables) (currently being proposed in #301) to explain the UGRID mesh topology variable and how it relates to a domain variable. It may be the case that the occasional note relating to UGRID would be useful in other sections. I don't propose to review for these, but they could always be added as and when it was felt to be useful.

Thanks, David

davidhassell commented 3 years ago

A note on the CF data model:

I think that UGRID need not affect the CF data model at this time.

This is because CF does not currently formalise connections between data variables, on the same or different domains. A mesh topology variable collates multiple domains (one for faces, one for edges, etc.), but a given data variable only refers to one of them (e.g. data:location = "face" ;). How you relate a "face" data variable to an "edge" one is moot when you abstract out the netCDF encoding - you get to the same place if you do it by inspection of the coordinate values, or by inspection of the mesh topology and data variable attributes.
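As a rough illustration of the point above, the sketch below models netCDF variables as plain dictionaries of attributes and shows a data variable selecting one of the domains collated by a mesh topology variable. The variable names (mesh2d, water_level, velocity, face_x and so on) and the helper domain_coordinates are hypothetical; only the attribute names follow UGRID.

```python
# Hypothetical in-memory stand-in for netCDF variables: name -> attributes.
variables = {
    "mesh2d": {
        "cf_role": "mesh_topology",
        "topology_dimension": 2,
        "node_coordinates": "node_x node_y",
        "edge_coordinates": "edge_x edge_y",
        "face_coordinates": "face_x face_y",
        "edge_node_connectivity": "edge_nodes",
        "face_node_connectivity": "face_nodes",
    },
    # Two data variables on the same mesh, each picking out a different domain.
    "water_level": {"mesh": "mesh2d", "location": "face"},
    "velocity": {"mesh": "mesh2d", "location": "edge"},
}


def domain_coordinates(data_name, variables):
    """Return the coordinate variable names of the one domain a data variable selects."""
    data = variables[data_name]
    mesh = variables[data["mesh"]]
    location = data["location"]  # "node", "edge" or "face"
    return mesh[f"{location}_coordinates"].split()


face_coords = domain_coordinates("water_level", variables)
edge_coords = domain_coordinates("velocity", variables)
```

Each data variable resolves to a single set of coordinates, which is why, in this simplified view, nothing beyond the existing notion of a domain is needed to describe it.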

I realise that you could say that the point of UGRID is to make these relations explicit, but if that is to be the case then it should be propagated to other areas of CF (e.g. as SGRID proposes), and so should be considered in the round at a later stage.

Does that sound reasonable?

Thanks, David

rabernat commented 3 years ago

Thanks to everyone who is working through these important issues.

I strongly support incorporating SGRID into this same framework.

JonathanGregory commented 3 years ago

Dear @davidhassell

Thanks for this summary. I agree with (A), (B) and (E) as given.

Re (C), I would suggest that only the UGRID attributes which can appear on data variables should be added to CF Appendix A. If I have read the document correctly, these are mesh, location and location_index_set. The other attributes belong to mesh variables. To prevent accidental collision with CF, it would nonetheless be useful to tabulate them, but I'd suggest we put them in a new CF appendix specifically about the UGRID mesh topology variable, consisting of the table with an introductory sentence or two. This would be like the treatment of the attributes of the grid mapping variable, which are in a table in Appendix F, not in Appendix A. I note that the geometry variable attributes appear in Appendix A, but there are only five of them, whereas there are 18 attributes of the mesh topology variable.

Re (D), if Bert @hrajagers and UGRID colleagues are OK with dropping cf_role for connectivity variables, that's good. They could continue to be allowed but deprecated, rather than disallowed. That's a decision to be made when the conformance rules (B) are written. It seems to me that cf_role is also redundant on the mesh topology variable, because it must also have a topology_dimension attribute, it seems. Couldn't that be used as the defining characteristic of a mesh topology variable? If so, this cf_role could also be deprecated; it can't be disallowed if current software depends on it, as Bert says.

(F) I proposed earlier that we could add a short subsection of CF section 1 to introduce UGRID and its purpose, to make clear its special synergy with CF, to remark on the appearance of attributes in CF appendices, and to say that it has its own conformance document which complements the CF conformance document. What do you think?

I agree with you that the relationship between domains of different data variables is not currently considered in the CF data model, but not inconsistent with the data model. If UGRID is not being included in CF, we don't have to consider it at the moment.

Regarding Ryan @rabernat's comment, I think it would be fine to consider SGRID as well, but let's do it as a separate issue, and perhaps after UGRID, because we may not have enough mental capacity to deal with both at once.

Best wishes

Jonathan

davidhassell commented 3 years ago

Dear @JonathanGregory,

Thanks for these comments

I agree with your updated (C).

I'm also fine in (D) with a mesh topology variable getting its canonical identity from the topology_dimension attribute (which is similar in principle to how a domain variable is to be identified). However, it would be good to hear from @hrajagers if making this cf_role use optional is OK.
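A minimal sketch of what identification by topology_dimension could look like to a reader of a file, assuming variables are represented as dictionaries of attributes. The function name is_mesh_topology is hypothetical, and accepting either marker is just one possible policy for software that must also read existing files.

```python
def is_mesh_topology(attrs):
    """True if the attributes mark a mesh topology variable under either rule."""
    by_role = attrs.get("cf_role") == "mesh_topology"   # current UGRID marker
    by_dimension = "topology_dimension" in attrs        # proposed defining attribute
    return by_role or by_dimension


legacy_mesh = {"cf_role": "mesh_topology", "topology_dimension": 2}
role_free_mesh = {"topology_dimension": 2, "node_coordinates": "node_x node_y"}
data_variable = {"mesh": "mesh2d", "location": "face"}

results = [is_mesh_topology(v) for v in (legacy_mesh, role_free_mesh, data_variable)]
```

Under this policy a file written with cf_role and a file written without it would both be recognised, so making the attribute optional would not by itself break a reader written this way.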

I agree with (F).

If UGRID is not being included in CF, we don't have to consider it at the moment.

I'm not sure what you mean here. Is it "Even though UGRID is being included in CF, we don't have to consider the relationship between domains of different data variables at the moment"? Apologies if I have misunderstood.

Also on the data model, I'd just like to highlight that the integer-valued interconnectivity variables (e.g. "edge_node_connectivity") need not feature in the data model, for the same reasons that a list variable for compression by gathering does not appear in the data model.

I agree that the issue of SGRID, and interconnected domain/data variables in general, is a conversation for another issue; and now that we are starting to formalise the storage of domains as independent entities, this is a good moment to reignite this conversation.

All the best, David

JonathanGregory commented 3 years ago

Dear @davidhassell

Is it "Even though UGRID is being included in CF, we don't have to consider the relationship between domains of different data variables at the moment."

Yes, that is what I meant. By "not being included in CF" I meant "in the CF document". Sorry to be unclear.

Thanks

Jonathan

davidhassell commented 3 years ago

Dear @JonathanGregory and @hrajagers,

Here is a synthesis of the current proposal. How does it sound?

(A) The governance is written up along the lines of @hrajagers ideas: #153 (comment)

(B) Comprehensive conformance rules are written up for UGRID. These should be maintained alongside UGRID in its repository, and referenced from (not copied into) the CF conformance document.

(C) Document the standardised UGRID attributes in the CF conventions, thereby making them visible to all users. Mention in the governance rules that they need maintaining. Only the UGRID attributes which can appear on data variables (mesh, location and location_index_set) should be added to CF Appendix A. All other UGRID attributes (such as those on mesh variables) should go into a new CF appendix specifically about the UGRID mesh topology variable, consisting of the table with an introductory sentence or two. This would be like the treatment of the attributes of the grid mapping variable, which are in a table in Appendix F, not in Appendix A. Note that the geometry variable attributes appear in Appendix A, but there are only five of them, whereas there are 18 attributes of the mesh topology variable.

(D) Deprecate the cf_role attribute for connectivity variables (but retain it on the mesh topology variable). If and only if @hrajagers and UGRID colleagues are in agreement, also deprecate the cf_role attribute on mesh variables, deferring to the presence of the topology_dimension attribute for identification. This latter suggestion will make the mesh variable "more CF-like", but that is not worth it at this time if the expense is a lot of broken existing software.

(E) Add some text to CF 5.8 (Domain Variables) (now accepted for CF-1.9) to explain the UGRID mesh topology variable and how it relates to a domain variable. It may be the case that the occasional note relating to UGRID would be useful in other sections. I don't propose to review for these, but they could always be added as and when it was felt to be useful.

(F) Add a short subsection of CF section 1 to introduce UGRID and its purpose, to make clear its special synergy with CF, to remark on the appearance of attributes in CF appendices, and to say that it has its own conformance document which complements the CF conformance document.

Thanks, David

JonathanGregory commented 3 years ago

Thank you, @davidhassell. I think this is fine.

hrajagers commented 3 years ago

Thank you, @davidhassell . Looking good to me as well. As shown above on the discussion page, I have posted in the UGRID forum (as well as the SGRID forum following this comment) a request for feedback on this proposal. I'll also reach out to a number of people offline.

davidhassell commented 3 years ago

Hello,

Following on from some discussions that have been taking place on the UGRID issue tracker, a couple more items to the proposal are needed (G and H). Here is the new set:

(A) The governance is written up along the lines of @hrajagers ideas: https://github.com/cf-convention/cf-conventions/issues/153#issuecomment-703858946

(B) Comprehensive conformance rules are written up for UGRID. These should be maintained alongside UGRID in its repository, and referenced from (not copied into) the CF conformance document.

(C) Document the standardised UGRID attributes in the CF conventions, thereby making them visible to all users. Mention in the governance rules that they need maintaining. Only the UGRID attributes which can appear on data variables (mesh, location and location_index_set) should be added to CF Appendix A. All other UGRID attributes (such as those on mesh variables) should go into a new CF appendix specifically about the UGRID mesh topology variable, consisting of the table with an introductory sentence or two. This would be like the treatment of the attributes of the grid mapping variable, which are in a table in Appendix F, not in Appendix A. Note that the geometry variable attributes appear in Appendix A, but there are only five of them, whereas there are 18 attributes of the mesh topology variable.

(D) Deprecate the cf_role attribute for connectivity variables (but retain it on the mesh topology variable). If and only if @hrajagers and UGRID colleagues are in agreement, also deprecate the cf_role attribute on mesh variables, deferring to the presence of the topology_dimension attribute for identification. This latter suggestion will make the mesh variable "more CF-like", but that is not worth it at this time if the expense is a lot of broken existing software.

(E) Add some text to CF 5.8 (Domain Variables) (now accepted for CF-1.9) to explain the UGRID mesh topology variable and how it relates to a domain variable. It may be the case that the occasional note relating to UGRID would be useful in other sections. I don't propose to review for these, but they could always be added as and when it was felt to be useful.

(F) Add a short subsection of CF section 1 to introduce UGRID and its purpose, to make clear its special synergy with CF, to remark on the appearance of attributes in CF appendices, and to say that it has its own conformance document which complements the CF conformance document.

(G) Further to the discussion on the order in which the corner nodes of a volume are specified (ugrid-conventions/ugrid-conventions/issues/53), the UGRID conventions drop support for fully 3D meshes. (When someone actually needs it, it can be reintroduced without the current ambiguities, which will be fine for CF.)

(H) Further to the discussion on implications on the CF data model (ugrid-conventions/ugrid-conventions/issues/52), the CF data model needs to be updated to allow the storage of topological connections between cells ("cells" in the CF data model sense). It is not necessary at this stage for the connectivity between cell elements (such as the edge of a face) to be a part of the CF data model.

Thanks, David

JonathanGregory commented 3 years ago

This is fine, thanks. If we agree, I suppose that we need to propose some textual changes in the CF documents for some of these points.

davidhassell commented 3 years ago

Hello,

In starting to think about:

(H) Further to the discussion on implications on the CF data model (ugrid-conventions/ugrid-conventions/issues/52), the CF data model needs to be updated to allow the storage of topological connections between cells ("cells" in the CF data model sense). It is not necessary at this stage for the connectivity between cell elements (such as the edge of a face) to be a part of the CF data model.

For the one case that I thought was not possible in the current CF data model, i.e. a 1D network topology, I now realise the current data model is OK. I was thinking that in this case the CF domain cells were the nodes, but really a cell is an edge bounded by two nodes. In this view, the existing CF framework suffices. Remember that the CF data model is independent of the netCDF encoding, including UGRID. It doesn't matter if the CF data model view of the world looks quite different from the UGRID view, as long as it is possible to unambiguously map between the two.
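To make the "edge as a cell bounded by two nodes" view concrete, here is a small sketch that maps a UGRID-style edge_node_connectivity array onto per-cell bounds of the kind CF already understands. The function name edge_cell_bounds and the sample coordinates are hypothetical; start_index mirrors the UGRID attribute of the same name.

```python
def edge_cell_bounds(node_xy, edge_node_connectivity, start_index=0):
    """Return, for each edge, the coordinates of its two bounding nodes.

    node_xy: list of (x, y) node positions.
    edge_node_connectivity: list of [first_node, second_node] index pairs.
    start_index: 0 or 1, the base of the indices stored in the connectivity.
    """
    bounds = []
    for first, second in edge_node_connectivity:
        bounds.append((node_xy[first - start_index], node_xy[second - start_index]))
    return bounds


# A three-node, two-edge network: node 0 -- node 1 -- node 2.
node_xy = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
edges = [[0, 1], [1, 2]]
bounds = edge_cell_bounds(node_xy, edges)
```

The result has one pair of bounds per edge, i.e. one CF cell per edge, which is the mapping between the two views that the comment above relies on.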

Thanks, David

JonathanGregory commented 3 years ago

I agree that a cell which is an edge bounded by two nodes is fine in the CF data model, yes. (H) is correct that CF doesn't explicitly recognise connectivity, although you could infer it from coincidence - isn't that right? Jonathan

davidhassell commented 3 years ago

Dear Jonathan,

That's right. I should have made that clearer, so many thanks for pointing it out!

CF provides cell connectivity by inspection of coincident (or possibly overlapping) bounds.

UGRID provides an index-based encoding for making the connectivity easier to find in many circumstances, but the CF data model does not need to copy this. Not including these connectivity indices in the CF data model in no way prevents a software application from managing, and utilising, them "alongside" the data model representation.
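A rough sketch of connectivity "by inspection" of bounds, assuming that bounds are stored as tuples of vertex coordinates and that coincident vertices compare exactly equal (real data would probably need a tolerance). The function name connectivity_from_bounds and the min_shared parameter are hypothetical: edges in a network might count as connected when they share one vertex, faces when they share two.

```python
from itertools import combinations


def connectivity_from_bounds(cell_bounds, min_shared=1):
    """Infer cell-to-cell connectivity from coincident bounds vertices.

    cell_bounds: one tuple of vertex coordinates per cell.
    min_shared: number of shared vertices needed for two cells to be connected.
    """
    n = len(cell_bounds)
    connected = [[False] * n for _ in range(n)]
    vertex_sets = [set(bounds) for bounds in cell_bounds]
    # Compare every pair of cells once; the cost grows quadratically with n.
    for i, j in combinations(range(n), 2):
        if len(vertex_sets[i] & vertex_sets[j]) >= min_shared:
            connected[i][j] = connected[j][i] = True
    return connected


# Three 1D edge cells: the first two share the vertex (1.0,), the third is isolated.
cells = [((0.0,), (1.0,)), ((1.0,), (2.5,)), ((5.0,), (6.0,))]
inferred = connectivity_from_bounds(cells)
```

This naive pairwise comparison recovers the same information as an index-based connectivity variable, only more slowly, which is consistent with treating the UGRID indices as a convenient encoding rather than as extra logical content.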

zklaus commented 3 years ago

Note that not having explicit connectivity creates a (potentially) rather large computational overhead for the reconstruction of it. This will be exacerbated with more complicated grids in the future, think time-dependent unstructured grids, perhaps with regionally varying timestep.

I think it would be good to provide a standardized way of storing connectivity, though I agree it also makes sense to make it optional, at least for most commonly used classes of grids today.

davidhassell commented 3 years ago

Dear Klaus,

I hope to allay your concerns by noting that all of the UGRID machinery for storing connectivity will certainly be imported into CF unchanged - it is just that in the logical data model we don't need to make special mention of it. This is because it turns out the connectivity variables are just one (very convenient) way of encoding something in CF that can in principle already be done another way. We call the CF data model a "minimal" logical data model, meaning that we keep it simple by not overlapping the contents of one part of the model with another.

All the best, David

pp-mo commented 3 years ago

Hi, sorry to be late to the table with this issue, but I have been listening in here a while, in hopes of understanding better and maybe contributing.

I think that I may have spotted a potential problem with the removal of 'cf_role' : As stated (above), the role of a mesh-variable is identifiable by its having a 'topology_dimension' attribute, but the same is not true of a location-index-set :
So, if we include an index set in a "mesh only file" (aka "meshfile" or "gridfile" in some quarters), then we will be unable to distinguish it from a data-variable. This in turn implies that, when loading a file, we would need to "know" whether it was intended to be a "mesh file" or a "normal datafile" as you can't determine that by inspection.

( This problem is entirely analogous to an existing problem with auxiliary-coordinates in standard CF : If the data-variable which references them is removed, then they are not distinguishable from data-variables -- so they "become" data-variables )

Our context : Here at the UK MetOffice, we are working to support unstructured data within Iris, using UGRID as the template for our internal data-model (as we already do for CF). Locally, we have a particular interest in the use of location-index-sets, and we are also intending to use "mesh files" i.e. files with only the mesh structure and no data.

Also, our practical experience in tools development and support shows that files which do not fully comply with conventions are, sadly, not that uncommon even in the respected international archives. Thus, just as Iris has to somehow handle files with invalid standard-names and units, or mis-specified grid-mappings, so our trial UGRID files suffer from problems like missing optional connectivity links, the odd misspelling and so on. Such as this. What this tells us is that the "robustness" of the format is also an important consideration.

From the point of view of a generic code library developer, the unambiguous identification of the 'role' of elements within a file will definitely make writing parsing code more straightforward -- and not least because dealing with incorrect input in a helpful way is an important usability factor (just as it is in compiler design). So, I must confess that I personally was preferring the way that UGRID labels each component unambiguously, instead of relying on links from other components to infer the role of a variable.

Solutioneering maybe, but ... could we instead allow the attribute to be named 'ugrid_role', and simultaneously deprecate the older 'cf_role' usage?
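The ambiguity described above can be sketched with a small classifier over attribute dictionaries. The function classify and the sample variables are hypothetical; the attribute names are those used by UGRID. The point is that a location index set stripped of cf_role carries the same attributes as a data variable.

```python
def classify(attrs):
    """Guess the role of a variable from its attributes alone."""
    role = attrs.get("cf_role")
    if role in ("mesh_topology", "location_index_set"):
        return role                      # explicitly labelled
    if "topology_dimension" in attrs:
        return "mesh_topology"           # identifiable without cf_role
    return "data_or_unknown"             # nothing left to tell the roles apart


labelled_index_set = {"cf_role": "location_index_set", "mesh": "mesh2d", "location": "face"}
unlabelled_index_set = {"mesh": "mesh2d", "location": "face"}
data_variable = {"mesh": "mesh2d", "location": "face", "units": "m"}

roles = [classify(v) for v in (labelled_index_set, unlabelled_index_set, data_variable)]
```

In a mesh-only file, where no data variable refers to the index set through location_index_set, the unlabelled case cannot be resolved by inspection, which is the motivation for keeping an explicit role attribute under some name.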

davidhassell commented 3 years ago

Dear Patrick,

Thank you for bringing up "location index set" variables. I agree that in the absence of the cf_role attribute it is not always possible to distinguish one from a data variable, so I would be happy with retaining it on these variables.

By extension, I think that we should drop the suggestion (it is just that at this stage) of removing cf_role from the mesh topology variable. This is because a location index set variable is logically identical to a mesh variable, so having a common mechanism of identification would be nice.

( This problem is entirely analogous to an existing problem with auxiliary-coordinates in standard CF : If the data-variable which references them is removed, then they are not distinguishable from data-variables -- so they "become" data-variables )

I wouldn't call this a problem, rather a feature! When we read a dataset, we tend not to cast variables that have been identified as auxiliary coordinates (or other roles) as data variables as well, but that is only a default behaviour that is what most of us want most of the time.

All the best, David

davidhassell commented 3 years ago

Hello @ChrisBarker-NOAA and all,

After some very illuminating discussion over at https://github.com/ugrid-conventions/ugrid-conventions/issues/52, and some offline discussions, @JonathanGregory and I have come up with some suggested text to the data model in Appendix I that we hope will update the CF data model for the mythical connectivity.

The new text is hopefully self-explanatory and edits the "Domain construct" section of Appendix I (edits in italics) and provides a new section describing a new construct type - the "network topology construct". Here we are using "network topology" in this sense, but there may well be better names.

Remember that the CF data model is a logical data model that is independent of the encoding, so it is not a problem that the connectivity array does not look like a UGRID netCDF connectivity variable.

(I took the liberty of rearranging some of the network topology text after Jonathan last saw this - I take full responsibility for any negative consequences.)

Domain construct

The domain construct (figure 3) describes a domain comprising measurement locations and cell properties. The domain construct is the only metadata construct that may also exist independently of a field construct. The domain construct contains properties to describe the domain (in the same sense as for the field construct) and relates the following metadata constructs

Domain axis constructs.
Dimension coordinate and auxiliary coordinate constructs.
Coordinate reference constructs.
Domain ancillary constructs.
Cell measure constructs.
Network topology constructs.

All of the constructs contained by the domain construct are optional (as indicated by "0.." in figure 3).

In CF-netCDF, domain information is stored either implicitly via data variable attributes (such as coordinates), or explicitly in a domain variable, or a UGRID mesh topology variable. In the latter two cases, the domain exists without reference to a data array.

Network topology construct

A network topology describes the connectivity of domain cells indexed by a subset of the domain axis constructs. When two cells are connected, operations on the data stored on them may be assumed to be continuous across their common boundary. A network topology construct describes logically and explicitly the network topology of cells indexed by a single domain axis construct. A network topology construct contains a connectivity array that spans a unique domain axis construct with the addition of an extra dimension of the same size, such that each dimension indexes the cells. The array is symmetrical, and each element indicates whether the pair of cells to which its indices refer are connected.

A network topology that has no corresponding network topology construct, which includes those for multiple domain axis constructs, is nonetheless implicitly defined by a function of both the physical contiguousness of the cells, and the nature of the real world or simulated processes that produced the data. For example, in a field which contains both land and ocean cells, connections between land and ocean cells might be excluded for some physical processes. The description of such an implicit network topology may require metadata that is external to CF.

In CF-netCDF a network topology can only be provided for a domain defined by a UGRID mesh topology variable. In this case, the connectivity array is supplied by a UGRID connectivity variable (such as a "face_face_connectivity" variable).
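As a hedged illustration of how the two representations relate, the sketch below expands a UGRID-style face_face_connectivity array (one row of neighbour indices per face, padded with a fill value) into the square, symmetric array described in the proposed text. The function name dense_connectivity, the fill value of -1 and the sample mesh are assumptions; start_index mirrors the UGRID attribute.

```python
def dense_connectivity(face_face, fill_value=-1, start_index=0):
    """Expand per-face neighbour lists into a square boolean connectivity array."""
    n = len(face_face)
    dense = [[False] * n for _ in range(n)]
    for i, neighbours in enumerate(face_face):
        for raw in neighbours:
            if raw == fill_value:
                continue                  # padding: this face has fewer neighbours
            j = raw - start_index
            # Set both entries so the array is symmetric by construction.
            dense[i][j] = dense[j][i] = True
    return dense


# Three faces in a row: 0 -- 1 -- 2.
face_face = [[1, -1], [0, 2], [1, -1]]
dense = dense_connectivity(face_face)
```

The dense form is only a logical picture; a sparse index-based encoding such as the UGRID one carries the same information far more compactly for large meshes.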

ChrisBarker-NOAA commented 3 years ago

@davidhassell:

Thanks! this looks good to me.

I am curious why you chose "network topology" rather than "mesh topology" -- I'm not sure it matters much, but at least that Wikipedia article is pretty focused on the topology of communication networks, which is pretty similar, but not quite.

On the other hand, "mesh topology" is used for computer networks (https://www.computerhope.com/jargon/m/mesh.htm)

I have no idea if there's a standard term for what we mean in this case, but modeling folks do use the term "mesh" or "grid" in this context, and I've never heard anyone use the term "network".

Though the GIS folks use "Triangulated Irregular Network" (https://en.wikipedia.org/wiki/Triangulated_irregular_network) -- maybe that's where "network" came from?

The computer graphics folks use: "Triangle mesh" (https://en.wikipedia.org/wiki/Triangle_mesh)

So there's precedent for that, too.

davidhassell commented 3 years ago

Hi @ChrisBarker-NOAA,

Great - real progress!

I'll start work on the other aspects of the data model that need to be updated (like the table of construct types, the figures, and a high-level explanation of how UGRID fits in).

Thanks for some context on the choice of name. Whilst I can't remember where the network topology idea came from (presumably from me and/or Jonathan), I think that we wanted to avoid using the term "mesh topology" to differentiate our new construct from a UGRID mesh topology variable. The new network topology data model construct contains only the connectivity, but the mesh topology variable also contains the actual node coordinates, and other items, and maps to one or more CF domain constructs.

Assuming that that's a valid argument, any sensible name other than "mesh topology construct" is fine by me.

On the other hand if my reasoning turns out to be specious, I'd be happy with "mesh topology construct"!

"grid topology" is tempting, but right from the start of the data model development many years ago we steered clear of the word "grid" due to its connotations with regularity and structuredness, and neither of those are in play here.

The choice of name is most certainly important, but we don't necessarily have to decide straight away - nothing will be held up if we want to change the name at a later stage.

Thanks, David

JonathanGregory commented 3 years ago

Dear @ChrisBarker-NOAA and @davidhassell

Would "domain topology" make sense?

I suggest that the penultimate paragraph should begin, "A domain that has no ?? topology construct, including domains with more than one domain axis construct"

Best wishes

Jonathan

davidhassell commented 3 years ago

Dear @JonathanGregory

I like "domain topology". (I'll carry on using ?? below so as not to normalise yet)

I suggest that the penultimate paragraph should begin, "A domain that has no ?? topology construct, including domains with more than one domain axis construct"

I don't think that's right. A domain has as many ?? topologies as there are domain axis subset combinations. For example, a domain with three domain axes has seven ?? topologies (one for each non-empty subset of the axes). The four that apply to two or more domain axes are always implicit, and each of the three that apply to only one domain axis is either implicit or else is explicitly provided with a ?? topology construct.

A common case will be a domain with two domain axes - one for time and one for UGRID cells - but there can still be a ?? topology for the UGRID axis.

JonathanGregory commented 3 years ago

Dear David

It's a good point that there may be domain axes which aren't involved in the topology. Thanks for the correction. We can supply a ?? topology construct provided it refers to a single domain axis (the UGRID case). I'm not sure there would be more than one topology for a domain, although I agree it's conceivable, but I think that's in the realm of speculation, since we don't currently have a use-case for describing the situation with more than one domain axis involved in the topology.

Best wishes

Jonathan

davidhassell commented 3 years ago

Dear Jonathan,

Thanks. It sounds like we agree that the domain could have multiple topologies (as opposed to topology constructs). It's a good point that we have no use case for two or more topology constructs, each of which applies to a single unique domain axis, and in fact we have no way of encoding it, so that case should indeed be excluded.

I think that it is important to keep the general definition of a domain topology, that could span multiple dimensions, and to note that a domain could have more than one domain topology, even though we only allow one 1-D domain topology construct to be provided. In this way the land-sea example makes sense.

Here's a new attempt. New text in italics, and I'm trying out "domain topology" for size:

Domain topology construct

A domain topology describes the connectivity of domain cells indexed by a subset of the domain axis constructs. When two cells are connected, operations on the data stored on them may be assumed to be continuous across their common boundary. A domain topology construct describes logically and explicitly the domain topology of cells indexed by a single domain axis construct. A domain topology construct contains a connectivity array that spans a single domain axis construct with the addition of an extra dimension of the same size, such that each dimension indexes the cells. The array is symmetrical, and each element indicates whether the pair of cells to which its indices refer are connected. A domain construct may contain at most one domain topology construct.

For any subset of the domain axis constructs, excluding a domain axis construct for which there is a domain topology construct, there is an implicit domain topology that is defined by a function of the physical contiguousness of the cells, and/or the nature of the real world or simulated processes that produced the data. For example, in a field which contains both land and ocean cells, connections between land and ocean cells might be excluded for some physical processes. The description of such an implicit network topology may require metadata that is external to CF.

In CF-netCDF a network topology can only be provided for a domain defined by a UGRID mesh topology variable. In this case, the connectivity array is supplied by a UGRID connectivity variable (such as a "face_face_connectivity" variable).
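The constraints in the draft text can be written down as a small model, purely as a reading aid. The class and method names (DomainTopology, Domain, set_topology) are hypothetical and are not part of CF or of any existing library; the checks simply restate the draft wording: one axis, a square array of that axis's size, symmetry, and at most one construct per domain.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DomainTopology:
    axis: str                       # the single domain axis that is spanned
    connectivity: List[List[bool]]  # square array: axis size by axis size


@dataclass
class Domain:
    axes: Dict[str, int]                       # axis name -> size
    topology: Optional[DomainTopology] = None  # at most one per domain

    def set_topology(self, topology: DomainTopology) -> None:
        if self.topology is not None:
            raise ValueError("a domain may contain at most one domain topology")
        size = self.axes[topology.axis]
        rows = topology.connectivity
        if len(rows) != size or any(len(row) != size for row in rows):
            raise ValueError("connectivity must be square with the axis size")
        if any(rows[i][j] != rows[j][i] for i in range(size) for j in range(size)):
            raise ValueError("connectivity must be symmetric")
        self.topology = topology


# A domain with a time axis and a UGRID cell axis; only the cell axis has a topology.
domain = Domain(axes={"time": 2, "cell": 3})
chain = DomainTopology("cell", [[False, True, False],
                                [True, False, True],
                                [False, True, False]])
domain.set_topology(chain)
```

The time axis here has no construct of its own, so any topology involving it stays implicit, as in the second paragraph of the draft.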

JonathanGregory commented 3 years ago

Dear @davidhassell

Thanks. That looks fine to me. I wonder what @ChrisBarker-NOAA thinks.

Best wishes

Jonathan

ChrisBarker-NOAA commented 3 years ago

Honestly, I'm a bit lost :-( so I'll trust this is clear to someone that has taken the time to study the data model more than I have.

But to the extent I understand it, it looks good to me :-)

But two questions: "In CF-netCDF a network topology can only be provided for a domain defined by a UGRID mesh topology variable"

Didn't you change the terminology so that that would be:

"In CF-netCDF a domain topology can only be provided for a domain defined by a UGRID mesh topology variable"

and: "The array is symmetrical,..."

I'm not sure I follow -- in what sense is it symmetrical? Does that mean that if cell A connects to cell B then cell B connects to cell A? Which is the case, but I don't know that that means the array that holds it is symmetrical. Perhaps I'm getting it confused with a symmetric matrix, which is something else altogether.

pp-mo commented 3 years ago

in what sense is it symmetrical?

Just trying to get my head around this, it seems to me that what is meant is: we provide a connectivity array, which is a square matrix describing the connectivity relation "from cells to cells", i.e. connectivity[i,j] == cell[i] adjoins cell[j], where i and j both range over the same cell dimension. In which case, it is just a symmetric matrix, meaning exactly that 'adjoins' is a symmetric relation; or cell[i] adjoins cell[j] <--> cell[j] adjoins cell[i] (thus, the array == its own transpose, which is a definition of symmetry).

I think there are also a few technical aspects that we might want to consider pinning down a bit further...

Presumably, connectivity values are always boolean (or equivalent). Perhaps we should state that.
We might also state whether cells are considered to connect "to themselves", as appearing in values of "connectivity[i,i]". At the least, I would say we don't want that aspect to be variable, i.e. maybe we state that all "connectivity[i,i]" are always 1/true; or false; or state that they have 'no meaning regardless of value'.
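The two points above, symmetry and the treatment of the diagonal, can be checked mechanically. This is only a sketch with hypothetical helper names (is_symmetric, with_fixed_diagonal); which diagonal convention should apply is exactly the open question, so it is left as a parameter.

```python
def is_symmetric(connectivity):
    """True if connectivity[i][j] == connectivity[j][i] for every pair of cells."""
    n = len(connectivity)
    return all(connectivity[i][j] == connectivity[j][i]
               for i in range(n) for j in range(i + 1, n))


def with_fixed_diagonal(connectivity, self_connected=False):
    """Return a boolean copy whose diagonal follows a single stated convention."""
    return [[self_connected if i == j else bool(value)
             for j, value in enumerate(row)]
            for i, row in enumerate(connectivity)]


# Integer 0/1 input with an inconsistent diagonal.
raw = [[1, 1, 0],
       [1, 0, 1],
       [0, 1, 1]]
normalised = with_fixed_diagonal(raw, self_connected=False)
```

Fixing the diagonal by rule, whichever rule is chosen, means two files describing the same mesh cannot differ only in their connectivity[i,i] values.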

@ChrisBarker-NOAA @JonathanGregory @davidhassell does this accord with how you have been viewing this (from either the CF or UGRID "sides") ?

pp-mo commented 3 years ago

It sounds like we agree that the domain could have multiple topologies (as opposed to topology constructs). It's a good point that we have no use case for two or more topology constructs, each of which applies to a single unique domain axis, and in fact we have no way of encoding it, so that case should indeed be excluded.

As with @ChrisBarker-NOAA , I'm not truly confident that I know what this really means + intends.

However, another thing that has been bothering me in this, is a possible lack of "symmetry" in the relationship between the CF and UGRID views of these definitions.
In UGRID, what we might call the "reflexive" connectivities, i.e. face-face / edge-edge / node-node, are only a part of the possible information -- but with the current statement, it seems that these are the only part of it that the CF datamodel will concern itself with.
So, as this only constrains (or describes) properties of a part of the UGRID information, I'm a bit worried if this could be an omission -- which in turn could require further amendments at some future time ?

If I've understood the above, then it means we are also proposing a limited type of "topology constructs" : that only relates one of (possibly multiple) topologies to itself.
In which case... what is being "excluded" is (a) any more than one topology-construct, defined on a given domain, and also (b) any (hypothetical) topology-construct that would carry a relationship between two different topologies of a domain.

But.. what I may particularly be misunderstanding, in this, is what multiple "topologies of a given domain" would actually represent (i.e. would these be equivalent to the different 'locations' of a mesh, or not ?).
