cf-convention / cf-conventions

AsciiDoc Source
http://cfconventions.org/cf-conventions/cf-conventions
Creative Commons Zero v1.0 Universal
85 stars 43 forks source link

Allow CRS WKT to represent the CRS without requiring reader to compare with grid mapping parameters #222

Closed snowman2 closed 4 years ago

snowman2 commented 4 years ago

Title: Allow CRS WKT to represent the CRS without requiring reader to compare with grid mapping parameters Moderator: ???
Moderator Status Review [last updated: YY/MM/DD]: ??? Requirement Summary:

I propose the requirement be changed like so:

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. If both crs_wkt and grid mapping attributes exist, the attributes must be the same and grid mapping parameters should always be completed as fully as possible. As such, information from either one (or both) may be read in by the user without needing to check both. However, in those situations where the two values of a given property are different, the CRS information cannot be interpreted accurately and users should inform the provider so the issue can be addressed. , then the value specified by the single-property attribute shall take precedence. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…​] element), the value of this attribute cannot be interpreted accurately. then the former, being the more specific attribute, takes precedence. Naturally if the two values are equal then no ambiguity arises.

Benefits:

  1. The CRS could originate from several different formats such as WKT, PROJ, or SRS Authority Code. If there are errors in the conversion process to the CF or WKT representation, only the provider would have the original CRS representation. As such, if there are conflicts, the provider would be the best source to go to in order to resolve the conflicts.
  2. Making this change will simplify the lives of software developers so they can just read in the WKT or grid mapping CF parameters for the CRS without a need to compare the two.

Status Quo: http://cfconventions.org/cf-conventions/cf-conventions.html#use-of-the-crs-well-known-text-format mentions

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. However, in those situations where two values of a given property are different, then the value specified by the single-property attribute shall take precedence. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…​] element) then the former, being the more specific attribute, takes precedence. Naturally if the two values are equal then no ambiguity arises.
snowman2 commented 4 years ago

Does proj4 include rotated pole and vertical perspective, for instance?

Just to have it noted that it is supported in PROJ (the conversion classes will be released in pyproj 2.5.0 soon, but the PROJ string version works on all versions):

>>> from pyproj.crs.coordinate_operation import RotatedLatitudeLongitudeConversion, VerticalPerspectiveConversion
>>> RotatedLatitudeLongitudeConversion(o_lat_p=1, o_lon_p=2).to_proj4()
'+proj=ob_tran +o_proj=longlat +o_lat_p=1 +o_lon_p=2 +lon_0=0'
>>> VerticalPerspectiveConversion(viewpoint_height=50000).to_proj4()
'+proj=nsper +lat_0=0 +lon_0=0 +h=50000 +x_0=0 +y_0=0'

From tests:

    crs = CRS.from_cf(
        dict(
            grid_mapping_name="rotated_latitude_longitude",
            grid_north_pole_latitude=32.5,
            grid_north_pole_longitude=170.0,
        )
    )
    expected_cf = {
        "semi_major_axis": 6378137.0,
        "semi_minor_axis": crs.ellipsoid.semi_minor_metre,
        "inverse_flattening": crs.ellipsoid.inverse_flattening,
        "reference_ellipsoid_name": "WGS 84",
        "longitude_of_prime_meridian": 0.0,
        "prime_meridian_name": "Greenwich",
        "horizontal_datum_name": "World Geodetic System 1984",
        "grid_mapping_name": "rotated_latitude_longitude",
        "grid_north_pole_latitude": 32.5,
        "grid_north_pole_longitude": 170.0,
        "north_pole_grid_longitude": 0.0,
    }
    cf_dict = crs.to_cf()
    assert cf_dict.pop("crs_wkt").startswith("PROJCRS[")
    assert cf_dict == expected_cf
    crs = ProjectedCRS(conversion=VerticalPerspectiveConversion(50, 0, 1, 0, 2, 3))
    expected_cf = {
        "semi_major_axis": 6378137.0,
        "semi_minor_axis": crs.ellipsoid.semi_minor_metre,
        "inverse_flattening": crs.ellipsoid.inverse_flattening,
        "reference_ellipsoid_name": "WGS 84",
        "longitude_of_prime_meridian": 0.0,
        "prime_meridian_name": "Greenwich",
        "horizontal_datum_name": "World Geodetic System 1984",
        "grid_mapping_name": "vertical_perspective",
        "perspective_point_height": 50.0,
        "latitude_of_projection_origin": 0.0,
        "longitude_of_projection_origin": 1.0,
        "false_easting": 2.0,
        "false_northing": 3.0,
    }
    cf_dict = crs.to_cf()
    assert cf_dict.pop("crs_wkt").startswith("PROJCRS[")
    assert cf_dict == expected_cf

If it's not our own resource, we can't necessarily add to it

It is an open-source tool that accepts contributions. Are you not allowed to contribute to other open-source projects where you work?

taylor13 commented 4 years ago

I haven't been following this, but what about tripolar grids, which are often used in ocean models?

zklaus commented 4 years ago

How are they represented in CF right now? As far as I know only by 2d coordinates (which doesn't codify the iso-coordinate lines).

JonathanGregory commented 4 years ago

Dear @snowman2 That's good about proj supporting those two projections. Thanks. I used proj as an example but I was really thinking more about the general point that Jim raised, about relying on external resources. We've often discussed this on the former CF email list. I understand from what you say that proj is software, rather than a conventions document, so that's a bit different. You're right of course that open-source software can be modified by anyone for their own use. However, if someone proposes a new grid_mapping in CF, they just want the document to be modified, and they would not be offering to modify proj software to support this new mapping. Best wishes Jonathan

JimBiardCics commented 4 years ago

@JonathanGregory Am I right in understanding that back in the day the whole grid_mapping scheme in CF was added to annotate the relationship between lat and lon variables (which were always required) and X and Y variables? If the purpose was annotation and that annotation was itself optional, there was no actual need for the grid_mapping attributes provide sufficient information to do the transformation.

JonathanGregory commented 4 years ago

@JimBiardCics The grid_mapping attributes were introduced to allow software to do the transformation if it wanted to, for instance to transform other locations (apart from the coordinates and boundaries) or do other computations (such as working out local directions or cell areas). They are optional because generic software does not need to be able to do the transformation if the 2D lat and lon coordinates are required to be present.

pp-mo commented 4 years ago

@taylor13 I haven't been following this, but what about tripolar grids, which are often used in ocean models? @zklaus How are they represented in CF right now? As far as I know only by 2d coordinates (which doesn't codify the iso-coordinate lines).

From our specific experience : Most ocean models use the so-called ORCA grids. These are not "just" a tripolar coordinate grid but use irregular sampling in various places.

Most importantly : a grid is not a projection

So, a grid like ORCA has 2D latitude and longitude values for cell corners (and centres). So that might appear to be a coordinate-like space described by the grid indices. But this is misleading, because a projection would be a smooth, reversible, total function : Instead ...

  1. you cannot define a point on the globe for an "interpolated" point in index-space like "ix=100.5, iy=101.3". The relationship is entirely defined by the (2d) X and Y values, there is no smooth function from (ix, iy) to (lat, lon).
  2. there is no reverse mapping : you can't ask "where in the grid" a given point on the globe is. Only which cells it may fall within...
  3. in fact not all parts of the globe are covered at all. The bottom of the space can be distorted to give extra detail in the Southern ocean, and does not cover the South pole. Also, some model cells actually overlap along the top seam, referred to as a 'north fold'. Thus...
  4. ... returning to the 'reversal' question, a given global location can be in several gridcells, or none.

In practice ORCA grids are a particularly extreme case, in that the standard grids are defined entirely by the provided 2D arrays of latitudes + longitudes, and there is no provided equation defining those numbers, as far as I can determine.

davidhassell commented 4 years ago

Hello,

The timings and order of the breakout groups for the CF meeting next week has now been set (see http://cfconventions.org/Meetings/2020-Workshop.html), and the discussion of this issue will be on Thursday 11 June from 16:00-17:30 UTC, in parallel with three other topics.

Thanks.

JonathanGregory commented 4 years ago

We're going to discuss this issue today. I've just written the following in the google doc about it, because I can't say all this in the discussion, and I would like to explain where I'm coming from when we talk.

The current position is that the grid_mapping takes precedence over WKT if both are present. To know whether this the case, the data-user would have to interpret both and compare them, and take the grid_mapping as correct if the two are inconsistent. If I understand correctly, the idea of the proposal is to remove the requirement of precedence, so that the data-user can read the WKT alone and not consider the grid_mapping. In fact, I think the data-user can behave in just that way as things stand. The data-user is not obliged to check that the dataset is self-consistent. They are entitled to assume that it is, because the onus is on the data-producer to ensure this. I believe that's what we had in mind when we decided the current convention, which was a sort of compromise. The statement of precedence is intended to resolve problems if they are detected. Hence I don't think we need to change the convention to achieve the aim stated (Allow CRS WKT to represent the CRS without requiring comparison with grid mapping Parameters), if I've understood it correctly. It's already OK.

The discussion in this issue has gone further than that, however. A lot of it is about whether the grid_mapping should be able to represent the information in the WKT. I think this is a much more difficult question. As I've commented in the issue, I have serious concerns about this. I realise that other people have good reasons for suggesting it, and I hope we can find a consensus, as usual - honestly, I do! I have two kinds of concern, which are related.

Because of these concerns, I continue to think that we should require data-procedures to describe everything they want to in grid_mapping, unless we can write down explicitly the mappings between CF metadata and WKT. We don't have to be able to carry out the conversion in software, but we should write down how to do it. When that has been done, we would have much greater clarity. We could be confident in stating that certain part of WKT were equivalent to CF metadata. It's possible that there are vocabularies in WKT which could be adopted by CF, and thus not maintained by CF. That idea is often suggested (e.g. by Jim in this issue) and it makes sense provided the vocabulary is one which can exactly “slot” into CF metadata. That is, it must suit our data model.

It seems to me that this approach of mapping is what is being followed with the discussions about standard names and ontologies. It is not being suggested that standard names should be replaced by vocabularies from elsewhere, for the same kind of reason that there isn't a one-to-one correspondence. However, the mappings can be established to help with interoperability.

But having written down the mapping between CF metadata and WKT, we would be able to write software that can do the conversion completely (@snowman2 already has such software, and probably others do). That could be used in the cf-checker to check that the grid_mapping and WKT are consistent, which is the problem we started with in this issue.

snowman2 commented 4 years ago

The statement of precedence is intended to resolve problems if they are detected. Hence I don't think we need to change the convention to achieve the aim stated (Allow CRS WKT to represent the CRS without requiring comparison with grid mapping Parameters), if I've understood it correctly. It's already OK.

@JonathanGregory - does this mean you approve of the proposed wording change?

If everyone approves of the wording change, then we could re-purpose the meeting to another CRS WKT topic, such as some issues you have mentioned in your post.

JonathanGregory commented 4 years ago

If I've understood your concern correctly, I don't see the problem with the existing wording. It's intended to give guidance for what to do if a data-user notices an inconsistency. Is the problem that it can be understood to imply that the data-reader must read both versions and compare them? If so, maybe we could address that concern by keeping the present wording, but adding some more words to say that the data-user can assume they are consistent, and therefore needs to read only one of them. Jonathan

snowman2 commented 4 years ago

Is the problem that it can be understood to imply that the data-reader must read both versions and compare them?

Yes, there are several data readers that have understood it this way. So, clarification on this would be a welcome addition.

taylor13 commented 4 years ago

Yes, to me that seems like a good solution. If a user wants to assume that when there are conflicts, the WKT values take precedence, then that's o.k. But, if the user wants to check the values for consistency, then when they conflict, the user should assume the CF value is correct. This gives the data writer a little extra incentive to try to get things right when converting the WKT metadata to CF.

snowman2 commented 4 years ago

Sounds like we should target the discussion towards the wording of the clarification to ensure all parties are on-board.

snowman2 commented 4 years ago

But, if the user wants to check the values for consistency, then when they conflict, the user should assume the CF value is correct. This gives the data writer a little extra incentive to try to get things right when converting the WKT metadata to CF.

I think that if the values are not consistent, it does not make sense to default to CF or WKT as either one or both could be incorrect depending on your starting format of the CRS and how the conversion was done.

In the proposal, it adds this clarification:

If conflicts exist between the representations, you should inform the provider so they can be addressed.

Reasoning for this wording mentioned in the proposal:

The CRS could originate from several different formats such as WKT, PROJ, or SRS Authority Code. If there are errors in the conversion process to the CF or WKT representation, only the provider would have the original CRS representation. As such, if there are conflicts, the provider would be the best source to go to in order to resolve the conflicts.

taylor13 commented 4 years ago

I agree that informing the data provider of conflicts when found is good practice. But what behavior should software have? Should it just stop, or can we tell it which of the two conflicting pieces of information it should rely on (until the conflict has been resolved)?

My general ignorance about WKT prevents me from understanding what is meant by "then the value specified by the single-property attribute shall take precedence." Is the single-property attribute sometimes the WKT property and sometimes the CF attribute, or is it invariably the CF attribute?

JonathanGregory commented 4 years ago

I agree that the data-producer should be the best authority on what was intended. However, knowing that doesn't give the data-user an immediate solution to an inconsistency. I think the current wording (the precedence of CF metadata over WKT) makes sense, since this is a CF dataset. As Karl says, that default also gives an incentive to the data-producer to ensure consistency. However, I think it's also fine to recommend contacting the data-producer.

snowman2 commented 4 years ago

The final wording from the breakout meeting:

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. If both crs_wkt and grid mapping attributes exist, the attributes must be the same and grid mapping parameters should always be completed as fully as possible. As such, information from either one (or both) may be read in by the user without needing to check both. However, in those situations where the two values of a given property are different, the CRS information cannot be interpreted accurately and users should inform the provider so the issue can be addressed. , then the value specified by the single-property attribute shall take precedence. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…​] element), the value of this attribute cannot be interpreted accurately. then the former, being the more specific attribute, takes precedence. Naturally if the two values are equal then no ambiguity arises.

taylor13 commented 4 years ago

Which parts of the sentence:

"If both crs_wkt and grid mapping attributes exist, the attributes must be the same and grid mapping parameters should always be completed as fully as possible."

should trigger an error (warning?) in a compliance checker? Is the file compliant if the two are not the same? Is a file compliant if the grid mapping parameters are incomplete (when it is possible for them to be complete)?

snowman2 commented 4 years ago

should trigger an error (warning?) in a compliance checker?

I would say an error.

Is the file compliant if the two are not the same?

I would say no based on this part: "in those situations where the two values of a given property are different, the CRS information cannot be interpreted accurately"

Is a file compliant if the grid mapping parameters are incomplete (when it is possible for them to be complete)?

I would say no based on this part: "the attributes must be the same and grid mapping parameters should always be completed as fully as possible"

cameronsmith1 commented 4 years ago

How about an entirely different solution: When there are multiple grid descriptions in a file, the creator must add a metadata flag that indicates which of the grid descriptions is 'primary'.

The onus to make grid descriptions as equivalent as possible can still be on the creator, but the user will know which one to trust if there is a discrepancy. And the CF checkers will only need to check that there is one, and only one, 'primary' flag.

snowman2 commented 4 years ago

That would require all applications to be able to support the WKT format. Currently that is not possible due to software limitations (see comments in this thread).

JonathanGregory commented 4 years ago

Dear all

Before the meeting yesterday I was arguing, like Karl @taylor13, to retain the presumption that if the grid_mapping and crs_wkt are inconsistent, the grid_mapping is correct. I was persuaded by the discussion that this isn't generally helpful, because the data-user might well think it was unsafe to proceed on that basis. In particular, the data-user might be aware that the WKT information had come first, and therefore suspect the grid_mapping of being an incorrect translation. Karl and I had argued also that the presumption of grid_mapping being correct gives an extra incentive to the data-producer to make sure the two are consistent. However, this also probably doesn't work; if the data-producer had thought about the consequence of misinterpretation, they would have tried to avoid inconsistency anyway.

Therefore I support the change to remove this assumption, and state that the metadata is invalid if grid_mapping and crs_wkt are inconsistent. In response to Karl, I agree with Alan @snowman2 that this is an error, and the file is not compliant, because the convention states the two kinds of metadata must be consistent.

Unfortunately, the CF checker won't be able to detect this error unless we write down the mapping between grid_mapping attributes and crs_wkt in the conformance document (or some document it can refer to). To make the check, you have to be able to interpret both kinds and compare them. One of Alan's concerns was that data-users felt they had to do that. We agreed that they don't. They can read one or the other, assuming they agree. However, it would be good if this could be checked routinely. As I argued before, I strongly feel that it would improve the convention if we could write down the equivalence. Note that we don't have to consider all aspects of WKT, but only those aspects which people want to write in CF-netCDF files. Doing this would cause us to identify what has to be added to grid_mapping to give it the required capabilities, and whether there are inconsistencies between the CF data model and WKT. I wouldn't be surprised if there are, and we need to know, because it's not safe to treat crs_wkt as a black box if it might conflict with other CF metadata (not just in the grid_mapping, but for instance in the units and standard name of coordinate variables).

I would be concerned about adopting Philip @cameronsmith1's suggestion, because I fear that might lead to data-producers not being so careful with one or the other of the representations, thinking that they could set the flag to indicate it's not to be trusted.

Jonathan

JonathanGregory commented 4 years ago

This issue doesn't have a moderator - I think that's why it's not progressed. I will moderate it. @snowman2's current proposal (https://github.com/cf-convention/cf-conventions/pull/282) is to replace

However, in those situations where two values of a given property are different, then the value specified by the single-property attribute shall take precedence. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…​] element) then the former, being the more specific attribute, takes precedence.

with

If both crs_wkt and grid mapping attributes exist, the attributes must be the same and grid mapping parameters should always be completed as fully as possible. As such, information from either one (or both) may be read in by the user without needing to check both. However, in those situations where the two values of a given property are different, the CRS information cannot be interpreted accurately and users should inform the provider so the issue can be addressed. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…​] element), the value of this attribute cannot be interpreted accurately.

I think this is OK, except for the last sentence, which I think should be

For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis disagrees with the crs_wkt attribute (via the WKT SPHEROID[…​] element), the value of this attribute cannot be interpreted accurately.

That leads naturally to the unaltered final sentence, "Naturally if the two values are equal then no ambiguity arises."

Philip @cameronsmith1 and Karl @taylor13, are you content with this? Alan @snowman2, is my amendment OK with you?

Jonathan

taylor13 commented 4 years ago

Yes, I think the intent of this is fine.

I don't recall the text that precedes the revised text, but should the first sentence read:

"If, for a given property, both crs_wkt and grid mapping attributes exist, the attributes must be the same and grid mapping parameters should always be completed as fully as possible"

Also, is the second clause dependent on the first clause, or in general is it true that "grid mapping parameters should always be completed as fully as possible." If it is generally true, I think the second clause should be its own sentence (and maybe it should appear elsewhere?).

snowman2 commented 4 years ago

Minor tweak:

For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis disagrees with the crs_wkt attribute (via the WKT SPHEROID[…​] element), the value of this attribute cannot be interpreted accurately.

taylor13 commented 4 years ago

@snowman2 My poor brain can't seem to detect what exactly was tweaked. Could you please point to the specific change made?

taylor13 commented 4 years ago

Oh, I just saw the crossed out "is". Guess I mistook it for dust on my monitor. Sorry.

cameronsmith1 commented 4 years ago

I am OK with this (as amended).

JonathanGregory commented 4 years ago

Thanks for correcting my typo, @snowman2. I agree with Karl @taylor13's second point. It is a general statement. However, I think we're introducing unnecessary repetition. I appreciate that the modified text is in the pull request, but our guidelines are that we should discuss it as far as possible in the issue, so there's only one place to look to see the discussion. So I'm repeating the whole of paragraph and the previous one for context. I propose minor deletions for conciseness and to reduce repetition. How's this:

The crs_wkt attribute is intended to act as a supplement to other single-property CF grid mapping attributes (as described in Appendix F); it is not intended to replace those attributes. If data producers omit the single-property grid mapping attributes in favour of the compound crs_wkt attribute, software which cannot interpret crs_wkt will be unable to use the grid_mapping information. Therefore the CRS should be described as thoroughly as possible with the single-property attributes as well as by crs_wkt.

In cases where CRS property values can be represented by both a single-property grid mapping attribute and the crs_wkt attribute, the grid mapping should be provided, and if both are provided, the onus is on data producers to ensure that their property values are consistent. Therefore information from either one (or both) may be read in by the user without needing to check both. However, if the two values of a given property are different, the CRS information cannot be interpreted accurately and users should inform the provider so the issue can be addressed. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis disagrees with the crs_wkt attribute (via the WKT SPHEROID[…​] element), the value of this attribute cannot be interpreted accurately. Naturally if the two values are equal then no ambiguity arises.

Jonathan

snowman2 commented 4 years ago

Sounds like a reasonable change to me.

Minor tweaks:

compound crs_wkt attribute

I am thinking the "compound" word could be removed.

the ellipsoid is defined by

I am thinking the "is" should be removed.

JonathanGregory commented 4 years ago

Thanks, @snowman2. I agree with both of those changes. Please could you update your pull request so it's the same text as above (with those two changes)?

If Karl @taylor13 and Philip @cameronsmith1 think that's OK still, we can count them as supporters, which means the proposal meets the conditions for acceptance. It will be accepted three weeks from now (3rd August) if there are no further concerns raised before then.

snowman2 commented 4 years ago

Sounds good, just updated the PR (https://github.com/cf-convention/cf-conventions/pull/282/commits/459e514b35f3e9e81d278d016a16d106e622623a). (Note: just edited commit hash).

taylor13 commented 4 years ago

Yes, count me as a supporter. Thanks.

JonathanGregory commented 4 years ago

Thanks, Karl

cameronsmith1 commented 4 years ago

These changes look good to me.

JonathanGregory commented 4 years ago

There have been no further comments in the last three weeks and the required level of support has been reached so the proposal is accepted according to the rules. I have merged the pull request https://github.com/cf-convention/cf-conventions/pull/282/. Thanks, Alan @snowman2

snowman2 commented 4 years ago

Thanks for assisting getting this proposal accepted @JonathanGregory 👍