cf-convention / cf-conventions

Allow CRS WKT to represent the CRS without requiring reader to compare with grid mapping parameters #222

Closed snowman2 closed 4 years ago

snowman2 commented 4 years ago

Title: Allow CRS WKT to represent the CRS without requiring reader to compare with grid mapping parameters
Moderator: ???
Moderator Status Review [last updated: YY/MM/DD]: ???
Requirement Summary:

I propose the requirement be changed like so:

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. If both crs_wkt and grid mapping attributes exist, the attributes must be the same and grid mapping parameters should always be completed as fully as possible. As such, information from either one (or both) may be read in by the user without needing to check both. However, in those situations where the two values of a given property are different, the CRS information cannot be interpreted accurately and users should inform the provider so the issue can be addressed. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…] element), the value of this attribute cannot be interpreted accurately. Naturally if the two values are equal then no ambiguity arises.

Benefits:

  1. The CRS could originate from several different formats such as WKT, PROJ, or SRS Authority Code. If there are errors in the conversion process to the CF or WKT representation, only the provider would have the original CRS representation. As such, if there are conflicts, the provider would be the best source to go to in order to resolve the conflicts.
  2. Making this change will simplify the lives of software developers so they can just read in the WKT or grid mapping CF parameters for the CRS without a need to compare the two.

Status Quo: http://cfconventions.org/cf-conventions/cf-conventions.html#use-of-the-crs-well-known-text-format mentions

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. However, in those situations where two values of a given property are different, then the value specified by the single-property attribute shall take precedence. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…​] element) then the former, being the more specific attribute, takes precedence. Naturally if the two values are equal then no ambiguity arises.
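
The status-quo rule quoted above can be sketched from a reader's point of view. This is only a minimal illustration: the helper names `wkt_semi_major_axis` and `resolve_semi_major_axis` are hypothetical, the regular expression is a stand-in for a real WKT parser, and the example values are made up so that the two sources disagree.

```python
import re


def wkt_semi_major_axis(crs_wkt):
    """Return the semi-major axis from the first SPHEROID[...] or
    ELLIPSOID[...] element of a WKT string, or None if there is none.

    Hypothetical helper: a regex is enough for a sketch, but a real
    reader would use a proper WKT parser.
    """
    match = re.search(
        r'(?:SPHEROID|ELLIPSOID)\[\s*"[^"]*"\s*,\s*([-+0-9.eE]+)', crs_wkt
    )
    return float(match.group(1)) if match else None


def resolve_semi_major_axis(grid_mapping_attrs, crs_wkt=None):
    """Status-quo rule: the single-property attribute takes precedence."""
    cf_value = grid_mapping_attrs.get("semi_major_axis")
    if cf_value is not None:
        return float(cf_value)
    # Fall back to the WKT only when the CF attribute is absent.
    return wkt_semi_major_axis(crs_wkt) if crs_wkt else None


# Made-up example in which the two sources disagree.
attrs = {"grid_mapping_name": "transverse_mercator", "semi_major_axis": 6377563.396}
wkt = 'GEOGCS["OSGB 1936",DATUM["OSGB_1936",SPHEROID["Airy 1830",6377563.0,299.3249646]]]'
print(resolve_semi_major_axis(attrs, wkt))  # 6377563.396
```

Note that even this small case requires the reader to parse the WKT and map its elements onto CF attribute names, which is the burden the proposal wants to remove.
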
JonathanGregory commented 4 years ago

The status quo (giving the CF attributes precedence over WKT) was discussed at great length when the possibility of including WKT strings was added. I have not reviewed that discussion but it would be relevant to do so to avoid repeating it! It's in https://cf-trac.llnl.gov/trac/ticket/69 and https://cf-pcmdi.llnl.gov/trac/ticket/80. I opposed the introduction of WKT strings because I didn't like redundancy, which would probably lead to inconsistency, but I agreed with the resolution that we have, in which the CF attributes take precedence.

Without reviewing the previous discussion, these points occur to me:

In view of these points, I don't think this proposal is the best way to proceed. Instead, if there are elements of the CRS that can't currently be represented in CF but are needed, we should consider adding them, as we have done before (your points 1 and 3). If the equivalence between CF and WKT is unclear or incomplete (related to my first point above) it should be improved (your points 2 and 4).

snowman2 commented 4 years ago

I am a GDAL/PROJ user, so from my biased perspective life would be much easier from the WKT form :). Additionally, since WKT is already a standard from the OGC geospatial community, most geospatial software should be able to support it.

The WKT model of metadata is different from the CF one. ... That would be a big job.

Correct. That is why I propose the CRS WKT take precedence. The CF grid mapping parameters only provide support for a limited subset of projection parameters. (Ref: https://cf-trac.llnl.gov/trac/ticket/69):

3.2. Because the conceptual model for coordinate reference systems is both large and complex it is considered impractical to devise CF attributes for all of the potential CRS properties which might need to be encoded as metadata attributes in netCDF files. Consequently there is a requirement for such CRS properties to be specified in a compact notational format, preferably a format that is already in widespread use, either as a de facto or de jure standard.

So, in this proposal, if the CRS WKT exists and can be read in, the CF projection parameters should be ignored entirely and no checks made between the two. However, the CF projection parameters are there for both backwards compatibility as well as for programs that do not support the WKT form of the projection.

To give WKT precedence, we would have to require all CF-compliant applications to be able to parse WKT. That's a big expectation, which I think is unrealistic in practice.

I should clarify that in this proposal the CRS WKT can remain optional. However, when it does exist and your program can read it in, I propose that it should take precedence. As a side note, with the GDAL Barn changes (https://gdalbarn.com/), reading in a WKT is much more practical with PROJ as a dependency. It also provides support for WKT2. Additionally, GDAL can easily support the WKT form of the projection, which enables all the dependent software to read in the projection.

snowman2 commented 4 years ago

Here is the WKT2 form of the British National Grid from: https://cf-pcmdi.llnl.gov/trac/ticket/80

>>> from pyproj import CRS
>>> cc = CRS("OSGB 1936 / British National Grid")
>>> cc
<Projected CRS: EPSG:27700>
Name: OSGB 1936 / British National Grid
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: UK - Britain and UKCS 49°46'N to 61°01'N, 7°33'W to 3°33'E
- bounds: (-9.2, 49.75, 2.88, 61.14)
Coordinate Operation:
- name: British National Grid
- method: Transverse Mercator
Datum: OSGB 1936
- Ellipsoid: Airy 1830
- Prime Meridian: Greenwich

>>> print(cc.to_wkt(pretty=True))
PROJCRS["OSGB 1936 / British National Grid",
    BASEGEOGCRS["OSGB 1936",
        DATUM["OSGB 1936",
            ELLIPSOID["Airy 1830",6377563.396,299.3249646,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4277]],
    CONVERSION["British National Grid",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",49,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-2,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996012717,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",400000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",-100000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["unknown"],
        AREA["UK - Britain and UKCS 49°46'N to 61°01'N, 7°33'W to 3°33'E"],
        BBOX[49.75,-9.2,61.14,2.88]],
    ID["EPSG",27700]]

The coordinate system and area of use currently don't have an equivalent in the CF conventions. The coordinate system is important to note as the axis order is taken into account in PROJ 6+ and GDAL 3+.
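
To make the axis-order point concrete, here is a rough sketch that pulls the axis directions out of the `CS[...]` part of the WKT2 above, using only the standard library. The function name `axis_order` is hypothetical, and the regular expression assumes each `AXIS[...]` element has the simple `name, direction, ORDER[n]` layout shown in this example; it would miss axes that carry extra elements (such as `MERIDIAN[...]`) before `ORDER`.

```python
import re


def axis_order(crs_wkt):
    """Return the axis directions of a WKT2 string, sorted by ORDER[n].

    Hypothetical helper for illustration only; it assumes the simple
    AXIS["name",direction,ORDER[n],...] layout.
    """
    pattern = re.compile(
        r'AXIS\[\s*"[^"]*"\s*,\s*(\w+)\s*,\s*ORDER\[\s*(\d+)\s*\]'
    )
    axes = [(int(order), direction) for direction, order in pattern.findall(crs_wkt)]
    return [direction for _, direction in sorted(axes)]


# The coordinate-system fragment of the British National Grid WKT2.
wkt_fragment = '''CS[Cartesian,2],
    AXIS["(E)",east,
        ORDER[1],
        LENGTHUNIT["metre",1]],
    AXIS["(N)",north,
        ORDER[2],
        LENGTHUNIT["metre",1]]'''
print(axis_order(wkt_fragment))  # ['east', 'north']
```
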

dblodgett-usgs commented 4 years ago

Dear @snowman2 --
I agree with @JonathanGregory, that if things are missing from CF that are in WKT, they should be added.

Maybe the core of your proposal is actually best made to the GDAL / PROJ project to modify default behavior when working with CF data? When different, a warning could be issued and the WKT used with preference?

Regards - Dave

rmendels commented 4 years ago

@snowman2 @dblodgett-usgs @JonathanGregory Second what Dave says. The argument seems to be to break everything in the CF world so that GDAL will work better with netcdf files. Why not improve GDAL? (Supposedly, based on a talk I heard, the new and I believe yet-to-be-released version does indeed have better support.)

GDAL is a great library and a lot of work has gone into it, but its netcdf support has always been sketchy. When I first was directed to it years ago, it could only do 2-D files, and would flip the data, even when the metadata clearly said the axes went in the other direction (it just ignored the metadata attributes). That problem lasted for a long time (for all I know it still does this). GDAL has had problems with greater than 3-D files, forecast files, DSG files, files that are part of the NCEI examples for sending in data, some issues with time, and some of the newer features in netcdf4 files.

Things that can improve CF are most welcome, things that would potentially break most present CF based software should have to make an awfully strong case for the benefits.

snowman2 commented 4 years ago

Thanks all for the comments! My desire here is to unite the geospatial (OGC) and CF conventions communities, to simplify things when transitioning between them.

Much of the inspiration for this thought came when attempting to match PROJ parameters to the CF conventions as documented here. There are several parameters that do not match up, and in several cases a grid mapping does not exist. This is problematic for users who wish to convert back and forth between the two. However, since PROJ supports reading in the WKT string, the full CRS can be properly represented in that manner and no information is lost. Additionally, the PROJ FAQ strongly discourages the use of PROJ strings to represent the CRS and instead recommends using the WKT string.

Maybe the core of your proposal is actually best made to the GDAL / PROJ project to modify default behavior when working with CF data?

Changing this would indeed be problematic and confusing for users of GDAL, as the behavior would then differ from the CF spec. This is already done in pyproj, and an issue exists because users noted that the behavior differs from the spec here.

Alternative proposal?

Thoughts on stating in the spec that if the CRS cannot be properly represented using the CF grid mapping parameters, that the CRS WKT form is recommended as a fall back (noting of course that this may not be compatible with some software)? Also, it would be good to note for users to make an issue in this repo with their CRS WKT that cannot be represented using the grid mapping parameters so the CF spec can be updated accordingly.

rmendels commented 4 years ago

Would point out that the newest version of Proj4 in its latest incarnation just introduced a bunch of changes in how things are done, and in the CRS. See some of the discussion related to the R packages sp and sf. It is breaking a lot of software.

erget commented 4 years ago

I agree with @dblodgett-usgs and @JonathanGregory - we already have a clear hierarchy that establishes which values have precedence over which ones in the case of conflict. Data producers already have the possibility of omitting CF attributes in favour of using WKT, although this is discouraged. I would see this as an acceptable solution if one wanted to produce data now and the relevant parameters weren't supported by CF. Optimally, one would pursue the adoption of the needed parameters in CF in parallel.

rsignell-usgs commented 4 years ago

This is a bit of a sidebar, but one thing that would make it easier for EPSG and WKT folks to create the CF representation would be if we could get the friendly folks over at spatialreference.org to supply the CF representation. If I google "EPSG 4326", I end up at https://spatialreference.org/ref/epsg/wgs-84/ which provides the WKT representation in several flavors as well as other representations. Why not one more for CF? This would also be a good way to figure out what is missing in CF...

snowman2 commented 4 years ago

@rsignell-usgs, that would definitely be nice. However, it will also require a lot of work, so I imagine some kind of funding would be needed. It would also be a nice feature to have in PROJ, but I assume it will require funding as well: https://github.com/OSGeo/PROJ/issues/1193.

marqh commented 4 years ago

Hello @snowman2

this is an interesting topic and I am grateful that you have raised it

I think there are some fine details that are being picked out here that are interesting, as well as the big picture.

Whilst the big picture comes with a lot of considerations, there are small scale benefits we can try to get to.

One example stands out for me from your comments:

The coordinate system is important to note as the axis order is taken into account in PROJ 6+ and GDAL 3+.

I have also been looking at the axis order with respect to CRS-WKT. I agree that this is important.

I think that there is an in situ feature that can be extended to provide some extra clarity on this topic.

With this in mind, I have opened a new issue, #223, to discuss this as an isolated topic, to see if there is a quick and easy extension that would address this concern.

I very much support the broader scope discussion on this topic, hence my approach to separate out #223 so that the discussion on that targeted topic does not get in the way of these valuable considerations.

I hope this is a useful step

mark

marqh commented 4 years ago

This is a bit of a sidebar, but one thing that would make it easier for people EPSG and WKT folks to create the CF representation would be if we could get the friendly folks over at spatialreference.org to supply the CF representation. If I google "EPSG 4326", I end up at https://spatialreference.org/ref/epsg/wgs-84/

Hi @rsignell-usgs

I'm afraid that Google may be somewhat unhelpful with its advice

the resources at https://spatialreference.org are not very well maintained, and the process of maintenance has been far from clear for some time: https://spatialreference.org/about/

The EPSG maintain the official registry for EPSG codes, providing URI and URN notation for encodings, e.g. https://www.epsg-registry.org/export.htm?wkt=urn:ogc:def:crs:EPSG::4326

Comparing this resource to https://spatialreference.org/ref/epsg/wgs-84/ogcwkt/ it is clear (to me) that the spatial reference resource has not adopted the updated CRS-WKT syntax, which is implemented in GDAL, ESRI and other modern packages.

At present the only well maintained resource for EPSG codes in WKT encoding that I am confident of using is https://www.epsg-registry.org/

all the best

mark

marqh commented 4 years ago

On the detail point of the proposal, I would support amending the current text:

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. However, in those situations where two values of a given property are different, then the value specified by the single-property attribute shall take precedence. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…​] element) then the former, being the more specific attribute, takes precedence. Naturally if the two values are equal then no ambiguity arises.

To remove the latter precedence statement.

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. However, in those situations where two values of a given property are different, then the value specified by the single-property attribute shall take precedence. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…​] element) then the former, being the more specific attribute, takes precedence. Naturally if the two values are equal then no ambiguity arises.

It is my view that there is too much of an onus placed on the data consumer here, to parse both content representations, map terms to one another and interpret outputs. This is complicated and difficult to implement. There are many opportunities for mistakes and problems.

If there is WKT in a file, I want my application to trust it, not to have to parse it to look for mistakes. If I can just parse it then I can delegate this to a supporting application, which is great for maintainability.

I think that placing the onus on the data producer to produce content that they assert is consistent is sufficient.

I think the value of data consumers being able to simply parse the WKT directly is very large.

I think the cost of managing the assertion of consistency on data producers is much smaller. In a sense the status quo is standardising for mistakes in encoding, which I don't think the standard should do, especially given the cost here.

all the best

mark

graybeal commented 4 years ago

I read both #69 and #80, and was startled by the sudden acceptance of these tickets after such long discussion of possible issues. (Credit here to Jonathan for flexibility!) Many of those issues are raised in this context, but this ticket proposes WKT be dominant in a much narrower sense (see detailed item (b) below).

I agree strongly with @marqh's recent points, including the large value of data consumers being able to simply parse the WKT directly. It's key to recognize this is an augmentation, not a restriction. My detailed reasons follow, but first, I think the phrasing at the beginning of the proposal is creating unneeded alarm.

Despite the misleading title, the proposal doesn't make WKT dominant, it just makes it directly usable (but still secondary, because the WKT is not required). I offer this as an equivalent rewrite of the proposal's first paragraph:

I propose that if a CRS WKT is present and can be used by the software program, that the WKT should be allowed to stand alone as an official CRS of the file by CF standards (thus, implicitly ignoring non-WKT CRS parameters). However, non-WKT CRS parameters still must be present to serve as an official representation of the CRS, in the event the software program cannot read in the CRS WKT or chooses not to use it.

In the text you wouldn't say anything like this of course. The text already describes how WKT is an optional augmentation, and that the non-WKT CF must be as complete as possible. I'd only tweak one line, just before the paragraph marqh highlighted, by replacing "as well as by crs_wkt" with "even if a crs_wkt is present", so now it reads:

Therefore the CRS should be described as thoroughly as possible with the single-property attributes, even if a crs_wkt is present.

With the proposed precedence deletion of marqh (item (c) below), I believe this fully captures the intent of the proposal.

Detailed responses to a few points:

(a) Yes ideally CF could be equally capable. On the other hand, WKT will continue to improve and many tools are and will be built around it. Does CF want to take on the job of "keeping up with WKT" and expect tool developers to "keep up with the CF version of WKT"? Even if we want that to happen, who in CF wants to volunteer to make it happen for CF? And in the tool community?

(b) This seems to be a graceful co-habitation strategy. I don't see how this "breaks everything in the CF world"; it is adding a capability to CF, not breaking anything that exists. If the data creator doesn't add WKT, it doesn't apply. If the tool reading the file doesn't support WKT, it doesn't apply. Everything presented in CF will continue to work with all the tools written for CF. If I'm trying to create CF-compliant data, even if my WKT adds critical value I will want to make my CF as complete and accurate as I can for non-WKT applications, and the CF parameters are still required by the convention.

(c) Supporting the proposed precedence deletion of marqh: If the added WKT does not align with the CF, the data creator has introduced a bug. This can be corrected by social pressure (as is usually the case for any mistakes in the data) and does not require custom text in CF defining the "true meaning" (independent of the originator's intent). The tool creator is motivated to make their tool maximally useful given available time, including whether to favor the WKT expression and whether to cross-check the two expressions. Co-existence works here also and does not damage CF (because the CF parameters still have to be there).

I'm assuming positive assessments of the prevalence of WKT, its features, and its community support for upgrades. If you agree these are favorable indicators, then there are two ways to consider the options. (1) How good will this be for existing CF users going forward? Although maybe not many of them need WKT yet, it will be favorable on balance, with little or no downside that I can see. And more broadly, (2) How much will this encourage/allow the geospatial community to easily adopt and use CF? I think it will be quite encouraging.

snowman2 commented 4 years ago

@graybeal, thanks for clarifying! I used your clarified version as I think it does a much better job of capturing the intent of the proposal.

JonathanGregory commented 4 years ago

Although I'm watching this repository, and I contributed to this thread, GitHub has sent me only one of the contributions to this issue, namely the most recent (before this one, 10 h ago by @snowman2). Shouldn't I receive all of them by email? I depend on email to be informed that some discussion is taking place.

JimBiardCics commented 4 years ago

A few comments on the discussion to this point. I think the discussion is moving in the overall right direction. It seems to me that there was confusion at first between implementations and uses on the one hand and design and conventions on the other. I think we need to seriously consider how big a job it would be to "re-invent the wheel" by trying to add to CF, even piecemeal, all the parameters needed to represent all coordinate reference systems (CRSs). The vast majority of us are not geodesists. We need to acknowledge that this is a significant discipline that we know little about, and allow the experts in that field to be the experts. Let's use the standards they have developed rather than build an inferior substitute.

CF added the ability to specify a few projected coordinate systems. We clearly must continue to honor those for backward compatibility purposes, but let's not add any new ones. I think we should encourage the use of WKT CRS declarations going forward and focus on what might need to be added to CF to resolve ambiguities that might be present. If I understood correctly, @JonathanGregory thought there were possible issues. I didn't see any specifics given, but I'd rather try to clear those up than follow a "make our own" approach any longer.

I've worked with a few data providers that attempted to add grid_mapping variables to their netCDF files. The majority of them botched it. They would have been much better off if they could have copied and pasted a WKT string rather than try to figure out how to read CRS definitions and map elements to CF grid_mapping attributes.

dblodgett-usgs commented 4 years ago

Great strategy @JimBiardCics. Having contributed an implementation to map CF conventions to WKT in R -- I know how error prone and hard it can be. Moving toward support of WKT as a fully fledged option within CF is unambiguously a good thing in my mind.

@marqh's suggested text changes make a ton of sense to me.

Should we also add something that emphasizes the points about "graceful co-habitation" ?

snowman2 commented 4 years ago

Should we also add something that emphasizes the points about "graceful co-habitation" ?

Are you thinking something along the lines of:

"If both a CRS WKT and grid mapping parameters exist, it is assumed that they are equivalent. As such, either one may be used to represent the CRS of the file."

graybeal commented 4 years ago

Or to deal with the edge cases and be consistent with our expectations:

"If both a CRS WKT and grid mapping parameters exist, it is assumed that they do not conflict. As such, information from either one (or both) may be used to represent the CRS of the file, recognizing that the grid mapping parameters should always be completed as fully as possible."

snowman2 commented 4 years ago

One minor addition: "If the CRS cannot be represented using the grid mapping parameters, using only the CRS WKT is allowed. However, some applications will not be able to read in the CRS WKT form."
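
A reader following the wording proposed in the last few comments might be sketched as below. This is only an illustration: `choose_crs_source` and the `can_parse_wkt` flag are hypothetical, and preferring the WKT when it can be parsed is one choice the proposed text would permit, not something it mandates.

```python
def choose_crs_source(grid_mapping_attrs, can_parse_wkt):
    """Decide which CRS description a hypothetical reader should use.

    Returns "crs_wkt" or "grid_mapping"; raises ValueError when the file
    offers nothing this reader can interpret.
    """
    has_wkt = "crs_wkt" in grid_mapping_attrs
    has_params = "grid_mapping_name" in grid_mapping_attrs

    # Both descriptions are assumed not to conflict, so no comparison
    # is made; this reader simply prefers the WKT when it can parse it.
    if has_wkt and can_parse_wkt:
        return "crs_wkt"
    if has_params:
        return "grid_mapping"
    if has_wkt:
        # WKT-only file, allowed by the proposed addition, but not
        # readable by software without WKT support.
        raise ValueError("CRS is described only by crs_wkt, which this reader cannot parse")
    raise ValueError("no CRS description found")


both = {"grid_mapping_name": "transverse_mercator", "crs_wkt": 'PROJCRS["..."]'}
print(choose_crs_source(both, can_parse_wkt=True))   # crs_wkt
print(choose_crs_source(both, can_parse_wkt=False))  # grid_mapping
```

The last branch is the case rmendels raises later in the thread: a WKT-only file is valid under the proposed wording but unusable by software that only understands the grid mapping attributes.
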

JimBiardCics commented 4 years ago

@snowman2 Are there any applications that actively read in and use the CF grid mapping parameters?

snowman2 commented 4 years ago

@snowman2 Are there any applications that actively read in and use the CF grid mapping parameters?

The only application I am aware of that does so is GDAL. However, it also checks for the WKT string and compares the two at present. I am not sure about other applications, but I assume there are based on the current cf-conventions. :man_shrugging:

JimBiardCics commented 4 years ago

GDAL is one more than I was aware of. I'm not aware of any others.

rmendels commented 4 years ago

I would suggest maybe going a little slow on this one, because my guess is many people are not following this discussion, so they will not be able to respond. I have no idea what for example Panoply does, the Coastwatch tools, Thredds, Seadas, and some others I can think of, and we should be very careful we don't needlessly break things. That is why I objected to the original proposal where the WKT string would take precedence. That would almost require that WKT strings be supported in software.

pp-mo commented 4 years ago

Are there any applications that actively read in and use the CF grid mapping parameters?

Not sure if I'm answering the right question here, but Iris definitely does explicitly interpret CF grid-mapping terms: we have explicit code for translating various types of grid-mapping to an Iris 'coordinate_system'.
Currently supported types:

albers_conical_equal_area
azimuthal_equidistant
lambert_azimuthal_equal_area
lambert_conformal_conic
lambert_cylindrical_equal_area
latitude_longitude
mercator
orthographic
polar_stereographic
rotated_latitude_longitude
stereographic
transverse_mercator
vertical_perspective

Most of these can also be produced as output.

For instance, we are currently working on letting Iris accept a geostationary grid with missing false_easting/false_northing parameters: https://github.com/SciTools/iris/pull/3628

We currently don't support the WKT text, but could presumably consider this in future.

dopplershift commented 4 years ago

MetPy is reading these grid_mapping attributes and converting them for our purposes of plotting in CartoPy. netCDF-java (cc @lesserwhirls) reads them and uses them to set up coordinate systems.

snowman2 commented 4 years ago

Just to be clear, this proposal does not propose to remove or override the grid_mapping parameters. Its main goal is co-existence with the CRS WKT.

Proposed WKT string statement modifications (modifications in italics):

The crs_wkt attribute is intended to act as a supplement to other single-property CF grid mapping attributes (as described in Appendix F); it is not intended to replace those attributes. If data producers omit the single-property grid mapping attributes in favour of the compound crs_wkt attribute, software which cannot interpret crs_wkt will be unable to use the grid_mapping information. Therefore the CRS should be described as thoroughly as possible with the single-property attributes as well as by crs_wkt. If the CRS cannot be represented using the grid mapping parameters, using only the crs_wkt attribute is considered valid.

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. If both a crs_wkt and grid mapping attributes exist, it is assumed that they do not conflict. As such, information from either one (or both) may be used to represent the CRS of the file, recognizing that the grid mapping parameters should always be completed as fully as possible. However, in those situations where two values of a given property are different, then the value specified by the single-property attribute shall take precedence. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…] element) then the former, being the more specific attribute, takes precedence. Naturally if the two values are equal then no ambiguity arises.
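
Since the wording puts the onus for consistency on data producers, a producer could run a check along these lines before writing a file. This is a sketch under assumptions: `find_conflicts` is a hypothetical helper, `wkt_derived` stands for values the producer has already extracted from the WKT and mapped to CF attribute names by some other means, and the mismatching semi-major axis is made up for the example.

```python
import math


def find_conflicts(cf_attrs, wkt_derived, rel_tol=1e-9):
    """Return {name: (cf_value, wkt_value)} for numeric properties that
    appear in both descriptions but differ beyond the tolerance."""
    conflicts = {}
    for name, cf_value in cf_attrs.items():
        wkt_value = wkt_derived.get(name)
        # Only compare properties present in both, and only numeric ones.
        if not isinstance(cf_value, (int, float)) or not isinstance(wkt_value, (int, float)):
            continue
        if not math.isclose(cf_value, wkt_value, rel_tol=rel_tol, abs_tol=1e-12):
            conflicts[name] = (cf_value, wkt_value)
    return conflicts


cf_attrs = {
    "grid_mapping_name": "transverse_mercator",
    "semi_major_axis": 6377563.396,
    "false_easting": 400000.0,
    "false_northing": -100000.0,
    "scale_factor_at_central_meridian": 0.9996012717,
}
# Hypothetical values extracted from the WKT; semi_major_axis is wrong on purpose.
wkt_derived = {
    "semi_major_axis": 6377563.0,
    "false_easting": 400000,
    "false_northing": -100000,
    "scale_factor_at_central_meridian": 0.9996012717,
}
print(find_conflicts(cf_attrs, wkt_derived))
```

An empty result is the condition under which a reader can safely use either description without cross-checking.
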

rmendels commented 4 years ago

If the CRS cannot be represented using the grid mapping parameters, using only the crs_wkt attribute is considered valid.

Sorry, but to my untrained eyes this essentially means software that purports to be CF compliant will have to be able to deal with WKT. It is very simple: I have a transformation that can't be represented by the present grid mapping parameters, so I use the crs_wkt attribute. Based on the standard the file is CF-compliant, but a lot of software that has been the basis for CF-compliance will not be able to properly read that file. I must be missing something here because I don't see any way around this. It will create CF-compliant files that present software will not be able to properly read. That will mean either a lot of incompatible files or possibly a lot of work for the authors of the present software that is the backbone of CF-compliance.

I am also concerned that not enough feedback has been coming from the authors of this software. Most telling was the inability of the people who have driven this to name such software (so they are not aware of the possible implications of these changes), and even more that this is being driven to benefit GDAL so that GDAL doesn't have to change (read the original posts).

I want to reiterate that it is not that I don't think going in this direction in the long run is a good idea (that is a separate issue); I just don't think the implications have been entirely thought through, in particular the ramifications for software developers who may not have the time and resources to deal with the changes. That is why I keep saying we should go slow on this: perhaps create a lot of sample files, including ones that use CRSs that presently can't be properly represented, and see what kind of havoc it does or does not create. Only after testing over as many programs as possible should we see how this should be worded.

davidhassell commented 4 years ago

I broadly agree with @rmendels (https://github.com/cf-convention/cf-conventions/issues/222#issuecomment-572780372) and @JonathanGregory's three points (https://github.com/cf-convention/cf-conventions/issues/222#issuecomment-569325165). This is not a bad idea, but it is also not one that has a trivial implementation, and so it would need some work before it should be accepted. My interest is a general one, but also that of a developer of CF-compliant software: cfdm and cf-python. These libraries currently ignore WKT. I would be happy to extend them so that they read it, but that is probably not a 5-minute job, as there first needs to exist a well-defined mapping between the parts of WKT and the CF parameters they correspond to.

Expanding on one of the aforementioned points:

Thanks, David

graybeal commented 4 years ago

There are two perspectives in this thread (and the original WKT discussions) about what it means for optional 'containers' like WKT to be acknowledged as 'valid' in CF. In one view, the acknowledgement puts some or all of the responsibility for validating the container's content on CF, the CF validation software, and the CF tool developers. In the other view, it can be treated as a black box, with the responsibility for correctly including WKT content, verifying WKT content, and using WKT content relegated to the actual WKT providers and users.

I think the acceptance of WKT as an 'allowed' container in those original proposals implicitly adopted the second view. To my understanding no tests for WKT content validity have been added, and tools have not been required to change anything. The gist of the proposal seemed in line with that view, because WKTs would still be optional and the CF attributes would still be required to the maximum extent possible. Therefore, I think the best approach is to modify the proposal to make it fully consistent with the second view, because that lets CF dynamically and quickly adapt to advanced needs of an important community.

If the CRS cannot be represented using the grid mapping parameters, using only the crs_wkt attribute is considered valid.

Sorry, but to my untrained eyes this essentially means that software that purports to be CF-compliant will have to be able to deal with WKT. It is very simple: I have a transformation that can't be represented by the present grid mapping parameters, so I use the crs_wkt attribute. According to the standard the file is CF-compliant, but a lot of the software that has been the basis for CF compliance will not be able to read that file properly. I must be missing something here, because I don't see any way around this. It will create CF-compliant files that present software will not be able to read properly, which will mean either a lot of incompatible files or possibly a lot of work for the authors of the present software that is the backbone of CF compliance.

I agree that the quoted sentence is ambiguous and inconsistent with other text. I'd prefer that it went away, or at least that it was expressed as "If the CRS cannot be fully represented using the grid mapping parameters, the additional crs_wkt attribute can be used to augment the representation (while noting that support for reading and using the crs_wkt attribute in CF software remains entirely optional)."

Still, there are oodles of CF-compliant data files that present software can't "properly" read, because they have various optional augmentations in them not understood by all software. The fact this particular optional augmentation is mentioned explicitly does not mean everyone, or even anyone, has to support it. It also does not mean that any testing or conformance software has to support or test WKT, any more than allowing multiple conventions in the convention attribute means you have to support and test all of the conventions that might be referenced by that attribute.

We would have to revise (or consider revising) CF whenever the definition of WKT was amended.

I'm not sure why. The only thing the WKT content can affect is another CF tool that reads WKT. The existing tools that ignore WKT should not be impacted: they'll ignore WKT no matter what its definition is. Now, the tools that use WKT in CF might have to decide how to deal with WKT versions, but they could do that without ever having to require changes to the CF specification.
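To illustrate how a WKT-aware tool might handle versions on its own, here is a minimal Python sketch that classifies a crs_wkt string by its root keyword. The function name `wkt_flavour` is hypothetical, and the keyword sets are an illustrative, non-exhaustive summary of WKT 1 and WKT 2 root keywords rather than anything defined by CF.

```python
import re

# Illustrative (not exhaustive) root keywords; assumed from the WKT 1 and
# WKT 2 (ISO 19162) specifications, not from the CF conventions.
WKT1_ROOTS = {"GEOGCS", "PROJCS", "GEOCCS", "VERT_CS", "COMPD_CS", "LOCAL_CS"}
WKT2_ROOTS = {
    "GEODCRS", "GEODETICCRS", "GEOGCRS", "GEOGRAPHICCRS",
    "PROJCRS", "PROJECTEDCRS", "VERTCRS", "VERTICALCRS",
    "COMPOUNDCRS", "ENGCRS", "ENGINEERINGCRS", "BOUNDCRS",
}


def wkt_flavour(crs_wkt):
    """Return "WKT1", "WKT2" or "unknown" based on the root keyword only."""
    # WKT allows either [ ] or ( ) as delimiters; keywords are case-insensitive.
    match = re.match(r"\s*([A-Za-z_]+)\s*[\[(]", crs_wkt)
    if not match:
        return "unknown"
    root = match.group(1).upper()
    if root in WKT1_ROOTS:
        return "WKT1"
    if root in WKT2_ROOTS:
        return "WKT2"
    return "unknown"
```

A tool that ignores crs_wkt never needs to call anything like this, which is the point being made: version handling stays inside the tools that choose to read WKT.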

steingod commented 4 years ago

I think the acceptance of WKT as an 'allowed' container in those original proposals implicitly adopted the second view. To my understanding no tests for WKT content validity have been added, and tools have not been required to change anything. The gist of the proposal seemed in line with that view, because WKTs would still be optional and the CF attributes would still be required to the maximum extent possible. Therefore, I think the best approach is to modify the proposal to make it fully consistent with the second view, because that lets CF dynamically and quickly adapt to advanced needs of an important community.

I think this is a beneficial approach.

JimBiardCics commented 4 years ago

@graybeal I am also in favor of the "black box" approach to WKT. We should accept the work of the experts that develop WKT and define CRSs using it, and trust the registered WKT strings for different CRSs. Someone can always produce an incorrect WKT CRS string, but someone can also produce an incorrect grid mapping variable. The CF checker doesn't try to verify the exact contents of grid mapping variables, so I don't see why we should involve ourselves with verifying the exact contents of WKT CRS strings.

(Mild rant alert)

The great majority of CF files don't contain grid mapping information. A number of those that do contain incorrect information — either improperly chosen attributes or incorrect values for the attributes. (Based on the small sample of datasets that I've QC'd, most of the producers who add grid mapping get it wrong.) Tools that are able to use grid mapping attributes and attempt to use any of these files will not produce "best" results, and will in some cases produce crazy results. I am therefore not overly swayed by the appeal to issues for existing software. Existing tools that use grid mapping variables do so because that was the mechanism which was provided. If we had provided WKT or Proj strings from the beginning, those are what existing software would be using.

I confess that I haven't done the work to prove it, but I believe that the CF grid mapping attributes are insufficient to fully represent most CRSs, even the ones that we claim to support. They are sufficient if your spatial accuracy and precision are rougher than a kilometer or so, but if you are concerned about meter-level (or better) accuracy in all three spatial dimensions, I don't think we support enough of the attributes needed to achieve it.

I think we should be encouraging software tool developers to embrace WKT (and/or Proj) CRS declarations instead of viewing them as second-class citizens in CF-land. I'd love to see a day arrive when we decided to deprecate grid mapping attributes for new files.

Dave-Allured commented 4 years ago

Moving toward WKT would be facilitated by a few software modules that interpret WKT and translate to familiar CF grid_mapping parameters, when reasonably compatible.

@snowman2 wrote:

with the GDAL Barn changes (https://gdalbarn.com/), reading in a WKT is much more practical with PROJ as a dependency. It also provides support for WKT2. Additionally, GDAL can easily support the WKT form of the projection which enables all the dependent software to read in the projection.

"enables all the dependent software to read" -- Which language APIs are supported or planned for this? Are there known efforts or interest to develop some WKT-to-grid_mapping translation modules?

snowman2 commented 4 years ago

PROJ is written in C/C++. Currently available bindings for other languages: https://proj.org/development/bindings.html

snowman2 commented 4 years ago

Are there known efforts or interest to develop some WKT-to-grid_mapping translation modules?

I brought this up in the past: https://github.com/OSGeo/PROJ/issues/1193. However, I think it would require more interest in the community and/or funding to make it happen.

I have a basic implementation based on the PROJ strings mappings in pyproj:

However, the mapping back and forth is imperfect and misses quite a bit without reading/writing the crs_wkt. I have plans to update grid mappings based on WKT as that will allow a more complete mapping. PROJ string mappings code here.

JonathanGregory commented 4 years ago

I'm not contributing much to this because GitHub is inexplicably not sending me the contributions to this issue. It may have decided for itself that it's better for all if I'm kept in the dark!

I feel that the best solution for this would be for CF to maintain a document which describes the mapping between CF metadata and WKT, and for conformance to this mapping to be checked by the CF-checker. In that way a user could be reasonably confident of getting the same result by reading either of them, and errors would be identified. I think we already have some documents somewhere that are relevant (prepared years ago by Etienne Tourigny).
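As a rough illustration of what such a documented mapping and automated check could look like, here is a minimal Python sketch. The table `CF_TO_WKT1_PARAMETER` is an illustrative subset pairing CF attribute names with commonly used WKT 1 PARAMETER names; it is an assumption made for this example, not the mapping document itself, and the helper names are hypothetical. The sketch ignores units and does not handle multi-valued CF attributes such as standard_parallel.

```python
import re

# Illustrative subset only: CF grid mapping attribute -> WKT 1 PARAMETER name.
# Assumed for this sketch; a real mapping would need to be agreed and documented.
CF_TO_WKT1_PARAMETER = {
    "latitude_of_projection_origin": "latitude_of_origin",
    "longitude_of_central_meridian": "central_meridian",
    "scale_factor_at_central_meridian": "scale_factor",
    "false_easting": "false_easting",
    "false_northing": "false_northing",
}

# Matches PARAMETER["name", value] elements in a WKT 1 string.
_PARAMETER_RE = re.compile(r'PARAMETER\[\s*"([^"]+)"\s*,\s*([-+0-9.eE]+)')


def wkt1_parameters(crs_wkt):
    """Extract PARAMETER name/value pairs from a WKT 1 string."""
    return {
        name.lower(): float(value)
        for name, value in _PARAMETER_RE.findall(crs_wkt)
    }


def find_conflicts(grid_mapping_attrs, crs_wkt, abs_tol=1e-9):
    """List (cf_name, cf_value, wkt_value) for properties that disagree.

    Only properties present in both representations are compared.
    """
    wkt_params = wkt1_parameters(crs_wkt)
    conflicts = []
    for cf_name, wkt_name in CF_TO_WKT1_PARAMETER.items():
        if cf_name in grid_mapping_attrs and wkt_name in wkt_params:
            cf_value = float(grid_mapping_attrs[cf_name])
            wkt_value = wkt_params[wkt_name]
            if abs(cf_value - wkt_value) > abs_tol:
                conflicts.append((cf_name, cf_value, wkt_value))
    return conflicts
```

An empty list from `find_conflicts` means only that the compared properties agree; it says nothing about properties that one representation carries and the other lacks.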

This would be a non-trivial piece of work, but until it is done I don't think we can really know what we are missing in the CF attributes that WKT can describe and that needs to be included in use cases of CF datasets. It could be that CF grid_mapping attributes could eventually be replaced by WKT, as Jim would prefer, but I suspect that there is overlap with other parts of the CF data model, which would make this hard. It's certainly worth considering, but I think that writing down the mapping is a necessary piece of evidence in deciding about that and would therefore be a useful first step.

snowman2 commented 4 years ago

@JonathanGregory whenever you see this :smile:, I would recommend checking the right-hand side of this issue at the top to see if you're subscribed to notifications.

snowman2 commented 4 years ago

I think that writing down the mapping is a necessary piece of evidence in deciding about that and would therefore be a useful first step.

Here are some examples of the WKT version for the CF grid_mappings for the conversions: https://pyproj4.github.io/pyproj/latest/_modules/pyproj/crs/coordinate_operation.html

This may also be useful: https://pyproj4.github.io/pyproj/latest/build_crs.html

rschmunk commented 4 years ago

@rmendels said

I have no idea what for example Panoply does, the Coastwatch tools, Thredds, Seadas, and some others I can think of, and we should be very careful we don't needlessly break things.

Panoply uses the netCDF-Java library to read datasets. When dealing with projected grids, it looks for the variable named by the grid_mapping attribute and the CF specified projection parameters of that variable. For the 13 projections that Panoply supports, in 4 cases it uses projection code that is part of NJ but in the other 9 it uses its own projection code to transform the grid. The reason for those 9 is that I'd already implemented them for other purposes, and the NJ library doesn't do the best job of performing sanity tests on the projection attributes that it finds in the dataset.
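The lookup described here, following the grid_mapping attribute to a variable and reading its CF projection parameters, can be sketched in a few lines of Python. This is not Panoply or netCDF-Java code: the dataset is modelled as a plain dictionary of variable name to attribute dictionary, the function name is hypothetical, and only the simple form of grid_mapping (a single variable name) is handled.

```python
def grid_mapping_parameters(dataset, data_variable_name):
    """Return the CF grid mapping attributes for a data variable, or None.

    `dataset` is a hypothetical stand-in for an open file: a dict mapping
    variable names to dicts of attributes. Any crs_wkt attribute is dropped,
    mimicking a reader that relies on the CF parameters alone.
    """
    data_attrs = dataset[data_variable_name]
    mapping_name = data_attrs.get("grid_mapping")
    if mapping_name is None:
        # No grid mapping declared for this variable.
        return None
    mapping_attrs = dataset[mapping_name]
    return {k: v for k, v in mapping_attrs.items() if k != "crs_wkt"}
```

A reader built this way is unaffected by whatever the crs_wkt attribute contains, which is the behaviour the "black box" view relies on.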

FWIW, I only started looking through this thread because an issue regarding WKT was opened at Unidata/netcdf-java#191 earlier today.

snowman2 commented 4 years ago

This would be a non-trivial piece of work, but until it is done I don't think we can really know what we are missing in the CF attributes that WKT can describe and that needs to be included in use cases of CF datasets.

@JonathanGregory, this is currently supported in pyproj master here

You can see the tests that were run to check the conversion to/from the CF format of the projection here.

I didn't find any CF parameters that weren't able to be supported by WKT. However, there are many projections that WKT can describe that aren't supported by CF. There is an extensive list here for projections supported by PROJ. Only a subset is supported by CF conventions.

snowman2 commented 4 years ago

I am removing the part about the standalone WKT to be discussed in another issue.

Proposed WKT string statement modifications (modifications in italics):

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. If both a crs_wkt and grid mapping attributes exist, it is assumed that they do not conflict. As such, information from either one (or both) may be used to represent the CRS of the file, recognizing that the grid mapping parameters should always be completed as fully as possible. However, in those situations where two values of a given property are different, then the value specified by the single-property attribute shall take precedence. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…​] element) then the former, being the more specific attribute, takes precedence. Naturally if the two values are equal then no ambiguity arises.
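For readers who want to see the precedence rule in concrete terms, here is a minimal Python sketch for the semi_major_axis example. It is an illustration under stated assumptions, not part of the proposed text: the helper names are hypothetical, the regular expression only handles the WKT 1 SPHEROID["name", a, 1/f] form, and units are not considered.

```python
import re

# Matches the WKT 1 element SPHEROID["name", semi_major, inverse_flattening ...].
_SPHEROID_RE = re.compile(
    r'SPHEROID\[\s*"[^"]*"\s*,\s*([-+0-9.eE]+)\s*,\s*([-+0-9.eE]+)'
)


def semi_major_axis_from_wkt(crs_wkt):
    """Return the semi-major axis found in a SPHEROID element, or None."""
    match = _SPHEROID_RE.search(crs_wkt)
    return float(match.group(1)) if match else None


def resolve_semi_major_axis(grid_mapping_attrs, rel_tol=1e-9):
    """Apply the precedence rule to one property.

    Returns (value, conflict): the value to use, and whether the
    single-property attribute and crs_wkt disagreed.
    """
    cf_value = grid_mapping_attrs.get("semi_major_axis")
    crs_wkt = grid_mapping_attrs.get("crs_wkt")
    wkt_value = semi_major_axis_from_wkt(crs_wkt) if crs_wkt else None

    if cf_value is None:
        # Only crs_wkt (or neither) defines the property.
        return wkt_value, False
    if wkt_value is None:
        return float(cf_value), False
    scale = max(abs(cf_value), abs(wkt_value))
    conflict = abs(cf_value - wkt_value) > rel_tol * scale
    # The single-property attribute takes precedence either way.
    return float(cf_value), conflict
```

Returning the conflict flag alongside the value lets a reader follow the precedence rule while still reporting the inconsistency to the data provider.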

graybeal commented 4 years ago

First two sentences are duplicated in the second two sentences?

snowman2 commented 4 years ago

Good catch! Just fixed.

JonathanGregory commented 4 years ago

I continue to think that if we want to allow grid_mapping attributes and WKT to have equivalent status, we should write down the mapping between them, so it can be checked automatically. Thanks for pointing out the work on this in pyproj - is that your work? Without being able to do this check, I fear that some data-writers who are solely interested in one representation and not familiar with the other will very likely make mistakes and the file will be incorrect. I think this is too unsafe for us to rely on a stated obligation to get it right.

If you have use cases for actual datasets that you want to put in CF but the required projections are not supported in grid_mapping, it would be useful to propose adding them to CF.

By the way, my problem with not receiving all CF GitHub postings turned out to be that Gmail thought some of them were spam, but it wasn't consistent about that: it objected to some comments and not others.

rmendels commented 4 years ago

+1 On all points

As I have said before, it is not that this isn't a good idea, but why not take the time to do it correctly with minimal impacts.

rsignell-usgs commented 4 years ago

@rschmunk and @lesserwhirls, could panoply and netcdf-java use the Java geotools library to handle CRS WKT?

Would that be straightforward or a major effort?

lesserwhirls commented 4 years ago

@rschmunk and @lesserwhirls, could panoply and netcdf-java use the Java geotools library to handle CRS WKT?

Would that be straightforward or a major effort?

It would introduce a somewhat interesting circular dependency (they are using netCDF-Java 4.6.11, we're on 5.2.0). I've been looking at proj4j to see what they have for parsing WKT, at which point we'd need to set up a mapping to CF parameters and create a netCDF-Java Projection object. We have some (what looks to be) contributed code that does the parsing, mapping, and projection object creation, but it's not very forgiving in its parsing and not super capable at this point.

JimBiardCics commented 4 years ago

@JonathanGregory said

If you have use cases for actual datasets that you want to put in CF but the required projections are not supported in grid_mapping, it would be useful to propose adding them to CF.

I personally have a number of cases where CF doesn't provide the projection used. I come back to the question of why we feel the need to re-invent the wheel. I believe we need to learn to coexist with other standards and conventions that are at least as stable and reliable as CF rather than replicate them.

JonathanGregory commented 4 years ago

Dear @JimBiardCics

I appreciate the point and I agree that we should not spend our time doing a less good job of something which is already available elsewhere. However I don't think that CF is fairly described as "reinventing the wheel" in this case. I can suggest a few reasons why we have our own set of grid_mapping definitions rather than relying on proj4 for example.

Therefore instead of "handing over" something to an external authority, I feel that in most cases it is better to import the external information to CF. This isn't an absolute rule, of course, but it seems fine for grid_mapping to me. However, your point is a good reason why we should make the choice of attributes as similar as we can to proj4. If we do that, it should be easy to import more of them, like new standard names on existing patterns, and it improves interoperability between conventions. If we find there is a use case which is not trivial to import, there's probably a good reason for that which will require some thought, like new standard names which propose new patterns.

Best wishes

Jonathan