SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License
626 stars 283 forks source link

Support loading/saving netCDF with multiple coordinate systems #3388

Open lbdreyer opened 5 years ago

lbdreyer commented 5 years ago

In CF-1.7 the use of the grid_mapping attribute was expanded to store multiple coordinate systems. See example 5.10 Ref (for background, if needed): trac ticket 70

We should handle reading and writing files of these types of files.

lbdreyer commented 4 years ago

Agreed to delay with til Iris 3, as it is no longer such a high priority for the user requesting this

pp-mo commented 4 years ago

Note the relevance of #3796.
It probably makes good sense to do this first. However, as noted there, the feature described in #3796 can affect the implementation of this in some cases. So it makes sense to bear that in mind.

pp-mo commented 1 year ago

Just found that this is also has unexpected relevance to UGRID encoding.

So... for regular structured data, with X+Y dimensions and 2 separate X+Y coordinates, where the data is sampled on a grid which is not true-latlon based,

  1. data-variable has a "grid_mapping" attribute, specifying a coordinate system, and
  2. the X+Y coords will have the appropriate standard_names (i.e. "grid_lat(long)itude" OR "projection_x(y)", instead of plain "lat(long)itude". Also
  3. optionally, there may be true-latlon equivalent coordinate information. this will consist of variables called plain "lat(long)itude", which will be 2D (Y,X), but it is strictly redundant if there is a grid mapping (@). (@) > "Such coordinates may be provided in addition to the provision of a grid mapping variable, but that is not required"

For example (using a rotated pole grid) ...

dimensions
  xmain = <nx>
  ymain = <ny>

variables
  float xmain(xmain)
    xmain:standard_name = 'grid_longitude'
    xmain:units = 'degrees'
 float  ymain(ymain)
    ymain:standard_name = 'grid_latitude'
    ymain:units = 'degrees'
 float xtrue(ymain, xmain):
    xtrue:standard_name = "longitude"
    xtrue:units = "degrees_east"
        # NB the distinction between 'degrees' and 'degrees_east' is canonical, but I don't think the unit is necessarily intended to be definitive :  e.g. we can reasonably use "radians" instead, and I don't think "radians_east" is a thing
 float ytrue(ymain, xmain):
    ytrue:standard_name = "latitude"
    ytrue:units = "degrees_north"

  int gridvar
    gridvar:grid_mapping_name = "rotated_latitude_longitude" ;
    gridvar:grid_north_pole_latitude = 32.5 ;
    gridvar:grid_north_pole_longitude = 170. ;

  float mydata(ymain, xmain):
     mydata:grid_mapping = "gridvar" 
     mydata:coordinates = "xtrue ytrue" 

However... at the MetOffice are now seeing some files with structured X*Y rotated-pole data, but coded as a 1D mesh. So logically this can look just like the above, except that there is only 1 dimension (the mesh). That can look roughly like this ...

dimensions
  mesh = <n>

variables
  float xmain(mesh)
    xmain:standard_name = 'grid_longitude'
    xmain:units = 'degrees'
 float  ymain(mesh)
    ymain:standard_name = 'grid_latitude'
    ymain:units = 'degrees'
 float xtrue(mesh):
    xtrue:standard_name = "longitude"
    xtrue:units = "degrees_east"
        # NB the distinction between 'degrees' and 'degrees_east' is canonical, but I don't think the unit is necessarily intended to be definitive :  e.g. we can reasonably use "radians" instead, and I don't think "radians_east" is a thing
 float ytrue(mesh):
    ytrue:standard_name = "latitude"
    ytrue:units = "degrees_north"

  int gridvar
    gridvar:grid_mapping_name = "rotated_latitude_longitude" ;
    gridvar:grid_north_pole_latitude = 32.5 ;
    gridvar:grid_north_pole_longitude = 170. ;

  float mydata(mesh):
     mydata:grid_mapping = "gridvar" 
     mydata:coordinates = "xtrue ytrue" 

But there is a real problem with this because, I think, the way CF is written does not envisage that you might have trouble telling apart the "main" and "aux" coords, and it does not envisage that a variables "main" coords might be true-lat-lon, and the "aux" ones in a different coordinate system. In fact, CF explicitly describes the grid-mapping [as an aspect of the data](), and the coords only have that context by reference.

So in this case, we could add a grid-mapping on the data variable, and Iris could work out that this applies to the aux-coords with standard_names "grid_xx", and not to the "lat(long)itude" ones ( -- though from experiment, that definitely does not work at present. ) But that feels like we are stretching the CF well beyond it's intended design scope.

Whereas, if this data were to use an extended grid-mapping, that would be a good unambiguous way of encoding what it is.

ESadek-MO commented 1 year ago

Waiting to hear that somebody would actually want this, so closing. Can be reopened if desired.

trexfeathers commented 2 months ago

From https://github.com/SciTools/iris/pull/4719#issuecomment-2079486282:

Encoding axis ordering in grid_mapping has a good chance of breaking downstream code. E.g. Iris would currently error (?) if it loaded a CRS with grid_mapping in this format. So it is important that this is a opt-in switch.

We propose a new @property on the CoordSystem class: CoordSystem.encode_axis_order. This defaults to False, preserving the existing behaviour. When switched to True, grid_mapping will be written using the new format e.g. data:grid_mapping = "crs: latitude, longitude".

We could potentially detect situations where this is NEEDED to make sense of the crs_wkt attribute, and issue a warning if CoordSystem.encode_axis_order is False.

Iris also needs to be able to understand this on loading.

Name of property is definitely up for debate

pp-mo commented 2 weeks ago

Another possible thing to consider in context of this is the rotate_winds function. In that case the result cubes have data points whose locations are on a grid in one coordinate system, but whose values are vector components based in a different coordinate system. Possibly the extended grid-mapping approach would allow us to represent this type of data in a saner fashion ?