covjson / specification

CoverageJSON specification
https://covjson.org/spec/
45 stars 6 forks source link

Curvilinear grids / coordinate parameters #32

Closed letmaik closed 2 years ago

letmaik commented 8 years ago

Continuing from https://github.com/Reading-eScience-Centre/coveragejson/issues/24#issuecomment-151551355

jonblower commented 8 years ago

Just to note that we may have the problem of specifying CRSs for single axes in "normal" domains too. One use case is zonal averages (averages across all values of longitude), which gives a parameter (e.g. temperature) that is a function of latitude only. Essentially this is a section or transect, but there is no longitude in the domain (unless you gave a single longitude coordinate, with bounds spanning the world and a cell_methods indicating that the value is an average over these bounds).

letmaik commented 8 years ago

First thoughts:

First idea, assuming a domain of X and Y in arbitrary index coordinates and the following coordinate parameter (longitude would be similar and be included as well):

"parameters" : {
  "lat": {
    "type": "Parameter",
    "description": {
      "en": "Latitude in degrees in the WGS84 datum"
    },
    "observedProperty" : {
      "id" : "http://vocab.nerc.ac.uk/standard_name/latitude/",
      "label" : {
        "en": "Latitude"
      }
    },
    "unit": {
      "id": "http://www.opengis.net/def/uom/OGC/1.0/degree",
      "symbol": "degree"
    },
    "referencing": {
      "crsAxisIndex": 0,
      "srs": {
        "type": "GeodeticCRS",
        "id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"        
      }
    }
  }
}

So the main points are:

The only case I can imagine where this becomes annoying is when you want to include the datum definition and then would have to do it twice if you include multiple of such coordinate parameters for the same CRS. The same applies for the axis definition, however we could possibly change the required sorted array of axes within a CS to be just an unordered array (set) where each axis has "order": 1 like in WKT, then it would be possible to omit axes that are not relevant for referencing.

jonblower commented 8 years ago

I think that this formulation (in the JSON above) looks OK. I don't think that it's too strange, in principle, to have a coordinate value as an observedProperty. (A GPS device "measures" lon and lat. You can imagine a coverage that measures lon and lat as a function of time or distance.)

letmaik commented 8 years ago

I think that this formulation (in the JSON above) looks OK. I don't think that it's too strange, in principle, to have a coordinate value as an observedProperty. (A GPS device "measures" lon and lat. You can imagine a coverage that measures lon and lat as a function of time or distance.)

Exactly, it "measures". With curvilinear grids you just want to provide coordinates in a more useful system. I think it would be very important to differentiate whether something shall count as the actual measured result or as just some additional derived information. In netCDF-CF this is clear since such coordinate variables are associated as auxiliary variables with "coordinates='lat lon'".

jonblower commented 8 years ago

TBH I'm personally still happy with use of observedProperty. What about adding a property "type": "auxiliaryCoordinate" or something?

I think if we get too deep into what constitutes a "measurement" and what is "derived information" we may get into a tangle. There will be cases, particularly for model data, where the distinction isn't too clear.

letmaik commented 8 years ago

Derived was the wrong word, but even for model data you want to know what the model output is and what is just used for referencing the output. Your lat/lon 2D grids should not be discoverable as model output, except if that is your actual model output. Right? I'm not yet sure what the best way is to handle that. Instead of "type" : "Parameter" there could be "type" : "AuxiliaryParameter". But I guess then there should be something like a "ResultParameter" as well such that "Parameter" becomes abstract.

Am 16.11.2015 um 18:34 schrieb Jon Blower:

TBH I'm personally still happy with use of observedProperty. What about adding a property |"type": "auxiliaryCoordinate"| or something?

I think if we get too deep into what constitutes a "measurement" and what is "derived information" we may get into a tangle. There will be cases, particularly for model data, where the distinction isn't too clear.

— Reply to this email directly or view it on GitHub https://github.com/Reading-eScience-Centre/coveragejson/issues/32#issuecomment-157129709.

jonblower commented 8 years ago

Yes, you want to distinguish coordinate information from "variables", I see what you mean now. Not sure what the best way to do this is. My feeling is to make everything a "Parameter" but add a property to coordinate parameters. (But if you wanted to use a different type, how about "CoordinateParameter"?)

letmaik commented 8 years ago

"CoordinateParameter" sounds like it also applies to measured coordinates. A property would be possible, yes, like "auxiliary": true or something like that.

jonblower commented 8 years ago

Yes. It might be interesting to see the client code that results from this structure. Can a client discover the auxiliary coordinate information and associate it with the "real" coordinates without too much effort? As you say, the CF mechanism may (or may not) be easier from this point of view.

letmaik commented 8 years ago

Oops, I just realized that using the parameters for that doesn't work at all! If you have a grid which has X,Y, but also T, where X,Y is the index grid that should have lat/lon associated elsewhere, and T are regular time references, then a lat or lon parameter would of course run over the whole domain, including T, which is not the point of it. So... back to square one unfortunately.

letmaik commented 8 years ago

Ok, got an idea. A curvilinear grid is nothing else than a projected CRS where the projection does not follow a nice formula but instead assigns each x,y pair directly a (lon,lat) pair. That means, the function is a massive case statement / lookup table, but why not. Geotools defines in the ProjectedCRS interface amongst others the methods getBaseCRS() and getConversionFromBase(). And the latter method returns a Projection which maps lat,lon (base CRS) to x,y (but also has an inverse() operation). So, the point is, we can just reference a curvilinear grid as a ProjectedCRS which explicitly defines a mapping from x,y to its base CRS's lon,lat for a given resolution and bounding box (which would correspond to the x,y values of the domain axes). This extends naturally to a projected CRS where you know the projection but still want to define a direct mapping to the base CRS. Practically, this would mean that the domain object would include the 2D arrays (probably encoded as 1D for efficiency), and not the parameters/range. If you actually measure coordinates, then you can still use the parameters for that. And that means we keep a clean separation between results and domain coordinates and don't need something like "auxiliary":true. Solved?

jonblower commented 8 years ago

Yes, I can see how this might work. But it’s worth bearing in mind that CF allows for auxiliary coordinates to be specified in addition to a proper projected CRS. This allows clients that don’t understand the projected CRS to use the aux coordinates directly.

Therefore maybe the mapping to lat-lon could be specified in addition to the projected CRS. So the “referencing” object could contain the “crs” object, plus an optional “coordinate mapping” object. The “coordinate mapping object” would itself need to contain a CRS, to show how the lat-lon coords should be interpreted (they won’t always be WGS84).

And the i-j coordinates could be associated with an ImageCRS (or EngineeringCRS) to emphasise that they are non-geospatial.

letmaik commented 8 years ago

I think it can still go under the crs object and the client could easily understand it. What I have in mind is something like:

{
  "type": "ProjectedCRS",
  "cs": {
    "type": "ImageCS",
    "dimension": 2
  },
  "baseCRS": {
    "type": "GeodeticCRS",
    "id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
  },
  "conversionToBaseCRS": {
    "type": "DirectConversion",
    "sourceAxes": [{
      "start": 0,
      "stop": 100,
      "step": 1
    }, {
      "start": 0,
      "stop": 50,
      "step": 1
    }],
    "targetCoordinates": [{
      "targetAxis": {
        "axisIndex": 0,
        "abbrev": "lat"
      }
      "values": [50.3,50.2,50.3,...] 
    }, {
      "targetAxis": {
        "axisIndex": 1,
        "abbrev": "lon"
      }
      "values": [-10.1,-10.1,-10.2,...] 
    }]
  } 
}
letmaik commented 8 years ago

Or actually, the correct cs type would be CartesianCS. EDIT: And DirectConversion could be better named DiscreteConversion.

letmaik commented 8 years ago

I thought about how to supply bounds coordinates in the base CRS as well, and I think this fits in. Basically, if you had a nice formula for the projection then you could directly convert the bounds, but here we only give discrete conversions, so why not give another discrete conversion which matches the bounds coordinates:

{
  "type": "ProjectedCRS",
  "cs": {
    "type": "CartesianCS",
    "dimension": 2
  },
  "baseCRS": {
    "type": "GeodeticCRS",
    "id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
  },
  "conversionToBaseCRS": [{
    "type": "DiscreteConversion",
    "sourceAxes": [{
      "start": 0, "stop": 100, "step": 1
    }, {
      "start": 0, "stop": 50, "step": 1
    }],
    "targetCoordinates": [{
      "values": [50.3,50.2,50.3,...] 
    }, {
      "values": [-10.1,-10.1,-10.2,...] 
    }]
  }, {
    "type": "DiscreteConversion",
    "sourceAxes": [{
      "start": -0.5, "stop": 100.5, "step": 1
    }, {
      "start": -0.5, "stop": 50.5, "step": 1
    }],
    "targetCoordinates": [{
      "values": [50.1,50.2,50.2,...] 
    }, {
      "values": [-10.0,-10.1,-10.2,...] 
    }]
  }]
}

I think it wouldn't be that hard to write a helper like asCRS84(x,y) that returns a {lat,lon} object for a given x and y coordinate, and then you can use that to look up axis as well as bounds coordinates.

With that structure above, the "conversionToBaseCRS" being an array, you could imagine that this could theoretically also include a function-like conversion, and clients that don't understand it could use the discrete one if available.

Some description text for the example above:

Note that there are two conversions given above. This is useful when the domain includes bounds as well. One of the conversions would be for the axis coordinates, and the other for the bounds coordinates. The "values" in "targetCoordinates" are multi-dimensional arrays encoded as a flat array similar to the range values of a coverage. The dimension is equal to that of the projected CRS. The order of the objects in "targetCoordinates" corresponds to the order of the axes in the base/target CRS, which in the example above would be longitude, then latitude.

jonblower commented 8 years ago

OK, I think this will do as an example for now. We don't have a clear use case in MELODIES for curvilinear grids, so we can defer a more detailed treatment for later.

One thing I'm not sure about in the above formulation is, what's the purpose of sourceAxes? Won't the domain itself contain definitions of these? I would have thought that the crs object would just have the "base" CRS and the list of i*j target coordinates for each axis in this CRS.

letmaik commented 8 years ago

Well, the reference system objects are defined independent of the domain. Only the "identifiers" field in the parent object creates the connection. Apart from that, if not using sourceAxes you would need some way to distinguish between axis values and their bounds.

jonblower commented 8 years ago

OK, I think perhaps I haven't understood this. Perhaps you can explain it to me when we meet!

letmaik commented 8 years ago

Should we leave curvilinear grids for a future version of CoverageJSON? Seems too complex to get right at this point and we don't have any direct use cases outselves. I don't see any backwards-compatibility issues since we would just add more information later. If yes, then I will remove the part from the spec again

jonblower commented 8 years ago

Yes, I think that if we can leave a space for curvilinear grids we should leave them. My only concern would be if we found we needed to make major structural changes to accommodate them. But perhaps this is not likely.

jonblower commented 8 years ago

OK, I had another thought about this. I suggest the following structure for the domain (the parameters and range are unaffected). This is a case of a simple xy grid, but other axes could be added easily. Note that I've introduced the concept of an "index axis", which simply has a size, no axis values.

{
  "domainType": "Grid",
  "axes": {
    "i": { "num": 200 }, // index-based axis, no need for values
    "j": { "num": 150 }  // from absence of "start" and "stop" clients know
                         // these are indices, not coord values
  },
  "referencing" : [{
    "components": [ "j", "i" ], // Controls the axis order in the `xValues` and `yValues` array
    "system": {
      "type": "ReferencingByCoordinates", // or some better name
      "crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84",
      "xValues": [ -172, -171, -170, ...], // Arrays of coordinate values in the
      "yValues": [ 52, 53.5, 54, ...],     // above CRS (each size 200x150),
                                           // representing grid cell centres
      "bounds": [ /* optional array of 200x150 polygons representing bounds of grid cells */ ]
    }
  }]
}

Is anything else needed? There is some specification needed to link "xValues" and "yValues" accurately to the axes of the coordinate reference system.

The range then looks like this:

{
  "type" : "NdArray",
  "dataType": "float",
  "axisNames": ["j", "i"],
  "shape": [150, 200],
  "values" : [ ... ]
}
jonblower commented 8 years ago

Maybe the referencing.system could include:

jonblower commented 8 years ago

bounds could be an (n+1) x (m+1) array of vertices, which would be more efficient than encoding the polygons.

letmaik commented 8 years ago

The reason why in netCDF-CF the bounds are polygons is that it allows non-contiguous intervals in general (e.g. for simple axes like time or depth). So I guess using a (n+1) x (m+1) array would be fine since we specifically target curvilinear grids here.

I suggest we let it sit for a while and see if it still makes sense. If we use your approach then we'd have to change the spec though, since it requires start/stop for the regular axis case.

How does your approach work when you want to supply such coordinates for a known projection? It seems incompatible for that, which is where my approach was targeted at. Is that a use case at all? Or would we just give the CRS URI for e.g. British national grid and anyone who doesn't understand it has to look for a lat/lon reprojected variant or just abort? I wouldn't mind to be honest, as this just means making clients a bit smarter themselves (e.g. by doing reprojection themselves; or using a lat/lon variant if available) and reducing transferred data volume.

jonblower commented 8 years ago

I guess you could define several CRSs for the same set of components. In this way you could provide curvilinear coordinates pre-calculated in multiple CRSs. And you could provide lat-lon coordinates for existing projected CRSs, to save clients calculating them. Let's see how this might work:

{
  "domainType": "Grid",
  "axes": {
    "i": { "num": 200 }, // The use of index-based axes implicitly 
    "j": { "num": 150 }  // creates an engineering CRS, but we don't need
                         // to name it
  },
  "referencing" : [{
    "components": [ "j", "i" ],
    "system": {
      "type": "ReferencingByCoordinates", // or some better name
      "crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84",
      ... // xValues, yValues etc
    }
  },{
    "components": [ "j", "i" ],
    "system": {
      "type": "ReferencingByCoordinates", // or some better name
      "crs": "http://www.opengis.net/def/crs/whatever/BNG/is",
      ... // xValues, yValues etc
    }
  }]
}

Then the client could pick the curvilinear coordinates it understands. Let's look at the case of a projected CRS:

{
  "domainType": "Grid",
  "axes": {
    "x": { "start": 0, "stop": 156.2, "num": 200 },
    "y": { "start": 50, "stop": 213. "num": 150 }
  },
  "referencing" : [{
    "components": [ "y", "x" ],
    "system": {
      "type": "ProjectedCRS", // the native grid CRS
      "crs": "http://www.opengis.net/def/crs/whatever"
    }
  },{
    "components": [ "y", "x" ],
    "system": {
      "type": "ReferencingByCoordinates",
      "crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84",
      // Here we ignore the axis values in the ProjectedCRS and just treat
      // it as a curvilinear grid
      ... // xValues, yValues etc
    }
  }]
}
letmaik commented 8 years ago

Interesting case of providing multiple CRSs for a curvilinear grid. I think this makes sense for projected CRS as well, instead of having just a single fixed base CRS by definition as it is defined in ISO19111. But somehow I have the feeling that it would complicate clients too much if they would have to anticipate that components can be referenced multiple times, in terms of the referencing objects.

Another thing to note would be that ReferencingByCoordinates systems would really be very specific to CovJSON and don't make sense as a reusable thing. One reason is that they directly work on cells and not coordinates, knowing the concept of bounds for example.

By the way, for ProjectedCRS the CRS URI would still go under "id", while for ReferencingByCoordinates this would be something like "targetCRS" and the "id" would not be defined since this thing is not globally identified as such and not reusable.

jonblower commented 8 years ago

But somehow I have the feeling that it would complicate clients too much if they would have to anticipate that components can be referenced multiple times, in terms of the referencing objects.

I think this could be an optional thing. The first CRS with the given components would always be the "most authoritative" one - the rest would be redundantly-calculated ones for helping clients. Or we could just disallow this altogether - but CF allows both a projected CRS and a curvilinear CRS for the same field, to make things simpler for clients that don't understand the projection.

Another thing to note would be that ReferencingByCoordinates systems would really be very specific to CovJSON and don't make sense as a reusable thing.

I think it's more that the CRS only works for that particular coverage so can't be reused for other coverages. But yes, it doesn't need an ID, unless we want to allow it to live in a separate document (I don't think we do?)

letmaik commented 8 years ago

Should ReferencingByCoordinates also work for composite axes like in a trajectory? I guess so, right? How would that look like?

Also, I think we should think about actual use cases in the context of the web. Some questions I'd ask:

This also depends on what the server can do (automatic reprojection etc.) or how many projections are pre-computed and made available to clients. A very simple server which only reads e.g. netCDF-CF files and exposes them may be forced to include the 2D curvilinear coordinates because the server doesn't have data available in other projections and cannot reproject on the fly.

Also, subsetting curvilinear grids in target CRS coordinate space will be quite hard, so it would be hard to implement that in a server.

It would be good to have some very specific use cases in the web context before we start thinking about how to implement it.

jonblower commented 8 years ago

Should ReferencingByCoordinates also work for composite axes like in a trajectory?

Maybe, but I don't know why you would do this, rather than just specifying the coordinates directly in the trajectory?

In which case would a client want to have the 2D curvilinear grid coordinates instead of a resampled grid in another projection or lon/lat?

A resampled (rasterized) grid would lose the shape of the grid cells, which might be important when zoomed in a long way. Sometimes there is a large difference in size between the smallest and largest grid cells, meaning that you'd have to pick a very small grid cell size to accurately record the grid. Also, sometimes the rasterized grid would have large empty regions.

Including 2D coordinates for a curvilinear grid precautionally is only a solution for a small number of advanced clients I think, since it is way harder to process, render etc.

I dunno - for many clients I think it would be easier to read WGS84 lat-lon coordinates than figure out how to do a projection. Also, sometimes curvilinear coordinates is all you have - there isn't always a native projection at all (e.g. ocean model).

How far do curvilinear grids overlap with the MultiPolygon domain type?

Good question. MultiPolygon might do the job, except that it does not record the grid cell centres, which might be important. (And these are not necessarily the centroids of the grid cells, I don't think.)

It would be good to have some very specific use cases in the web context before we start thinking about how to implement it.

A use case would be a coastal-ocean model that follows the coastline, estuaries etc. A bit like the "unstructured" grids, but with a quadrilateral geometry instead of triangular. Maybe a single mechanism could be used for polygons of any shape (like in VTK).

letmaik commented 8 years ago

Should ReferencingByCoordinates also work for composite axes like in a trajectory?

Maybe, but I don't know why you would do this, rather than just specifying the coordinates directly in the trajectory?

So, let's say a VerticalProfile in the UK is given in British National Grid coordinates inside the x and y domain axes. Now the question is, where would you put the latitude/longitude coordinates? Is that something that we should support?

Including 2D coordinates for a curvilinear grid precautionally is only a solution for a small number of advanced clients I think, since it is way harder to process, render etc.

I dunno - for many clients I think it would be easier to read WGS84 lat-lon coordinates than figure out how to do a projection. Also, sometimes curvilinear coordinates is all you have - there isn't always a native projection at all (e.g. ocean model).

Sorry, I misphrased that. If you want to output a curvilinear grid as it is, then sure, you need these 2D coordinates. If it's a common projection I wouldn't include the 2D coordinates though, since this is not what a client may need anyway. The client may not have the ability to render those arbitrary polygons but rather it may want to reproject the grid to lat/lon and then use it's normal and simple drawing routines. Alternatively, it may handle the projected coordinates directly if there's a matching base map or whatever. But transferring and processing 2D coordinate arrays is really too much and too hard in my opinion, at least in most cases where it's not strictly necessary.

A use case would be a coastal-ocean model that follows the coastline, estuaries etc. A bit like the "unstructured" grids, but with a quadrilateral geometry instead of triangular. Maybe a single mechanism could be used for polygons of any shape (like in VTK).

I assume resampling for those grids is not a common thing, right? So it would really be "load/display the original resolution or nothing at all", right? I just think that in the context of easy web processing or display, this is a fairly limited use case. Wouldn't you at least also want a reprojected lat/lon (or other common projection) grid?

jonblower commented 8 years ago

Right, I see. I guess one overriding question is - if the data are in a well-known projection, do we also provide coordinates (redundantly) in other projections? In CF (for gridded data at least) the answer was "yes" because CF has historically been oriented towards the global climate modelling community, where GIS projections aren't too common and lat-lon is all that some tools understand. Perhaps in CoverageJSON we're oriented a bit more towards the GIS community, who we can assume has more tools to deal with projections (but maybe not in the web browser?)

But transferring and processing 2D coordinate arrays is really too much and too hard in my opinion, at least in most cases where it's not strictly necessary.

I think, at the most, it would be optional to provide redundant coordinates in other projections. It should not be compulsory for the data provider to do so. I can imagine that web clients will be heavily oriented towards CRS84 so there may be some value in providing redundant lat-lon coords.

In the case of non-gridded data - hm, this could be awkward. It might not be too bad for primitive axes, but for tuple/polygon axes it could be much harder. Maybe there could be multiple alternative domains, specified in different CRSs...? Sounds like that's getting too difficult.

I assume resampling for those grids is not a common thing, right? So it would really be "load/display the original resolution or nothing at all", right?

It's tough to resample in curvilinear space, yes. You could resample in index space (like an OPeNDAP client would), but this might not be optimal given that the grid cells are different shapes and sizes. So if you wanted to do tiling/resampling you would probably reproject to a regular grid and accept that this has its own limitations (e.g. loss of grid cell shape).

letmaik commented 2 years ago

Closing this as out-of-scope since curvilinear grids are quite rare and the expectation is that CovJSON is used for web applications where it is extremely likely that standard projections are used that can be handled by proj4 and similar libraries. If there is a need for this in the future, which I doubt, then there should be enough extension places (probably within the CRS object) to put such data.