Should domain contain non-varying coordinates? / domain restructuring

letmaik commented 9 years ago

There is a debate whether the coverage domain should contain the fixed coordinates (like x/y/t for a Profile) or whether these should be moved outside the domain.

Some notes:

CSML separates both and uses "location" and "time" within the feature for the fixed coordinates. However, for coverages which don't vary, like PointCoverage, they include the fixed coordinates redundantly in the domain, making it a bit inconsistent.
In CSML, the location and time are optional for Profile. For Point, the location is required both within the domain and in "location".
In CSML, for a Section, the domain only contains the Profile depths, and the locations and times are outside the domain and optional (and called "stationLocations" and "stationTimes").
In CSML, for a Trajectory, the domain is a 1D grid over a compound CRS, and the CRS then defines for each axis point the location and time. Interesting, but seems impractical to use.
In CoverageJSON, for MultiPolygon, the fixed z-coordinate is either given or omitted. Having just the z-coordinate within a "location" seems strange. The type wouldn't be a "Point", but something like "VerticalCoordinate".
When the exact location of a Profile or PointSeries is unknown and just a Station ID (or similar), should this be included in the domain or somewhere else? If such IDs become part of the domain, then the obvious question is: where does it stop? What is domain and what isn't?

jonblower commented 9 years ago

I think this issue is connected with the question of how we record metadata about the instrument or site that makes the measurement. How about this:

Every Coverage/observation type has an "instrument" property (or maybe with another name from O&M) that we use as a hook for this. This includes things like the station ID.
If the instrument is in a fixed location (e.g. for a profile or timeseries) then the location is recorded within this property. This information is also recorded in the domain. I think this redundancy is OK because it's useful to have all the information about the instrument encapsulated in a single object.

jonblower commented 9 years ago

I can see that the variety of coverage types means that it could look inconsistent if we move bits of the domain out of the domain object. And for some coverage types (trajectories, grids) it might not make sense at all. But we need to somehow allow for the fact that the "fixed parts" of the domain (e.g. xy location, t location, z location) could be expressed in many ways:

As precise coordinates with infinitesimal extent (points)
As polygons
Maybe line strings? not sure
As URIs pointing to the definition of a location that is perhaps complex to encode inline
unknown

We want to avoid an explosion of coverage types - we don't really want 4-5 profile coverage types. Can we just give them all the same type and allow the client to inspect the types of the fixed parts of the domain?

letmaik commented 9 years ago

About the "instrument" property, I'm fine with that, however I would clearly state that this is purely metadata which generic visualization clients don't need. Also, I think the instrument location should be allowed to use a different CRS than the domain, e.g. the domain uses UTM but the instrument location uses CRS84. And since the term "instrument" will not fit for everything, it should probably not be required to be exactly that. Of course, then the question is, what is required at all?

About fixed domain parts, I agree with what you wrote, and I add to the list:

a combination of the above (e.g. precise coordinates & URI)

I think letting the client inspect the inner types is fine, as long as we define some common default ones, otherwise it will become a mess. It will complicate implementations a bit though, because now the geometry type used for rendering is not determined by the domain type alone in some cases. But I think this is not a big issue.

An instrument URI is never the same as a domain location URI, right? The latter would be more like URIs for lakes, which is the O&M "feature of interest" I guess. And the actual coordinates in the domain are the "sampling feature".

letmaik commented 9 years ago

Let's see what that could look like... Profiles:

precise coordinates:

{
  "type": "Profile",
  "z": [1,5,20],
  "geometry": { // any GeoJSON geometry
    "type": "Point",
    "coordinates": [1, 21] // only x,y allowed
  },
  "temporal": {
    "type": "Instant",
    "dateTime": "2008-01-01T04:00:00Z"
  }
}

URI as location:

{
  "type": "Profile",
  "z": [1,5,20],
  "geometry": "http://.../area/2", // does it make sense to call it geometry here?
  "temporal": {
    "type": "Instant",
    "dateTime": "2008-01-01T04:00:00Z"
  }
}
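A minimal sketch of how a client might inspect the fixed location of such a Profile, whichever of the two forms above is used. `describeLocation` is a hypothetical helper, not part of any spec or library, and it assumes that "geometry" is either a GeoJSON-like object, a URI string, or absent (unknown location).

```javascript
// Hypothetical helper: classify the fixed horizontal part of a domain.
// Assumes "geometry" is a GeoJSON-like object, a URI string, or missing.
function describeLocation(domain) {
  const g = domain.geometry;
  if (g === undefined || g === null) {
    return { kind: "unknown" };
  }
  if (typeof g === "string") {
    return { kind: "uri", uri: g };
  }
  // Any GeoJSON geometry type ("Point", "Polygon", ...) is passed through.
  return { kind: "geometry", geometryType: g.type, coordinates: g.coordinates };
}

const profileWithPoint = {
  type: "Profile",
  z: [1, 5, 20],
  geometry: { type: "Point", coordinates: [1, 21] },
  temporal: { type: "Instant", dateTime: "2008-01-01T04:00:00Z" }
};
const profileWithUri = {
  type: "Profile",
  z: [1, 5, 20],
  geometry: "http://.../area/2", // elided URI copied as-is from the example
  temporal: { type: "Instant", dateTime: "2008-01-01T04:00:00Z" }
};

console.log(describeLocation(profileWithPoint).kind); // "geometry"
console.log(describeLocation(profileWithUri).kind);   // "uri"
```

Both objects keep the same domain type ("Profile"); only the inner type of the fixed part differs, which is what the client has to branch on.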

PointSeries: (makes more sense when it's called TimeSeries, or.. GeometrySeries,...)

{
  "type": "TimeSeries",
  "t": ["2008-01-01T04:00:00Z","2008-01-01T05:00:00Z"],
  "geometry": { // any GeoJSON geometry
    "type": "Point",
    "coordinates": [1, 20, 1] // only x,y[,z] allowed
  }
}

What if only z is known and x,y should be a URI? Then geometry (esp GeoJSON) is tricky to use. Maybe we need to separate horizontal and vertical in that case and create a new geometry type for vertical coordinates only:

{
  "type": "TimeSeries",
  "t": ["2008-01-01T04:00:00Z","2008-01-01T05:00:00Z"],
  "horizontalGeometry": "http://.../area/2",
  "verticalGeometry": {
    "type": "VerticalCoordinate",
    "coordinate": 5
  }
}

Although I have no idea what "verticalGeometry" could ever be except a vertical coordinate.

We also have to consider the space usage, e.g. for big collections of (subsetted) Profiles.

letmaik commented 9 years ago

Then there is MultiPolygonSeries, which could be generalised to MultiGeometrySeries?

{
  "type": "MultiGeometrySeries",
  "t": ["2008-01-01T04:00:00Z","2008-01-01T05:00:00Z"],
  "horizontalGeometry": { // any Multi* GeoJSON geometry
    "type": "MultiPolygon",
    "coordinates": [ // only x,y allowed
      [ [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ] ],
      [ [ [200.0, 10.0], [201.0, 10.0], [201.0, 11.0], [200.0, 11.0], [200.0, 10.0] ] ]
    ]
  },
  "verticalGeometry": {
    "type": "VerticalCoordinate",
    "coordinate": 5
  }
}

Here the geometry is part of the varying domain part, which may be confusing.

For a single Point/Polygon:

{
  "type": "Geometry",
  "horizontalGeometry": { // x,y only
    "type": "Polygon",
    "coordinates": [
      [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ]
    ]
  },
  "verticalGeometry": {
    "type": "VerticalCoordinate",
    "coordinate": 2
  },
  "temporal": {
    "type": "Instant",
    "dateTime": "2008-01-01T04:00:00Z"
  }
}

Should that be called Geometry coverage? Sounds ambiguous, esp because a grid is also a geometry in common terminology.

Possible domain types (current names in brackets):

Single (Point, Polygon)
Multi (MultiPoint, MultiPolygon)
SingleSeries (PointSeries, PolygonSeries)
MultiSeries (MultiPointSeries, MultiPolygonSeries)
Grid
Trajectory
Section

letmaik commented 9 years ago

I find it a bit inconsistent to keep the varying parts in the domain root like that. If we really use these more complex objects, then the varying parts should probably be part of that too, so for example:

{
  "type": "Profile",
  "verticalGeometry": {
    "type": "VerticalCoordinates",
    "coordinates": [1,5,20,25,30] // not GeoJSON
  },
  "horizontalGeometry": { // any GeoJSON geometry
    "type": "Point",
    "coordinates": [1, 21] // only x,y allowed
  },
  "temporal": {
    "type": "Instant",
    "dateTime": "2008-01-01T04:00:00Z"
  }
}

And the domain type alone defines over which properties (or coordinates of them) the domain runs.

The MultiSeries could look like:

{
  "type": "MultiSeries",
  "temporal": {
    "type": "Series",
    "coordinates": ["2008-01-01T04:00:00Z","2008-01-01T05:00:00Z","2008-01-01T07:00:00Z"]
  },
  "horizontalGeometry": { // any Multi* GeoJSON geometry
    "type": "MultiPolygon",
    "coordinates": [ // only x,y allowed
      [ [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ] ],
      [ [ [200.0, 10.0], [201.0, 10.0], [201.0, 11.0], [200.0, 11.0], [200.0, 10.0] ] ]
    ]
  },
  "verticalGeometry": {
    "type": "VerticalCoordinate",
    "coordinate": 5
  }
}

I think I slowly see the value of that... even though it hurts a bit in terms of data efficiency. By having individual objects for the different domain parts it is more straightforward to assign metadata to them, like the CRS, bounds, accuracy etc.

jonblower commented 9 years ago

Yes, I think this is a promising approach. I guess this approach also supports the addition of coordinate bounds as an optional extra field in the relevant objects.

In future, I guess it may be useful to have combined horizontal and vertical geometries, although I can't think of a use case right now. I guess this could be achieved by replacing the separate h and v components with a "3dgeometry" field or something like that.

I'd be interested to see how this approach pans out for trajectories and sections. For the horizontal part, would we use a GeoJSON LineString? Or would we use separate arrays of x and y coordinates as in the current version of our spec? My instinct would be for the former to be consistent.

Grids would presumably not have a coordinates property but an axes property or something like that? How about curvilinear grids?

Anyway, I think it's worth now applying this idea to all the domain types we have on our list to see how it pans out.

letmaik commented 9 years ago

Trajectory:

{
  "type": "Trajectory",
  "temporal": {
    "type": "Series",
    "coordinates": ["2008-01-01T04:00:00Z","2008-01-01T05:00:00Z","2008-01-01T07:00:00Z"]
  },
  "horizontalGeometry": {
    "type": "LineString",
    "coordinates": [ // only x,y allowed
      [100.0, 0.0], [101.0, 0.0], [101.0, 1.0]
    ]
  },
  "verticalGeometry": {
    "type": "VerticalCoordinates", // or VerticalCoordinate if constant
    "coordinates": [5, 2, 3]
  }
}

I think keeping horizontalGeometry as GeoJSON (except for Grid) is a good idea. And by naming it horizontalGeometry it is clear that we specifically choose to use GeoJSON for xy, and use something more flexible for vertical.

Section:

{
  "type": "Section",
  "temporal": {
    "type": "Series",
    "coordinates": ["2008-01-01T04:00:00Z","2008-01-01T05:00:00Z","2008-01-01T07:00:00Z"]
  },
  "horizontalGeometry": {
    "type": "LineString",
    "coordinates": [ // only x,y allowed
      [100.0, 0.0], [101.0, 0.0], [101.0, 1.0]
    ]
  },
  "verticalGeometry": {
    "type": "VerticalCoordinates",
    "coordinates": [5, 2, 3, 4, 7, 3]
  }
}

So, it looks like a Trajectory, and the domain type has to be used to differentiate between the two. Is that good? Should the relations between the domain parts be described somehow as in the current spec ("sequence": ["x","y","z","t"]) so that the structure is self-describing without looking at the domain type?
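To illustrate why the domain type matters here, the following sketch counts the domain points of the two structures above. `countDomainPoints` is a hypothetical helper. It assumes that in a Trajectory all parts are linked index by index, while in a Section every (t, x, y) position carries the full list of vertical coordinates.

```javascript
// Hypothetical helper: number of domain points for a Trajectory or Section.
// Assumption: temporal and horizontal parts are always linked index by index.
function countDomainPoints(domain) {
  const nPositions = domain.horizontalGeometry.coordinates.length;
  if (domain.temporal.coordinates.length !== nPositions) {
    throw new Error("temporal and horizontal parts must have equal length");
  }
  const vertical = domain.verticalGeometry;
  if (domain.type === "Trajectory") {
    // A constant "VerticalCoordinate" or one z value per position.
    const linked = vertical.type === "VerticalCoordinate" ||
      vertical.coordinates.length === nPositions;
    if (!linked) {
      throw new Error("vertical part must match the number of positions");
    }
    return nPositions;
  }
  if (domain.type === "Section") {
    // The vertical part acts as an independent grid axis.
    return nPositions * vertical.coordinates.length;
  }
  throw new Error("unsupported domain type: " + domain.type);
}

const shared = {
  temporal: {
    type: "Series",
    coordinates: ["2008-01-01T04:00:00Z", "2008-01-01T05:00:00Z", "2008-01-01T07:00:00Z"]
  },
  horizontalGeometry: {
    type: "LineString",
    coordinates: [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0]]
  }
};
const trajectory = {
  type: "Trajectory", ...shared,
  verticalGeometry: { type: "VerticalCoordinates", coordinates: [5, 2, 3] }
};
const section = {
  type: "Section", ...shared,
  verticalGeometry: { type: "VerticalCoordinates", coordinates: [5, 2, 3, 4, 7, 3] }
};

console.log(countDomainPoints(trajectory)); // 3
console.log(countDomainPoints(section));    // 18
```

The two objects differ structurally only in the length of the vertical part, so without the "type" field (or something like "sequence") a client could not tell which interpretation applies.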

Grid:

{
  "type": "Grid",
  "temporal": {
    "type": "Series",
    "coordinates": ["2008-01-01T04:00:00Z","2008-01-01T05:00:00Z","2008-01-01T07:00:00Z"]
  },
  "horizontalGeometry": {
    "type": "Grid",
    "axes": [
      [1,2,3], // x
      [20,21] // y
    ]
  },
  "verticalGeometry": {
    "type": "VerticalCoordinates",
    "coordinates": [5, 2, 3, 4, 7, 3]
  }
}

I think the Grid geometry type is in line with GeoJSON. It's comparable to the special geometry "GeometryCollection", which uses "geometries" instead of "coordinates". And here, the "axes" arrays are in the same order as the root Point geometry, so I think this fits the GeoJSON model quite well, and extends naturally to the coordinate extensions which are allowed in GeoJSON (third element for z, rest for arbitrary, even though we don't use those here).

letmaik commented 9 years ago

About curvilinear grids, are these rectilinear grids in a different-than-default CRS for which you want to give coordinates in the default CRS (CRS84) as well? Or is it really the same CRS just with funny grid geometry? In the latter case (which is probably correct), this would be a new geometry type:

  "horizontalGeometry": {
    "type": "CurvilinearGrid",
    "axes": [
      [[1,2,3],[1,1.5,2]], // 2D x
      [[20,21,20],[21,22,23]] // 2D y
    ]
  },

jonblower commented 9 years ago

> are these rectilinear grids in a different-than-default CRS for which you want to give coordinates in the default CRS (CRS84) as well? Or is it really the same CRS just with funny grid geometry?

Either. In the first case, the curvilinear grid is just a convenience for clients that don't understand the "native" CRS. In the second case, you don't have a "native" CRS (perhaps because the grid is distorted) so the only thing you can do is to provide coordinate pairs explicitly.

I'm not sure we have any curvilinear grids in MELODIES, but they are worth thinking about.

letmaik commented 9 years ago

Ok, so for the first case, this is a bit more general problem, since this also applies to temporal and vertical coordinates in some non-default CRS. Hm.... one solution that comes to my mind is simply using an array then:

  "horizontalGeometry": [ {
    "type": "Grid",
    "crs": "http://my/funny/crs",
    "axes": [
      [1,2,3], // x
      [20,21] // y
    ]
   }, {
    "type": "CurvilinearGrid", // default CRS
    "axes": [
      [[1,2,3],[1,1.5,2]], // 2D x
      [[20,21,20],[21,22,23]] // 2D y
    ]
  }],

Where each represents exactly the same geometry/temporal coordinates, just in a different CRS. I think this is still simple enough, as a simple CRS84-only client could just look for any object without CRS and take that.
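The lookup a simple CRS84-only client would perform could be sketched as follows. `pickDefaultCrsGeometry` is a hypothetical name. The sketch assumes that "horizontalGeometry" is either a single object or an array of alternative representations, and that a missing "crs" field means the default CRS.

```javascript
// Hypothetical helper for a CRS84-only client: return the first
// representation that has no explicit "crs", or null if there is none.
function pickDefaultCrsGeometry(horizontalGeometry) {
  const candidates = Array.isArray(horizontalGeometry)
    ? horizontalGeometry
    : [horizontalGeometry];
  const match = candidates.find((g) => g.crs === undefined);
  return match === undefined ? null : match;
}

const horizontalGeometry = [
  {
    type: "Grid",
    crs: "http://my/funny/crs",
    axes: [[1, 2, 3], [20, 21]]
  },
  {
    type: "CurvilinearGrid", // default CRS
    axes: [
      [[1, 2, 3], [1, 1.5, 2]],
      [[20, 21, 20], [21, 22, 23]]
    ]
  }
];

const chosen = pickDefaultCrsGeometry(horizontalGeometry);
console.log(chosen.type); // "CurvilinearGrid"
```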

letmaik commented 9 years ago

By the way, this means that the Domain type cannot be any more specific than the abstract "Grid" which is probably ok.

jonblower commented 9 years ago

In the case of specifying a curvilinear grid as a convenience to deal with a "funny" CRS, I wonder if it might be better to specify two separate properties (e.g. "horizontalGeometry", "alternativeHorizontalGeometry") rather than having the hG as an array?

letmaik commented 9 years ago

With alternative* you would have more semantics, meaning that hG is the main/standard/preferred/native grid, and the other is an additional one. Is that desired? Should the native hG influence the default visualization/projection chosen? What if you have a third one? Unlikely, but why not?

On the other hand, to stay consistent, being an array, the property would be "horizontalGeometries" which may be confusing as it sounds like the horizontal part is made up of multiple geometries, and not just multiple representations of the same. So, I think I agree that "alternativeHorizontalGeometry" may be better here, covering 99.99% of cases.

Do we need a similar thing for vertical and temporal? E.g. for using different time serializations (ISO string vs numeric). But maybe that's not necessary, not sure.

letmaik commented 9 years ago

I think we should at least have considered using explicit axes, such that the domain type technically derives from the axes alone but is provided for convenience, and the axis order then clearly relates to the range value ordering. This could also align it a bit better with OGC's new CIS. Let's see how complicated it gets...

Section:

{
  "type": "Section", // subtype of Grid
  "axes": [
    {
      "type": "CompositeAxis", // all subaxes have same length and are linked
      "axes": [ {
          "type": "Series", // subtype of CoordinateAxis
          "coordinates": ["2008-01-01T04:00:00Z","2008-01-01T05:00:00Z","2008-01-01T07:00:00Z"]
        }, {
          "type": "LineString", // conceptually a CompositeAxis, but different syntax
          "coordinates": [ // only x,y allowed
             [100.0, 0.0], [101.0, 0.0], [101.0, 1.0]
          ]
        } ]
    },
    {
      "type": "VerticalCoordinates", // subtype of CoordinateAxis
      "coordinates": [5, 2, 3]
    }
  ]
}

Trajectory:

{
  "type": "Trajectory", // subtype of CompositeAxis
  "axes": [ {
          "type": "Series", // subtype of CoordinateAxis
          "coordinates": ["2008-01-01T04:00:00Z","2008-01-01T05:00:00Z","2008-01-01T07:00:00Z"]
        }, {
          "type": "LineString", // conceptually a CompositeAxis, but different syntax
          "coordinates": [ // only x,y allowed
             [100.0, 0.0], [101.0, 0.0], [101.0, 1.0]
          ]
        }, {
          "type": "VerticalCoordinates", // subtype of CoordinateAxis
          "coordinates": [5, 2, 3]
        } ]
}

Irregular grid:

{
  "type": "Grid",
  "axes": [ {
      "type": "Series", // subtype of CoordinateAxis
      "coordinates": ["2008-01-01T04:00:00Z","2008-01-01T05:00:00Z","2008-01-01T07:00:00Z"]
    }, {
      "type": "Grid", // contributes 2 axes to the outer grid
      "crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84", // reason for grouping
      "axes": [ {
          "type": "CoordinateAxis",
          "coordinates": [1,2,3]
        }, {
          "type": "CoordinateAxis",
          "coordinates": [20,21]
        } ]
    }, {
      "type": "VerticalCoordinates", // subtype of CoordinateAxis
      "coordinates": [5, 2, 3, 4, 7, 3]
    }
  ]
}

Curvilinear grid:

{
  "type": "Grid",
  "axes": [ {
      "type": "Series", // subtype of CoordinateAxis
      "coordinates": ["2008-01-01T04:00:00Z","2008-01-01T05:00:00Z","2008-01-01T07:00:00Z"]
    }, {
      "type": "CurvilinearGrid",
      "crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84",
      "axes": [ {
          "type": "2DGrid", // wrong name probably
          "coordinates": [[1,2,3],[1,1.5,2]]
        }, {
          "type": "2DGrid",
          "coordinates": [[20,21,20],[21,22,23]]
        } ]
    }, {
      "type": "VerticalCoordinates", // subtype of CoordinateAxis
      "coordinates": [5, 2, 3, 4, 7, 3]
    }
  ]
}

Grid with funny CRS with alternative coordinates:

{
  "type": "Grid",
  "axes": [ {
      "type": "Series", // subtype of CoordinateAxis
      "coordinates": ["2008-01-01T04:00:00Z","2008-01-01T05:00:00Z","2008-01-01T07:00:00Z"]
    }, {
      "type": "Grid", // contributes 2 axes to the outer grid
      "crs": "http://my/funny/crs",
      "axes": [ {
          "type": "CoordinateAxis",
          "coordinates": [1,2,3]
        }, {
          "type": "CoordinateAxis",
          "coordinates": [20,21]
        } ],
      "alternativeCoordinates": {
        "type": "CurvilinearGrid",
        "crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84",
        "axes": [ {
            "type": "2DGrid", // wrong name probably
            "coordinates": [[1,2,3],[1,1.5,2]]
          }, {
            "type": "2DGrid",
            "coordinates": [[20,21,20],[21,22,23]]
          } ]
      }
    }, {
      "type": "VerticalCoordinates", // subtype of CoordinateAxis
      "coordinates": [5, 2, 3, 4, 7, 3]
    }
  ]
}

I think we agree this is way too verbose, inconvenient, ugly,...

Still, is there an easier way to indicate the composite-vs-grid nature difference with Trajectory vs Section?

letmaik commented 9 years ago

How about something pragmatic like...

{
  "type": "Section",
  "sequential": {
    "temporal": {
      "type": "Series",
      "coordinates": ["2008-01-01T04:00:00Z","2008-01-01T05:00:00Z","2008-01-01T07:00:00Z"]
    },
    "horizontalGeometry": {
      "type": "LineString",
      "coordinates": [ // only x,y allowed
        [100.0, 0.0], [101.0, 0.0], [101.0, 1.0]
      ]
    }
  },
  "verticalGeometry": {
    "type": "VerticalCoordinates",
    "coordinates": [5, 2, 3, 4, 7, 3]
  }
}

And all coordinate objects (here "verticalGeometry") not inside "sequential" are assumed to be a grid axis.

jonblower commented 9 years ago

Yes, this could possibly work, but I wonder if we can be neater. How about the following:

All domains (not just grids) are combinations of orthogonal axes (the order of axes is important)
Axes are ordered lists of "atomic" (or "primitive") types
Atomic types include:
- nD positions in space and/or time
- polygons
- maybe others too
The Cartesian product of the axes gives the complete set of domain objects, in a defined order that maps to the range lists.

In the next comment I'll write down some examples.

jonblower commented 9 years ago

xy grid:

"domain": {
  "axes": [{
    "label": "longitude",
    "valueType": "x",
    "values": [0, 1, 2, 3, 4 ...]
  },{
    "label": "latitude",
    "valueType": "y",
    "values": [45, 46, 47, ...]
  }],
  // could also include z and t axes (might only have one value!)
  // other properties of the domain go here, e.g. CRSs
}

timeseries:

"domain": {
  "axes": [{
    "label": "time",
    "valueType": "t",
    "values": ["2010-01-01T00:00:00Z", "2010-01-02T00:00:00Z" ...]
  }],
  // other properties of the domain go here
}

trajectory:

"domain": {
  "axes": [{
    "label": "point along trajectory",
    // a trajectory is a 1D feature in an nD space
    "valueType": "x,y,z,t",
    "values": [ [0,1,20,"2010-01-01T00:00:00Z"], [1,2,25,"2010-01-01T00:00:05Z"], ...]
  }],
}

section:

"domain": {
  "axes": [{
    "label": "point along section",
    "valueType": "x,y,t",
    "values": [ [0,1,"2010-01-01T00:00:00Z"], [1,2,"2010-01-01T00:00:05Z"], ...]
  },{
    // A section is like a trajectory but with a separate z axis
    "label": "depth",
    "valueType": "z",
    "values": [0, 1, 2, 5, 10, ...]
  }],
}

multipolygon:

"domain": {
  "axes": [{
    "label": "polygon",
    "valueType": "polygon", // or could be "[x,y]"?
    "values": [ [], [],  [], ...] // List of polygons
  }],
  // Could also have z and t axes here
}

jonblower commented 9 years ago

What do you think? It looks fairly compact. Some properties are noteworthy:

There are no "domain types". It's very easy to define new domain types by combining axes however we wish. However, I don't know whether this makes client development hard.
The size of the domain (i.e. number of domain objects) is given by the product of the lengths of all the values arrays.
"fixed" domain values (like the location of a timeseries) would be easy to add, as single-valued axes.
If we wanted to provide a separate horizontalGeometry (e.g. for GeoJSON interoperability or plotting on a map), we would have to provide this as redundant information (a bit like a Coverage having a redundant envelope)
Maybe the axes should be a map, not an array (but the order of axes is important of course).
I'm not sure how curvilinear grids would work. I can think of a few options but probably too lengthy to go into here.
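The Cartesian-product idea and the resulting domain size can be sketched as follows. The function names are hypothetical, and the sketch assumes that the last axis varies fastest (row-major order); the thread only says the order is "defined", not which one it is.

```javascript
// Number of domain objects: the product of the lengths of all "values" arrays.
function domainSize(axes) {
  return axes.reduce((n, axis) => n * axis.values.length, 1);
}

// Map a flat range index to one value per axis.
// Assumption: row-major order, i.e. the last axis varies fastest.
function coordinatesAt(axes, flatIndex) {
  const result = new Array(axes.length);
  let rest = flatIndex;
  for (let i = axes.length - 1; i >= 0; i--) {
    const len = axes[i].values.length;
    result[i] = axes[i].values[rest % len];
    rest = Math.floor(rest / len);
  }
  return result;
}

// A small section: two points along the section, three depths.
const sectionAxes = [
  {
    label: "point along section",
    valueType: "x,y,t",
    values: [[0, 1, "2010-01-01T00:00:00Z"], [1, 2, "2010-01-01T00:00:05Z"]]
  },
  { label: "depth", valueType: "z", values: [0, 1, 2] }
];

console.log(domainSize(sectionAxes)); // 6
console.log(coordinatesAt(sectionAxes, 4));
// second section point combined with depth 1
```

A "fixed" value added as a single-valued axis multiplies the size by one, so it leaves both the size and the index mapping of the other axes unchanged.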

letmaik commented 9 years ago

I would still define domain types, otherwise as you said, writing libraries for that will get hard.

Pros:

Structure and relation to range is visible.
Relation to subsetting by index is clear.
Easy to explain domains, except for single-element axes like a PointCoverage (see below).
Possibly easier to write down in the spec.
Grouping of trajectory/section coordinates in a single array is easier to read.

Cons:

No GeoJSON geometries anymore.
CRS has to be defined as a composite over all axes, since there is no grouping anymore, e.g. for xy. Otherwise you need some workaround like "crsList": [{ "id": "http://...", "axes": [0]}, { "id": "http://...", "axes": [1,2]}] but using indices is always easy to get wrong and is a bit hard to parse for humans.
Multiple ways to represent single coordinates, e.g. PointCoverage can have four single-element axes or one single-element four-tuple axis. How to define CRSs in both cases?
With single-element axes, the order in which to put them is suddenly an arbitrary choice. The spec would have to enforce a certain order (for the defined domain types), which invites mistakes: data producers would have to read the spec in detail, whereas otherwise knowing or inferring the axis order would be left to the library implementors.
For trajectory and section, typed arrays cannot be used.
Client-side search/filtering for some coordinate is more difficult/less efficient with tupled arrays (Trajectory, Section). Before, a fast binary search on the typed array could be done.
Has a slightly cryptic view for web-oriented data consumers, since there are many new concepts.
URIs as location identifier may be strange as an axis.

In summary, I like the formal part of it, but I'm not sold yet that it should be exposed directly as JSON since I mainly see that it brings more difficulties, especially with CRS (although Peter B. has to solve that for CIS as well), and that it excludes GeoJSON by forcing certain object structures. The nice encapsulation of xy gets lost for example. With your flexible axis structure we could define it as an open format with arbitrary domains only limited by the CRSs you use. But I'm pretty sure this will hinder adoption completely. Therefore we have to define domain types anyway and then having this generic structure inside the JSON is not necessary in my opinion.

What do you think about https://github.com/Reading-eScience-Centre/coveragejson/issues/24#issuecomment-140789147?

jonblower commented 9 years ago

Yes, that's a helpful analysis. For CRSs, I was thinking that there could be a separate referencing object, e.g.:

"referencing" : {
  "xy": "http://some/url/representing/CRS84",
  "z": "url representing depth below sea level",
  "t": {
    "calendar": "gregorian"
  }
}

or something like that. (Or is this hard in JSON-LD?) The keys in this object map to the symbols in the valueType list attached to the axes.

Yes, the loss of GeoJSON geometries is not a plus, but we could return to our earlier idea that GeoJSON is a separate format for a different purpose, and the coverage becomes a property of a GeoJSON feature in this view. Or we can put the GeoJSON geometry in redundantly (perhaps optionally).

Your point about inefficiency of client-side searching is interesting - an alternative serialization might be:

"domain": {
  "axes": [{
    "label": "point along trajectory",
    // a trajectory is a 1D feature in an nD space
    "valueType": "x,y,z,t",
    "values": [
      [0,1, ...], // x values
      [1,2,...], // y values
      [20,25,...] ,// z values
      ["2010-01-01T00:00:00Z", "2010-01-01T00:00:05Z", ...] // t values
    ]
  }],
}

although this is perhaps inconsistent with how polygons are encoded (which are arrays of xy tuples). What would be a use case for efficient searching of a particular y coordinate within a trajectory anyway? I can only imagine use cases for searching in a 1D ordered axis like x or y in a grid.
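The two serializations carry the same information, so a client could convert between them. A minimal sketch, with the hypothetical helper `toComponentArrays` and the assumption that every tuple has one entry per component named in "valueType":

```javascript
// Convert tuple-per-point values into one array per component.
// Assumption: each tuple has the same length as componentNames.
function toComponentArrays(componentNames, tuples) {
  const out = {};
  componentNames.forEach((name, k) => {
    out[name] = tuples.map((tuple) => tuple[k]);
  });
  return out;
}

const valueType = "x,y,z,t";
const tuples = [
  [0, 1, 20, "2010-01-01T00:00:00Z"],
  [1, 2, 25, "2010-01-01T00:00:05Z"]
];

const components = toComponentArrays(valueType.split(","), tuples);
// Numeric components can then be held in typed arrays if desired.
const zValues = Float64Array.from(components.z);

console.log(components.x); // [0, 1]
console.log(zValues[1]);   // 25
```

The time component stays a plain array of strings here; only the numeric components are candidates for typed arrays.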

Your https://github.com/Reading-eScience-Centre/coveragejson/issues/24#issuecomment-140789147 could work as a structure, although personally I find my structure a bit easier to read and relate to the domain geometry.

Yes, there are multiple ways to represent certain things like single-element axes and point coverages. But I don't think this is a deal-breaker, this kind of thing happens a lot and is why conventions are important.

> With your flexible axis structure we could define it as an open format with arbitrary domains only limited by the CRSs you use. But I'm pretty sure this will hinder adoption completely. Therefore we have to define domain types anyway and then having this generic structure inside the JSON is not necessary in my opinion.

I agree that we should probably name the domain types for convenience of clients, and then specify more tightly the options for these domain types (e.g. how to encode point coverages). But I think this gives us a generic structure that makes it easier to define new domain types when we need to (e.g. it's no problem to conceive of a PolygonSeriesDomain if we need it, without having to think of a new structure).

letmaik commented 8 years ago

I'm just trying to implement subsetting and noticed that it would be nice to have a domain structure like this:

"domain": {
  "axes": {
    "x": {
      "label": "longitude",
      "values": [0, 1, 2, 3, 4 ...]
    },
    "y": {
      "label": "latitude",
      "values": [45, 46, 47, ...]
    }
  },
  "axesOrder": ["x", "y"]
}

This fuses my original design (no explicit axes structure, simpler) with yours. So each axis has an identifier, which you would then use for things like subsetting: subsetByIndex({y: [0,1]}). Since "axes" has no ordering, the order has to be given in a separate field "axesOrder". We could have conventions for common axis identifiers like "x" etc. so that it is immediately clear which logical axis it represents without looking at something like your proposed "valueType" or a CRS. This would keep it easier for library developers as well: cov.domain.axes.x.values vs cov.domain.axes[0].values.
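A rough sketch of what index-based subsetting over named axes could look like. This is not an existing API: the function below only subsets the domain (not the range values), and it assumes that a constraint is a [start, stop] pair with an exclusive stop, which the thread does not specify.

```javascript
// Hypothetical sketch: subset a domain with named axes by index ranges.
// Assumption: constraints[name] is [start, stop] with stop exclusive;
// axes without a constraint are kept whole.
function subsetByIndex(domain, constraints) {
  const axes = {};
  for (const name of Object.keys(domain.axes)) {
    const axis = domain.axes[name];
    const range = constraints[name];
    const values = range === undefined
      ? axis.values.slice()
      : axis.values.slice(range[0], range[1]);
    axes[name] = { ...axis, values };
  }
  return { ...domain, axes };
}

const domain = {
  axes: {
    x: { label: "longitude", values: [0, 1, 2, 3, 4] },
    y: { label: "latitude", values: [45, 46, 47] }
  },
  axesOrder: ["x", "y"]
};

const subset = subsetByIndex(domain, { y: [0, 1] });
console.log(subset.axes.y.values);        // [45]
console.log(subset.axes.x.values.length); // 5
```

Because the constraint is keyed by axis identifier, the caller does not need to know the position of "y" within "axesOrder".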

Also, axis identifiers make CRS (etc.) definitions easier, which could either be inlined into the axis or if it covers multiple axes (or a composite axis, see below) it is outside like:

"referencing" : [{
  "axes": ["x","y"],  
  "crs": "http://some/url/representing/CRS84"
}]

Composite axes didn't get easier, but probably also not harder:

"domain": {
  "axes": {
    "seq": {
      "label": "point along trajectory",
      "componentNames": ["x", "y", "z", "t"],
      "values": [ [0,1,20,"2010-01-01T00:00:00Z"], [1,2,25,"2010-01-01T00:00:05Z"], ...]
    }
  },
  // "axesOrder" not necessary if only one axis
  "referencing" : [{
    "axes": [["seq", "x"], ["seq","y"]],
    "crs": "http://some/url/representing/CRS84"
  },{
    "axes": [["seq", "z"]],
    "crs": "http://some/url/representing/Z"
  }]
}

or... the "referencing" could also be inlined to have a simpler structure:

"domain": {
  "axes": {
    "seq": {
      "label": "point along trajectory",
      "componentNames": ["x", "y", "z", "t"],
      "referencing" : [{
        "axes": ["x","y"],
        "crs": "http://some/url/representing/CRS84"
      },{
        "axes": ["z"],
        "crs": "http://some/url/representing/Z"
      }],
      "values": [ [0,1,20,"2010-01-01T00:00:00Z"], [1,2,25,"2010-01-01T00:00:05Z"], ...]
    }
  },
  // "axesOrder" not necessary if only one axis
}

What do you think?

letmaik commented 8 years ago

When exposing such axes directly and generically in a client it's probably a good idea to differentiate between varying and non-varying axes to prevent forcing things like get(t,0,0,0) and rather allow get(t) while still having the other axes exposed in the domain itself. I propose that to indicate that an axis is non-varying the field "value" has to be used instead of "values".

Also, for grids only, it is the case that a varying axis like time or vertical can be empty. Again, exposing axes directly in a generic way would lead to code like get(y,x) or get(t,y,x) etc. depending on which varying axes are defined. Since this is a pain to handle, I propose that any domain type we define must always have the same set of axes, even if some are empty. An empty axis could be indicated by a null value like "z": null but it would still appear in "axesOrder". For grids, "axesOrder" would then always be ["t", "z", "y", "x"].

jonblower commented 8 years ago

Regarding your structure, I think this looks good. Not quite sure what the best way of handling reference systems is, but I tend to prefer the inlined structure for composite axes. I wonder if "composite" or "auxiliary" (following CF) might be a better term than "seq"?

Also, use of "value" rather than "values" sounds sensible, but does it make clients more complicated?

Not sure about including empty axes. I can see the benefit to clients, but it seems a bit odd to have the axesOrder property including axes that don't exist. Could be a bit confusing.

letmaik commented 8 years ago

"composite" works for me, however I think "auxiliary" would be a bit misleading and confusing, not to mention hard to type. When you have a trajectory in CF, then the variables "time", "lon", "lat" are all auxiliary coordinate variables with a common dimension "time". The range variable would have "time" as dimension.

You're right about "value" vs "values", it's unnecessarily non-uniform. Instead, let's add "fixed": true for non-varying axes. Then generic clients can choose to care about that detail or not, which in the end is just a hint. If it later turns out we don't need that hint after all we can simply drop it.

As an alternative for empty axes we could require for grids that all axes must be present and if t or z is unknown it must contain an "unknown" coordinate:

"domain": {
  "axes": {
    "t": {
      "values": [null]
    },
    "z": {
      "values": [null]
    },
    "y": {
      "label": "latitude",
      "values": [45, 46, 47, ...]
    },
    "x": {
      "label": "longitude",
      "values": [0, 1, 2, 3, 4 ...]
    }
  },
  "axesOrder": ["t","z","y","x"]
}

Formally this would make more sense I think.

letmaik commented 8 years ago

  "domain": {
    "type": "Profile",
    "axes": {
      "z": {
        "values": [ 5.4562, 8.9282, 14.8802, 20.8320, 26.7836, 32.7350 ]
      },
      "x": {
        "values": [ -10.1 ],
        "fixed": true
      },
      "y": {
        "values": [ -40.2 ],
        "fixed": true
      },
      "t": {
        "values": ["2013-01-13T11:12:20Z"],
        "fixed": true
      }
    },
    "axesOrder": ["z","x","y","t"]
  }

vs

  "domain" : {
    "type" : "Profile",
    "x" : -10.1, 
    "y" : -40.2,
    "z" : [ 5.4562, 8.9282, 14.8802, 20.8320, 26.7836, 32.7350 ],
    "t" : "2013-01-13T11:12:20Z"
  }

Just sayin'...

I think we can get rid of the "fixed" field again and instead use "axesOrder" more wisely. Q: What is axesOrder good for? A: For associating range values to domain coordinates - which is only relevant for varying axes! So what about just having "axesOrder": ["z"] and not using "fixed": true? Then the question also doesn't arise in which order to put the non-varying axes and it is directly clear which axes are varying, namely those within "axesOrder". By convention we could say that any axes not in "axesOrder" are non-varying and come at the end after all varying ones in any order. If a client decides to implement something like domain.getPosition(...) which should return the coordinates corresponding to axes indices, the easiest is probably to return an object like {x: -10.1,...} where as input you could give it the varying axis indices in correct order, or also use an object {z: 0}.

Which brings us to this:

  "domain": {
    "type": "Profile",
    "axes": {
      "z": {
        "values": [ 5.4562, 8.9282, 14.8802, 20.8320, 26.7836, 32.7350 ]
      },
      "x": {
        "values": [ -10.1 ]
      },
      "y": {
        "values": [ -40.2 ]
      },
      "t": {
        "values": ["2013-01-13T11:12:20Z"]
      }
    },
    "axesOrder": ["z"]
  }
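
A rough sketch of what such a domain.getPosition(...) could look like on the client side, assuming the convention that axes not listed in "axesOrder" are non-varying. The function name, its signature and the `profile` object are made up for illustration and are not part of any spec or library:

```javascript
// Hypothetical client helper: returns the coordinates for the given indices
// of the varying axes, passed as an object like {z: 1}.
function getPosition(domain, indices) {
  var position = {};
  for (var name in domain.axes) {
    var values = domain.axes[name].values;
    var isVarying = domain.axesOrder.indexOf(name) !== -1;
    // Axes missing from "axesOrder" are treated as non-varying (assumed
    // convention), so their single coordinate always sits at index 0.
    position[name] = values[isVarying ? indices[name] : 0];
  }
  return position;
}

// Shortened copy of the Profile domain above.
var profile = {
  type: "Profile",
  axes: {
    z: { values: [5.4562, 8.9282, 14.8802] },
    x: { values: [-10.1] },
    y: { values: [-40.2] },
    t: { values: ["2013-01-13T11:12:20Z"] }
  },
  axesOrder: ["z"]
};

console.log(getPosition(profile, { z: 1 }));
```

The caller only ever supplies indices for the varying axes, which is the point of keeping "axesOrder" short.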

Now, how about we allow a short-cut for the case where you only need to use the "values" field:

  "domain": {
    "type": "Profile",
    "axes": {
      "z": [ 5.4562, 8.9282, 14.8802, 20.8320, 26.7836, 32.7350 ],
      "x": [ -10.1 ],
      "y": [ -40.2 ],
      "t": ["2013-01-13T11:12:20Z"]
    },
    "axesOrder": ["z"]
  }

I don't think this complicates clients. It is dead-easy to normalize this if you want to:

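function normalize(domain) {
  // Modifies the domain in place.
  for (var axis in domain.axes) {
    // Expand the compact form (a bare array) into a full axis object.
    if (Array.isArray(domain.axes[axis])) {
      domain.axes[axis] = {values: domain.axes[axis]};
    }
  }
}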

I think this gets us back to our goal of simplicity. Which means simple for simple cases and more complicated for more complicated cases.

jonblower commented 8 years ago

I think we're getting there, just a few things:

It's a pity to lose the labels of axes. This is pretty useful for readability, and also for plotting (where else might you get the labels from?). I think it's worth keeping them, at the expense of a little verboseness. Also, I think an "axis" is really an object rather than just a list of values.
I suppose axesOrder is only necessary when more than one axis has multiple values (hence not really necessary for profiles, timeseries or trajectories). And you would only need to include those axes, yes.
I agree that "fixed" isn't necessary. It seems to be redundant when you can do values.length.
(I think axisOrder may be slightly better than axesOrder.)
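
To illustrate the values.length point, here is a minimal sketch of how a client might derive the varying axes without a "fixed" flag. The helper name `varyingAxes` is hypothetical, and the sketch assumes every axis is an object with a "values" array:

```javascript
// Hypothetical helper: an axis counts as varying when it has more than one value.
function varyingAxes(domain) {
  return Object.keys(domain.axes).filter(function (name) {
    return domain.axes[name].values.length > 1;
  });
}

var profileDomain = {
  type: "Profile",
  axes: {
    z: { values: [5.4562, 8.9282, 14.8802] },
    x: { values: [-10.1] },
    y: { values: [-40.2] },
    t: { values: ["2013-01-13T11:12:20Z"] }
  }
};

console.log(varyingAxes(profileDomain));
```

One caveat with this approach: a varying axis that happens to hold a single value cannot be told apart from a fixed one.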

letmaik commented 8 years ago

To be honest, I always found axis labels within data strange, because I always get the impression when you have "label": "latitude" that you're supposed to parse that to get the meaning of the axis, which of course is not the case. And if it were, then it would be "label": { "en": "latitude"}. And is a label really enough? What about units? Should they go inside the label? Extra? And doesn't all that really belong to the CRS? For composite axes it makes a bit more sense to me, since you could say things like, "point along ship track". But for non-composite axes, the CRS would define the proper information already, in an ideal world that is.
Perfect, solved.
Yep.
I felt the same, but was lacking some strong English gut feeling about that.

So, I think axis labels should be optional and used only when they provide real value ("latitude" is not real value). I'll still fight for the optional compact representation, just because it looks nicer and fits better onto a slide. Which is damn important if you ask me.

jonblower commented 8 years ago

Well, CF includes both units and a long_name (which is a label essentially) with a coordinate variable (which is essentially an axis), and that's usually helpful. Yes, you could look in the CRS, but you would have to download the definition and parse it, which most clients wouldn't want to do (and I'm not sure is possible at the moment anyway). Labels in general are not meant to be parsed, but can still be helpful for display purposes. So I would tend to think that axes should have both a label and units, which give a kind of human-readable CRS. The formal CRS definition is given by the URL, but you would only download this definition in exceptional circumstances.

Yes, the compact representation looks nice but it also looks more ambiguous to me. If the CRS is WGS84 then most clients could be expected to figure out that y=latitude and x=longitude, but for other CRSs it would be harder for clients to work out what the axes should be called.

letmaik commented 8 years ago

One more thing before going into the label/unit issue: I thought about the "values": [null] axes for grids and came to the conclusion that it's ok to not include those in here, meaning it is ok that the client (in our case the covjson-reader library) does a check "if type == grid" and introduces such axes if it wants to for more convenient access. Then we also have a better match to CIS.
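
Roughly, such a client-side step could look like the sketch below. This is not the actual covjson-reader code; the function name, the "Grid" type string and the use of "axisOrder" are assumptions for illustration:

```javascript
// Hypothetical client-side step: for grids, add placeholder t/z axes so that
// access code can always assume the axis order t, z, y, x.
function withGridAxes(domain) {
  if (domain.type !== "Grid") {
    return domain; // other domain types are left untouched
  }
  var gridAxes = ["t", "z", "y", "x"];
  var axes = {};
  gridAxes.forEach(function (name) {
    // A missing axis becomes a single "unknown" coordinate.
    axes[name] = domain.axes[name] || { values: [null] };
  });
  return { type: domain.type, axes: axes, axisOrder: gridAxes };
}

var grid = {
  type: "Grid",
  axes: {
    y: { values: [45, 46, 47] },
    x: { values: [0, 1, 2, 3, 4] }
  },
  axisOrder: ["y", "x"]
};

console.log(withGridAxes(grid).axisOrder);
```

The original domain object is not modified, so the document itself stays free of the placeholder axes.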

Now, about the rest. More things to consider:

I think we should also look at how labels and units work for composite axes.
Also, if you have a Section, you have one composite XYT axis and a separate Z axis. Should it be possible to assign a 3D CRS to XYZ? If so, then it has to happen in the outer layer somehow.
When you encode time with a CRS, the units are important for processing (and are part of the CRS) but less important for plotting since you may convert it to a standard time string.

How about the following for a Section:

"domain": {
  "axes": {
    "composite": {
      "components": ["x", "y", "t"],
      "values": [ [0,1,123456], [1,2,1234567], ...]
    },
    "z": [1,2,3,4,5]
  },
  "axisOrder": ["z", "composite"],
  "referencing" : [{
    "axes": [["composite","x"], ["composite","y"], "z"],
    "crs": {
      "id": "http://some/url/representing/a/3D/CRS",
      "axes": [{
        "label": {"en": "Longitude"},
        "unit": "degrees"
      }, {
        "label": {"en": "Latitude"},
        "unit": "degrees"
      },{
        "label": {"en": "Height above WGS84 reference ellipsoid"},
        "unit": "meters"
      }]
    }
  },{
   "axes": [["composite","t"]],
   "crs": {
     "id": "http://a/time/CRS",
     "axes": [{
       "unit": "days since 1990-1-1 0:0:0"
     }]
   }
 }]
}

I guess the above time CRS would be a parameterized one with the unit as parameter. I wonder if the unit should then be part of the CRS definition above. I guess technically you don't have parameterized CRSs in URL world and would have a unique URL for each parameter combination. Then the above would make sense again. Or you refer to the CRS by something like "base" instead of "id", and "id" is left out if it doesn't exist for a concrete CRS (with filled in parameters).

I think units only occur if you use a CRS, so including them there could make sense. Labels are trickier, and I assume you want to describe the components of a composite axis similarly to a normal axis? In the end, a composite axis is just a list of axes running parallel to each other. And in fact, in the above example, I refer to those components as "axes" within the referencing!
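
As an aside on converting encoded times to a standard time string for plotting: for a unit like the one in the example, a client could do something like the following. This is only a sketch; it handles just the "days since Y-M-D h:m:s" form and assumes UTC and the Gregorian calendar, neither of which is stated by the unit string itself. The function name is made up:

```javascript
// Hypothetical helper: converts a numeric time coordinate to an ISO 8601 string.
function daysSinceToIso(unit, value) {
  var m = /^days since (\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)$/.exec(unit);
  if (!m) {
    throw new Error("Unsupported time unit: " + unit);
  }
  // Date.UTC expects a zero-based month. UTC is an assumption here.
  var epoch = Date.UTC(+m[1], +m[2] - 1, +m[3], +m[4], +m[5], +m[6]);
  var msPerDay = 24 * 60 * 60 * 1000;
  return new Date(epoch + value * msPerDay).toISOString();
}

console.log(daysSinceToIso("days since 1990-1-1 0:0:0", 31));
```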

letmaik commented 8 years ago

One more thing that came to my mind when I thought about sparse (torque-like) vs dense (ours) range encodings is that the axis order is only relevant for the dense encoding and is not actually a formal part of the domain itself, where the domain is just a set of domain objects/elements. So, the axis order could in theory also be included in the range directly for the dense encoding.
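
To make concrete why the axis order only matters for the dense encoding, here is a sketch of how a client might map axis indices to a position in a flat range array. It assumes row-major layout (the last axis in "axisOrder" varies fastest), which has not been decided anywhere in this thread, and the function name is made up:

```javascript
// Hypothetical helper: flat offset into a dense range array for the given
// per-axis indices, e.g. {z: 2, composite: 1}.
function rangeIndex(domain, indices) {
  var index = 0;
  domain.axisOrder.forEach(function (name) {
    var size = domain.axes[name].values.length;
    // Row-major: shift what we have so far by this axis' size, then add its index.
    index = index * size + indices[name];
  });
  return index;
}

var section = {
  axes: {
    composite: {
      components: ["x", "y", "t"],
      values: [[0, 1, 10], [1, 2, 20], [2, 3, 30]]
    },
    z: { values: [1, 2, 3, 4, 5] }
  },
  axisOrder: ["z", "composite"]
};

console.log(rangeIndex(section, { z: 2, composite: 1 }));
```

A sparse encoding would list the domain elements next to their values and never need this calculation.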

jonblower commented 8 years ago

I've got a few thoughts about your second-to-last post, which might need discussion in person. But your last point is interesting. Yes, it seems that axis order is only relevant in the range, and in fact doesn't matter at all for a sparse encoding, or even an interleaved encoding (perhaps)? But I suppose an interleaved encoding would not even have a separate domain...

It seems there are several potential use cases here, it could be worth trying to enumerate them all (even if we decide not to support all of them).

letmaik commented 8 years ago

Summary of Skype call:

"axes": [["composite","x"], ["composite","y"], "z"] is too complicated; we could require unique (sub)axis names across a domain, such that it becomes "axes": ["x", "y", "z"]
having "axes" inside "crs" makes sense, but what if there's no CRS? e.g. ISO times in gregorian calendar (which is not a CRS)
curvilinear grids -> lat/lon are best exposed as ranges (as "coordinate" parameter) and not inside the domain (since it doesn't really fit the axes model)
having "coordinate parameters" makes sense, e.g. for DEMs where the domain is just x and y, and the range is z
CRS definition for coordinate parameters should be similar to how it's done in the domain -> problem: how to specify CRS for single axis like "latitude"? maybe use parameter groups for that?
default CRS may be bad because in some cases the CRS is unknown (e.g. curvilinear grids) and forcing to invent a fake CRS URI to prevent the use of the default CRS seems bad -> make CRS mandatory; also: WGS84 is not optimal for scientific datasets in many cases (tectonic drift)
we could look at WKT CRS definition for inspiration; also, it should be optionally possible to define the datum that was used
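
For the first point, a sketch of how a client could resolve a plain identifier like "x" once (sub)axis names are unique across a domain. The function name and the shape of the returned object are assumptions for illustration only:

```javascript
// Hypothetical helper: finds the axis (and, for composite axes, the component
// position) that an identifier such as "x" refers to.
function resolveIdentifier(domain, id) {
  for (var name in domain.axes) {
    if (name === id) {
      return { axis: name, component: null };
    }
    // Compact axes (bare arrays) and simple axes have no "components".
    var components = domain.axes[name].components || [];
    var pos = components.indexOf(id);
    if (pos !== -1) {
      return { axis: name, component: pos };
    }
  }
  return null; // unknown identifier
}

var sectionDomain = {
  axes: {
    composite: {
      components: ["x", "y", "t"],
      values: [[0, 1, 123456], [1, 2, 1234567]]
    },
    z: [1, 2, 3, 4, 5]
  }
};

console.log(resolveIdentifier(sectionDomain, "y"));
```

This only works because of the uniqueness requirement; without it, "x" could match both a top-level axis and a component.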

letmaik commented 8 years ago

In the CRS issue I added a WKT translation example. I'm still not sure how to include labels for non-CRS axes (ISO time string) or composite ones like trajectory though.

letmaik commented 8 years ago

Note: We already use "components" for parameter groups, possibly not a good idea to reuse that for composite axis components. (at least not within the JSON-LD context) On the other hand, could be a generic collection-like "member" predicate which would mean it's ok to reuse it.

letmaik commented 8 years ago

I'll start speccing the basic building blocks of the domain, which are axes, axisOrder, and referencing:

"domain": {
  "axes": {
    "x": ...,
    "y": ...,
    "t": ...
  },
  "axisOrder": [...],
  "referencing" : [{
    "axes": ["x","y"],
    "crs": {
      "id": "http://www.opengis.net/def/crs/OGC/1.3/CRS84",
      "type":  "GeodeticCRS"
    }
  },{
   "axes": ["t"],
   "calendar": "gregorian"
 }]
}

I'll do it as I think makes sense, to have a first draft. After that we can fix all the details separately.

Also, similarly to the categoryEncoding, we'll put the axisOrder inside the domain (and not the range) to simplify things.

letmaik commented 8 years ago

Just remembered again that polygons don't fit nicely in the composite axis model we have. You couldn't write "components": ["x","y"] without any other information since you don't have a single layer to step in (like for trajectories) but a more complex structure. I'll be ad hoc for now and add "geometryType": "Polygon" to the axis object. To be consistent, I'll add "geometryType": "Point" in the composite case (following geojson's types). By default, it would be "geometryType": "Coordinate" (not defined in geojson). The "components" field would then refer to the coordinates/components of each Point, since Point is the basic structure for more complex things like composite or polygon. We could also name it "pointComponents" then...

letmaik commented 8 years ago

For "referencing" it also doesn't make much sense to call it "axes" since this doesn't work anymore for polygons. I'll call it "identifiers" for now to capture both axis and (point)component identifiers.

letmaik commented 8 years ago

Closing this now. Implemented the new axis-aware structure. Everything else in more specific issues.
