CSIRO-enviro-informatics / loci.cat

Latest updates from the Loc-I Project
https://loci.cat

Include feature shape information in the Loc-I dataset #37

Closed jyucsiro closed 4 years ago

jyucsiro commented 4 years ago

There has been discussion of including information in the description of a loci:Feature to denote whether the feature is expressed as a point, polygon/multipolygon, etc.

The problem with not having a property to provide this information is that for certain applications, the queries used do a sort-of manual inference as to whether the feature has an area, and is thus able to proceed with a calculation, e.g. reapportioning values from one polygon to another polygon. In the case of taking values associated with a feature denoted by a point and 'reapportioning' it to another point, this isn't an operation/calculation that makes sense.
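The kind of manual inference described above might look roughly like the following. This is only a sketch, assuming the geometry is available as a WKT literal (optionally prefixed with a CRS IRI); the names `wkt_geometry_type` and `has_area`, and the choice of which types count as areal, are assumptions made for illustration.

```python
import re

# Optional leading CRS IRI in angle brackets, then the geometry keyword.
_WKT_HEAD = re.compile(r"^\s*(?:<[^>]*>\s*)?([A-Za-z]+)")

# Assumption: only these WKT types are treated as having an area.
AREAL_TYPES = {"POLYGON", "MULTIPOLYGON"}


def wkt_geometry_type(wkt_literal):
    """Return the upper-cased geometry keyword of a WKT literal."""
    match = _WKT_HEAD.match(wkt_literal)
    if match is None:
        raise ValueError(f"not a WKT literal: {wkt_literal!r}")
    return match.group(1).upper()


def has_area(wkt_literal):
    """Guess from the WKT keyword whether the geometry has an area."""
    return wkt_geometry_type(wkt_literal) in AREAL_TYPES
```

Every client has to repeat this string inspection, which is the cost an explicit property would avoid.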

Would it be better to have a property describe an enumeration of geometry types for a feature so that this is explicit rather than having to try and implicitly guess?

A complication is that a feature may have multiple 'realisations'?

dr-shorthair commented 4 years ago

Right - I understand the issue. geosparql:hasGeometry is generic so you need to look inside to determine the valid ops. However, a single geometryType tag would not work if there are multiple geometries (not common at present, but definitely allowed and likely to be important in some datasets later).
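To illustrate why a single tag on the feature falls short, here is a minimal sketch in which the type is recorded per geometry and the check for a valid operation runs over all of a feature's geometries. The `Geometry` class, the type labels, and the function names are hypothetical, not part of any Loc-I model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    geometry_type: str  # hypothetical label, e.g. "Point" or "Polygon"
    wkt: str


# Assumption: only these types support area-based reapportioning.
AREAL = frozenset({"Polygon", "MultiPolygon"})


def areal_geometries(geometries):
    """Return the geometries of one feature that have an area."""
    return [g for g in geometries if g.geometry_type in AREAL]


def can_reapportion(source, target):
    """Reapportioning needs at least one areal geometry on each side."""
    return bool(areal_geometries(source)) and bool(areal_geometries(target))
```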

jyucsiro commented 4 years ago

yes, well, currently we don't encode anything except the WKT.

Looking into the geosparql spec, geosparql:hasGeometry is defined like so

geo:hasGeometry a rdf:Property,
 owl:ObjectProperty;
 rdfs:isDefinedBy <http://www.opengis.net/spec/geosparql/1.0>;
 rdfs:label "has Geometry"@en;
 rdfs:comment "A spatial representation for a given feature."@en;
 rdfs:domain geo:Feature;
 rdfs:range geo:Geometry .

If I understand correctly, we could overload the rdf:type property to denote what geometry type it is using GML types. That would allow us to query what geometries and types they are, without creating a new property...

:myFeature a loci:Feature ;
   geo:hasGeometry 
       [ 
            rdf:type geo:Geometry, gml:Point;
            rdfs:label "point geom"^^xsd:string ;
            geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4283> POINT(145.31364001 -38.39395284)"^^geo:wktLiteral 
       ] ,
       [ 
            rdf:type geo:Geometry, gml:Polygon ;
            rdfs:label "polygon geom"^^xsd:string ;
            geo:asWKT "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"^^geo:wktLiteral
       ]
.

does that make sense?

benjaminleighton commented 4 years ago

Related https://github.com/CSIRO-enviro-informatics/geofabric-dataset/issues/21

dr-shorthair commented 4 years ago

The most sound solution is to define sub-properties of geosparql:hasGeometry but that would then make these things unrecognisable to applications that are looking for vanilla geosparql:aaa properties. A reasoning processor could infer that a sub-property is also the super-property, but that requires more stuff to be loaded/known to the processor. Else you repeat the value in two properties. I'd like opinions from those folk developing client applications (incl. eXcelerator) on the preferred direction. (Sorry I drafted this earlier but didn't save, so partly overtaken by the other comments.)
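The trade-off can be sketched with a toy version of the RDFS sub-property rule (rdfs7), working on triples held as plain tuples. The property name `loci:hasPolygonGeometry` is hypothetical and is not defined anywhere in this thread. The point is that the vanilla geo:hasGeometry triple only appears if the processor both knows the sub-property axiom and applies the rule.

```python
SUB_PROPERTY_OF = "rdfs:subPropertyOf"


def infer_super_properties(triples):
    """Add (s, super, o) for every (s, p, o) where p is a sub-property of super."""
    inferred = set(triples)

    # Collect the direct super-properties declared in the data.
    supers = {}
    for s, p, o in inferred:
        if p == SUB_PROPERTY_OF:
            supers.setdefault(s, set()).add(o)

    # Repeat until nothing new is added, so chains of sub-properties are followed.
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            for sup in supers.get(p, ()):
                if (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred
```

Without the `rdfs:subPropertyOf` triple in the input, the function returns the data unchanged, which is the situation of a client that has not loaded the extra ontology.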

dr-shorthair commented 4 years ago

Yes @jyucsiro that is probably the best solution. It doesn't add anything gratuitous, but captures the information.

However, maybe use sf:Point instead of gml:Point etc? (it is less tied to an XML encoding)

dr-shorthair commented 4 years ago

Meanwhile, I'm looking at the test dataset again and find that geo:hasGeometry is used inconsistently/incorrectly in some places:

  1. http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000 is OK but encoded as GML - I think WKT would be preferred
  2. http://linked.data.gov.au/dataset/geofabric/contractedcatchment/12101547 etc do not have any geometry
  3. all the GNAF examples have
    <xxx> a gnaf:Something ;
    geo:hasGeometry [
        a gnaf:Geocode ; 
    ... 
    ] ;
    .

    which refers to a class gnaf:Geocode which is not in the GNAF ontology. It looks like it is an extension of gnaf:Geometry as required by the global range constraint, but clearly there is some cleaning up to do.

dr-shorthair commented 4 years ago

And you do not have to include rdf:type geo:Geometry - it is entailed by the geo:hasGeometry predicate. So the example above can be:

:myFeature a loci:Feature ;
   geo:hasGeometry 
       [ 
            rdf:type sf:Point;
            rdfs:label "point geom"^^xsd:string ;
            geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4283> POINT(145.31364001 -38.39395284)"^^geo:wktLiteral 
       ] ,
       [ 
            rdf:type sf:Polygon ;
            rdfs:label "polygon geom"^^xsd:string ;
            geo:asWKT "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"^^geo:wktLiteral
       ]
.

jyucsiro commented 4 years ago

cool, dropping geo:Geometry might be a nice trick (though not sure if we use it in our queries...).
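Whether existing queries keep working depends on whether the store applies RDFS entailment. A query that matches `?g a geo:Geometry` under simple graph matching would no longer find geometries typed only as sf:Point. The sketch below applies the RDFS range rule (rdfs3) to triples held as plain tuples, to show what a reasoner would add. The function name is an assumption, and the range axiom must be present in the data for anything to be inferred.

```python
def infer_types_from_range(triples):
    """Add (o, rdf:type, C) for every (s, p, o) where p has rdfs:range C."""
    triples = set(triples)

    # Collect the declared range classes per property.
    ranges = {}
    for s, p, o in triples:
        if p == "rdfs:range":
            ranges.setdefault(s, set()).add(o)

    inferred = set(triples)
    for s, p, o in triples:
        for cls in ranges.get(p, ()):
            inferred.add((o, "rdf:type", cls))
    return inferred
```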

yes - it appears that the use of geo:hasGeometry is not consistent across ASGS, GNAF, Geofabric.

Ideally, all features/geometry descriptions would be consistent like so:

benjaminleighton commented 4 years ago

I've been looking at the geofabric code and found that there used to be functionality, written by @ashleysommer, to render geo:hasGeometry information. As in ASGS, geo:asGML is used to encode the geometry details. I'm not sure what the justification is for moving to WKT, but it means more work for someone in what looks like fairly tricky code in both the Geofabric and ASGS LDAPIs. There is also existing code that appears to convert the GML to GeoJSON; this is used to render GeoJSON text as part of the asgsld.net HTML view. I'm unclear on why.

Including rdf:type sf:[type] should be easy because we can extract and translate this information out of the GML. Both geofabric and asgs wrap gml:Polygon in gml:surfaceMember and gml:MultiSurface, but it isn't clear why this structure was chosen. Possibly MultiSurface provides flexibility for complex collections of geometric features, but we might want to exclude this kind of complexity from many Loc-I datasets.
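A minimal sketch of that extraction, using only the Python standard library, is given below. It is not the code in the geofabric or ASGS LDAPIs. The function name, the small mapping table, and the decision to report a MultiSurface wrapping a single Polygon as sf:Polygon are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: only these simple GML geometries are mapped directly.
SIMPLE_TYPES = {
    "Point": "sf:Point",
    "LineString": "sf:LineString",
    "Polygon": "sf:Polygon",
}


def local_name(element):
    """Strip the XML namespace so GML 3.1 and 3.2 tags compare equal."""
    return element.tag.rsplit("}", 1)[-1]


def sf_type_from_gml(gml_text):
    """Return an sf: type name for a GML geometry fragment."""
    root = ET.fromstring(gml_text)
    name = local_name(root)
    if name in SIMPLE_TYPES:
        return SIMPLE_TYPES[name]
    if name == "MultiSurface":
        # Look through the surfaceMember wrappers for the polygons inside.
        polygons = [
            child
            for member in root
            if local_name(member) == "surfaceMember"
            for child in member
            if local_name(child) == "Polygon"
        ]
        if len(polygons) == 1:
            return "sf:Polygon"  # assumption: unwrap a single-member MultiSurface
        if len(polygons) > 1:
            return "sf:MultiPolygon"
    raise ValueError(f"unsupported GML geometry: {name}")
```

Reporting a single-member MultiSurface as sf:Polygon keeps the simple case simple, but a dataset could just as reasonably choose to always emit sf:MultiPolygon.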

I've created and pushed a branch https://github.com/CSIRO-enviro-informatics/geofabric-dataset/tree/feature/hasGeometry that re-enables some existing functionality that @ashleysommer disabled as part of doing some Albers area or area corrections. It probably won't work correctly as-is and needs more work, but a local LDAPI running off this branch will generate hasGeometry information for geofabric catchments at least. It doesn't seem to be implemented for riverregions.

From what I've seen, the logic in asgs and geofabric (and thus other LDAPIs) is likely duplicated and fairly bespoke at the moment. One option might be to refactor it into a dedicated micro-LDAPI that can integrate with broader LDAPIs and is responsible purely for wrapping geospatial boundary sources of truth (e.g. WFS), customizing them minimally with configuration on a dataset-by-dataset basis and producing minimalist geospatial data graphs. This would also be useful for linkset generation.

jyucsiro commented 4 years ago

As a general principle, I'd be in favour of using WKT as a preferred representation to be consistent, vs. a mix of WKT/GML/GeoJSON (it reads better in Turtle and seems to be more compact?). However, I understand that it might mean converting from other representations...

dr-shorthair commented 4 years ago

  1. Ah - where the data is coming from a WFS then GML-encoded geometries are ready and waiting. So can be dropped into a geo:asGML literal without re-serializing.
  2. I note that OGC-API requires support for only GeoJSON and GML. WKT is not even mentioned :-(
  3. What is the geo:defaultGeometry for in Loc-I context? Is it primarily for rendering?

dr-shorthair commented 4 years ago

See #39 Persist the geometries externally, and then the value of geo:hasGeometry is a URI-reference

jyucsiro commented 4 years ago

Resolved as we've decided to host geometries in a Loc-I Geometry Data Service and embed a reference to the URI for the geometry in the dataset description.