SEMICeu / Core-Location-Vocabulary

A vocabulary that describes the basic elements of location information, such as geometries and addresses.

Requirements identified after the first release of LOCN #2

Closed: andrea-perego closed this issue 2 years ago

andrea-perego commented 6 years ago

In view of possible revisions to the Core Location Vocabulary (LOCN), I summarise below the requirements identified since the release of the first version of LOCN, based on implementation experience, the work of other working groups, and related specifications.

Based on their scope, such requirements can be grouped into three main classes:

  1. Representation of spatial / temporal coordinates
  2. Representation of addresses
  3. Mapping LOCN with other relevant vocabularies - especially for the representation of addresses

Representation of spatial / temporal coordinates

This set of requirements comes from three main working groups, listed here in chronological order: LOCADD, GeoDCAT-AP and SDW. The requirements are:

  1. Ability to specify bounding boxes
  2. Ability to specify centroids
  3. Ability to specify spatial / temporal resolution
  4. Ability to specify spatial / temporal reference systems
  5. Availability of an XML / RDF datatype for GeoJSON
  6. Ability to specify start / end date(time) for temporal coverage

For some of these requirements, solutions have been proposed in GeoDCAT-AP and in the W3C Data Quality Vocabulary (DQV), which have been documented by the SDW Working Group in their best practices.

Ability to specify bounding boxes & centroids

LOCN has a general property, namely, locn:geometry, to associate a geometry with a resource. However, in some contexts it is necessary to make clear that the specified geometry is not the actual geometry, but rather the point corresponding to its (geographic) centre (centroid) or a rectangle representing its extent (bounding box).

None of the standard / most popular spatial vocabularies (GeoSPARQL included) provides properties and/or classes to model this information. The only exception is schema.org, which defines a schema:box property; however, schema:box supports only one specific encoding for the coordinates of a bounding box, whereas locn:geometry supports any standard geometry encoding. A representation of centroids and bounding boxes that is more flexible than schema:box and compatible with locn:geometry is also a requirement for GeoDCAT-AP.

To address this issue, LOCADD discussed the definition of subproperties of locn:geometry for centroids and bounding boxes.
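As an illustration of that idea, the Python sketch below emits Turtle that declares two subproperties of locn:geometry and uses them to attach a bounding box and a centroid, encoded as GeoSPARQL WKT literals, to a resource. The property names ex:bbox and ex:centroid, the http://example.org/locn-geo# namespace and the helper functions are hypothetical placeholders, not terms defined by LOCN or LOCADD.

```python
# Sketch only: ex:bbox / ex:centroid and the ex: namespace are hypothetical.
PREFIXES = (
    "@prefix locn: <http://www.w3.org/ns/locn#> .\n"
    "@prefix geo:  <http://www.opengis.net/ont/geosparql#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex:   <http://example.org/locn-geo#> .  # hypothetical namespace\n"
)


def bbox_wkt(min_x: float, min_y: float, max_x: float, max_y: float) -> str:
    """Return a bounding box as a closed, counter-clockwise WKT polygon."""
    if min_x > max_x or min_y > max_y:
        raise ValueError("min coordinates must not exceed max coordinates")
    ring = [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y), (min_x, min_y)]
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


def centroid_wkt(x: float, y: float) -> str:
    """Return a centroid as a WKT point."""
    return f"POINT({x} {y})"


def bbox_and_centroid_turtle(resource_iri: str, bbox: tuple, centroid: tuple) -> str:
    """Declare the subproperties and use them on one resource.

    WKT literals without a leading CRS IRI default to CRS84 (lon/lat) in GeoSPARQL.
    """
    return (
        PREFIXES
        + "\nex:bbox rdfs:subPropertyOf locn:geometry .\n"
        + "ex:centroid rdfs:subPropertyOf locn:geometry .\n\n"
        + f"<{resource_iri}>\n"
        + f'  ex:bbox "{bbox_wkt(*bbox)}"^^geo:wktLiteral ;\n'
        + f'  ex:centroid "{centroid_wkt(*centroid)}"^^geo:wktLiteral .\n'
    )


if __name__ == "__main__":
    print(bbox_and_centroid_turtle("http://example.org/dataset/1", (3.0, 50.0, 6.5, 51.5), (4.75, 50.75)))
```

Because both properties are subproperties of locn:geometry, a consumer that only understands locn:geometry can still treat the values as generic geometries.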

Ability to specify spatial / temporal reference systems

GeoDCAT-AP provides a mechanism to associate a (spatial / temporal) reference system with a dataset by using dct:conformsTo, which can also be used for geometries - or any other resource.

Since dct:conformsTo is a very general property, the fact that the object is a spatial / temporal reference system is currently addressed by using dct:type with the relevant code list values from the INSPIRE Glossary, as shown in the following example:

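@prefix a:    <http://example.org/> .  # placeholder namespace for the example
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

a:Dataset a dcat:Dataset ;
  dct:conformsTo a:SpatialReferenceSystem .

a:SpatialReferenceSystem a dct:Standard ;
  dct:type <http://inspire.ec.europa.eu/glossary/SpatialReferenceSystem> .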

Moreover, an experimental RDF representation of the reference systems in the OGC CRS Register has been developed, which also maps additional information (such as the "name" of the CRS).

:warning: The Git repository including the mapping proposal referred to from the mail above has been moved to GitHub: https://github.com/SEMICeu/epsg-to-rdf

This approach could be adopted "as is" in LOCN, although it might be desirable to have a more specific property than dct:conformsTo and/or stronger typing than dct:type with a term from the INSPIRE Glossary.
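A minimal Python sketch of the current approach applied to a geometry, assuming the CRS is identified by its OGC CRS Register URI (http://www.opengis.net/def/crs/EPSG/0/ followed by the EPSG code). The resource IRI is a placeholder, and epsg_uri and reference_system_turtle are hypothetical helpers.

```python
# OGC CRS Register URI pattern for EPSG codes.
EPSG_BASE = "http://www.opengis.net/def/crs/EPSG/0/"
INSPIRE_SRS = "http://inspire.ec.europa.eu/glossary/SpatialReferenceSystem"


def epsg_uri(code: int) -> str:
    """Return the OGC CRS Register URI for an EPSG code."""
    if code <= 0:
        raise ValueError("EPSG codes are positive integers")
    return f"{EPSG_BASE}{code}"


def reference_system_turtle(resource_iri: str, epsg_code: int) -> str:
    """Link a resource to its CRS with dct:conformsTo and type the CRS via dct:type."""
    crs = epsg_uri(epsg_code)
    return (
        "@prefix dct: <http://purl.org/dc/terms/> .\n\n"
        f"<{resource_iri}> dct:conformsTo <{crs}> .\n\n"
        f"<{crs}> a dct:Standard ;\n"
        f"  dct:type <{INSPIRE_SRS}> .\n"
    )


if __name__ == "__main__":
    # EPSG:31370 is Belgian Lambert 72.
    print(reference_system_turtle("http://example.org/geometry/1", 31370))
```

Switching to a more specific property or stronger typing, as discussed above, would only change the template strings.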

In such a case, an initial proposal for the definition of specific classes / properties is documented here:

https://joinup.ec.europa.eu/mailman/archives/dcat_application_profile-geo/2015-July/000157.html

Moreover, the new version of the W3C Time Ontology includes a class time:TRS that could be used to type temporal reference systems.

Ability to specify spatial / temporal resolution

GeoDCAT-AP currently models spatial / temporal resolution as free text (with rdfs:comment), recognising that, at the time when the GeoDCAT-AP specification was released, no existing vocabularies provided a means to model this information.

However, this requirement has been brought to the attention of the W3C Data on the Web Best Practices Working Group, and a solution has been documented in the W3C Data Quality Vocabulary (DQV), as reported here:

https://joinup.ec.europa.eu/mailman/archives/dcat_application_profile-geo/2016-May/000367.html

Basically, DQV models this information as observations / measurements of a given quality metric (which corresponds to a given type of resolution).

This solution was also included by the SDW Working Group in their best practices, and it could be readily adopted in LOCN.

This would however require the definition of two groups of individuals:

  1. Those corresponding to the different types of resolution (denoting a quality metric).
  2. Those corresponding to each of the different levels of resolution (denoting the measurement of a specific quality metric).
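To show how the two groups relate, the Python sketch below emits one individual of the second group: a dqv:QualityMeasurement that points, via dqv:isMeasurementOf, to a metric of the first group (such as :SpatialResolutionAsDistance, defined below). Assumptions: the ":" namespace is a placeholder, resolution_measurement is a hypothetical helper, and attaching the unit with sdmx-attribute:unitMeasure is one possible choice, not something fixed by this issue.

```python
from decimal import Decimal
from typing import Optional

# The ":" namespace is a hypothetical placeholder for wherever the metrics live.
DQV_PREFIXES = (
    "@prefix :    <http://example.org/resolution#> .\n"
    "@prefix dqv: <http://www.w3.org/ns/dqv#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .\n"
)


def resolution_measurement(
    dataset_iri: str,
    metric: str,
    value,
    unit_iri: Optional[str] = None,
    node: str = "measurement1",
) -> str:
    """Return Turtle for one resolution level (second group) of a dataset.

    `metric` is the local name of a dqv:Metric individual (first group).
    """
    value = Decimal(str(value))  # the metrics expect xsd:decimal values
    lines = [
        f"<{dataset_iri}> dqv:hasQualityMeasurement :{node} .",
        "",
        f":{node} a dqv:QualityMeasurement ;",
        f"  dqv:computedOn <{dataset_iri}> ;",
        f"  dqv:isMeasurementOf :{metric} ;",
    ]
    if unit_iri is not None:
        lines.append(f"  sdmx-attribute:unitMeasure <{unit_iri}> ;")
    lines.append(f'  dqv:value "{value}"^^xsd:decimal .')
    return DQV_PREFIXES + "\n" + "\n".join(lines) + "\n"
```

Every dataset with, say, a 100 m resolution gets its own measurement node here, which is exactly the duplication problem discussed below.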

As far as the first group is concerned (i.e., the different types of resolution), these individuals can be defined in DQV as follows:

@prefix :     <http://example.org/resolution#> .  # placeholder namespace
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

:SpatialResolutionAsEquivalentScale a dqv:Metric ;
  skos:definition "Spatial resolution of a dataset expressed as equivalent scale, by using a representative fraction (e.g., 1:1,000, 1:1,000,000)."@en ;
  dqv:expectedDataType xsd:decimal ;
  dqv:inDimension dqv:precision .

:SpatialResolutionAsDistance a dqv:Metric ;
  skos:definition "Spatial resolution of a dataset expressed as distance"@en ;
  dqv:expectedDataType xsd:decimal ;
  dqv:inDimension dqv:precision .

This initial list can be further extended. E.g.:

:SpatialResolutionAsHorizontalGroundDistance a dqv:Metric;
  skos:definition "Spatial resolution of a dataset expressed as horizontal ground distance"@en ;
  dqv:expectedDataType xsd:decimal ;
  dqv:inDimension dqv:precision .

:SpatialResolutionAsVerticalDistance a dqv:Metric;
  skos:definition "Spatial resolution of a dataset expressed as vertical distance"@en ;
  dqv:expectedDataType xsd:decimal ;
  dqv:inDimension dqv:precision .

:SpatialResolutionAsAngularDistance a dqv:Metric;
  skos:definition "Spatial resolution of a dataset expressed as angular distance"@en ;
  dqv:expectedDataType xsd:decimal ;
  dqv:inDimension dqv:precision .    

The open question is in which namespace such individuals should be defined: inside LOCN, or in a separate code list, such as those maintained by the EU Publications Office?

The definition of individuals in the second group is however more problematic, since the level of resolution and unit of measurement are arbitrary (1:1000, 1:100, 1m, 1km, 100m, 10 decimal degrees, etc.).

Possible options include the following ones:

  1. Define only the individuals corresponding to the types of spatial / temporal resolution, whereas the individuals expressing the actual resolution will be defined at the data level. This solution is not optimal, since it will result in multiple definitions of the same individuals.
  2. Define individuals only for some levels of resolution and units of measurement, e.g., the most common ones. This solution may address most, but not all, cases.
  3. Set up a URI space supporting arbitrary levels of resolution and units of measurement. Such a register would dynamically generate the corresponding individuals based on the information included in their URIs.
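The core of the third option can be sketched in Python: the register parses the URI it receives and derives the resolution individual from it, rejecting anything it cannot interpret. The URI pattern, the kind names (loosely derived from the metrics above) and parse_resolution_uri are hypothetical; the register linked below may use a different scheme.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional
from urllib.parse import urlparse

# Hypothetical URI pattern, for illustration only:
#   .../id/resolution/equivalent-scale/<denominator>
#   .../id/resolution/<kind>/<value>/<unit>
BASE_PATH = "/id/resolution/"
UNITLESS_KINDS = {"equivalent-scale"}
KINDS_WITH_UNIT = {
    "distance",
    "horizontal-ground-distance",
    "vertical-distance",
    "angular-distance",
}


@dataclass(frozen=True)
class Resolution:
    kind: str
    value: Decimal
    unit: Optional[str] = None


def parse_resolution_uri(uri: str) -> Resolution:
    """Derive a resolution individual from the information in its URI."""
    path = urlparse(uri).path
    if not path.startswith(BASE_PATH):
        raise ValueError(f"not a resolution URI: {uri}")
    segments = [s for s in path[len(BASE_PATH):].split("/") if s]
    if not segments:
        raise ValueError("missing resolution kind")
    kind, rest = segments[0], segments[1:]
    if kind in UNITLESS_KINDS:
        expected = 1
    elif kind in KINDS_WITH_UNIT:
        expected = 2
    else:
        raise ValueError(f"unknown resolution kind: {kind}")
    if len(rest) != expected:
        raise ValueError(f"'{kind}' expects {expected} segment(s) after it")
    try:
        value = Decimal(rest[0])
    except InvalidOperation:
        raise ValueError(f"not a number: {rest[0]}") from None
    if not value.is_finite() or value <= 0:
        raise ValueError("resolution must be a positive, finite number")
    return Resolution(kind, value, rest[1] if expected == 2 else None)
```

Restricting the first path segment to a known set of kinds ties every generated individual back to one of the metrics defined in the first group, while the value and unit remain open-ended.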

An example of the last option, including also a proposal for how these individuals could be defined, is available at:

http://geodcat-ap.semic.eu/id/resolution/

XML / RDF datatype for GeoJSON

Property locn:geometry can also be used to specify geometries directly as literals, using a syntax encoding scheme (e.g., WKT, GML or GeoJSON). In such a case, it is useful to specify the geometry encoding used by means of a typed literal, and this is what GeoDCAT-AP does.

Although RDF datatypes exist for WKT and GML (they are defined in GeoSPARQL), an XML / RDF datatype for GeoJSON is missing.

To address this issue, GeoDCAT-AP uses the URL of the corresponding IANA media type (namely http://www.iana.org/assignments/media-types/application/geo+json), but this solution is not optimal, and it would be preferable to define a specific datatype.

This can be done in LOCN, but other options might be considered, e.g., a reference register for syntax encoding schemes maintained by an authority such as the EU Publications Office.
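A Python sketch of how a typed geometry literal can be produced for each encoding. The WKT and GML datatype IRIs are those defined in GeoSPARQL; for GeoJSON the sketch uses the IANA media type URL, as GeoDCAT-AP currently does. geometry_literal is a hypothetical helper.

```python
import json

GEOSPARQL = "http://www.opengis.net/ont/geosparql#"

# Datatype IRI per encoding. GeoJSON has no RDF datatype here, so the IANA
# media type URL is used as a stand-in, following GeoDCAT-AP.
DATATYPES = {
    "wkt": GEOSPARQL + "wktLiteral",
    "gml": GEOSPARQL + "gmlLiteral",
    "geojson": "http://www.iana.org/assignments/media-types/application/geo+json",
}


def _escape(text: str) -> str:
    """Escape a string for use inside a double-quoted Turtle literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )


def geometry_literal(encoding: str, text: str) -> str:
    """Return a Turtle typed literal for a geometry in the given encoding."""
    try:
        datatype = DATATYPES[encoding.lower()]
    except KeyError:
        raise ValueError(f"unsupported encoding: {encoding}") from None
    return f'"{_escape(text)}"^^<{datatype}>'


if __name__ == "__main__":
    point = json.dumps({"type": "Point", "coordinates": [4.35, 50.85]})
    print(f"<http://example.org/place/1> <http://www.w3.org/ns/locn#geometry> {geometry_literal('geojson', point)} .")
```

If a dedicated GeoJSON datatype were defined, in LOCN or in a register, only the "geojson" entry of DATATYPES would need to change.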

Ability to specify start / end date(time) for temporal coverage

Currently, this information is specified in DCAT-AP by using schema:startDate and schema:endDate, respectively, following ADMS. GeoDCAT-AP follows the same approach.

This issue has been brought to the attention of the W3C Dataset Exchange Working Group (see UC27), so a possible solution might be contributed in that context.
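For reference, the DCAT-AP pattern can be sketched in Python as follows: the temporal coverage is a dct:PeriodOfTime with schema:startDate and schema:endDate, typed as xsd:date or xsd:dateTime depending on the input. The dataset IRI is a placeholder and temporal_coverage_turtle is a hypothetical helper.

```python
from datetime import date, datetime

PREFIXES = (
    "@prefix dct:    <http://purl.org/dc/terms/> .\n"
    "@prefix schema: <http://schema.org/> .\n"
    "@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .\n"
)


def _literal(value: date) -> str:
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return f'"{value.isoformat()}"^^xsd:dateTime'
    return f'"{value.isoformat()}"^^xsd:date'


def temporal_coverage_turtle(dataset_iri: str, start: date, end: date) -> str:
    """Return Turtle giving a dataset a closed temporal coverage interval."""
    if type(start) is not type(end):
        raise TypeError("start and end must both be dates or both be datetimes")
    if start > end:
        raise ValueError("start must not be after end")
    return (
        PREFIXES
        + f"\n<{dataset_iri}> dct:temporal [\n"
        + "    a dct:PeriodOfTime ;\n"
        + f"    schema:startDate {_literal(start)} ;\n"
        + f"    schema:endDate {_literal(end)}\n"
        + "  ] .\n"
    )
```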

Representation of addresses

After the release of LOCN, examples of RDF representations of INSPIRE datasets concerning addresses have been released.

Two notable examples are:

The detailed requirements are yet to be collected. However, in general, they concern two main issues:

Mapping LOCN with other relevant vocabularies

After the release of LOCN, a number of use cases have been reported that require mapping LOCN-encoded data to other popular vocabularies, in particular vCard and schema.org, especially for the representation of addresses.

A mapping proposal has been developed by JRC, and illustrated here:

https://joinup.ec.europa.eu/mailman/archives/dcat_application_profile-geo/2016-August/000373.html

:warning: The Git repository including the documentation of the mapping proposal referred to from the mail above has been moved to GitHub: https://github.com/SEMICeu/locn-mapping
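The following Python sketch illustrates the kind of mapping involved, including one case that is not one-to-one (the street line). The property correspondences are an assumed, partial selection for illustration only and are not taken from the JRC proposal; map_address is a hypothetical helper.

```python
# Assumed, partial LOCN -> (vCard, schema.org) correspondences, for illustration.
SIMPLE_MAPPINGS = {
    "locn:postName": ("vcard:locality", "schema:addressLocality"),
    "locn:postCode": ("vcard:postal-code", "schema:postalCode"),
    "locn:adminUnitL2": ("vcard:region", "schema:addressRegion"),
    "locn:adminUnitL1": ("vcard:country-name", "schema:addressCountry"),
}
# The street line has no single LOCN counterpart; it is composed below.
STREET = ("vcard:street-address", "schema:streetAddress")
TARGET_INDEX = {"vcard": 0, "schema": 1}


def map_address(locn: dict, target: str) -> dict:
    """Map a LOCN address, given as {property: value}, to vCard or schema.org.

    LOCN properties without a correspondence in this sketch are ignored.
    """
    if target not in TARGET_INDEX:
        raise ValueError(f"unknown target vocabulary: {target}")
    i = TARGET_INDEX[target]
    result = {SIMPLE_MAPPINGS[p][i]: v for p, v in locn.items() if p in SIMPLE_MAPPINGS}
    # Non-trivial case: join thoroughfare and house number into one street line.
    # "<thoroughfare> <number>" is an assumption; the order varies by country.
    parts = [locn[p] for p in ("locn:thoroughfare", "locn:locatorDesignator") if p in locn]
    if parts:
        result[STREET[i]] = " ".join(parts)
    return result
```

The one-to-one entries are the straightforward part; composed values such as the street line, whose conventions differ between countries, are where the remaining issues tend to be.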

Conclusions

Among the revisions listed above, the definition of subproperties for centroids and bounding boxes is the least problematic, and it can be readily carried out. The same applies to the missing GeoJSON datatype.

Addressing the issues concerning reference systems and spatial resolution requires additional discussion on what needs to be defined (e.g., which types of resolution, which types of reference systems). A starting point can be the requirements coming from other ISA specifications, such as GeoDCAT-AP, where the types of reference systems and spatial resolution used are those included in ISO 19115.

The revisions concerning addresses are currently the least consolidated, and require a detailed requirement analysis - as already said in the relevant section.

In all these cases, the question is in which namespace these terms should be defined. A possible option (also discussed in LOCADD) is to define them in separate LOCN extensions, e.g., one for geometries (locn-geo) and one for addresses (locn-ad).

Finally, the mapping of LOCN to vCard and schema.org can be considered rather stable, since most of the mappings are straightforward and only a very limited number of issues remain. However, it would be desirable to have it reviewed and tested.

makxdekkers commented 6 years ago

Issues to be considered in the context of the next major semantic release.

GeertThijs commented 3 years ago

@andrea-perego Specifying spatial / temporal reference systems: that kind of information is explicitly embedded in geometry serialisations like GML or WKT, and implicitly in serialisations like GeoJSON (where the spatial reference system is always WGS84). Example of GML with reference system information in the serialisation: <gml:Point srsName="http://www.opengis.net/def/crs/EPSG/0/31370"><gml:coordinates>159555.00,166155.00</gml:coordinates></gml:Point>. The reference system here is Lambert 72, often used in Belgium.
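The following Python sketch shows how the CRS can be read off each serialisation mentioned in the comment above. It assumes the GeoSPARQL convention for WKT literals (an optional leading CRS IRI, defaulting to CRS84), since plain WKT has no slot for the CRS, and it assumes the GML snippet declares the gml namespace. crs_of is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Default CRS of GeoSPARQL WKT literals; GeoJSON (RFC 7946) uses the same
# WGS 84 longitude/latitude CRS.
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

# A GeoSPARQL wktLiteral may start with "<crs-iri>" followed by the WKT text.
_WKT_CRS = re.compile(r"^\s*<([^>]+)>")


def crs_of(encoding: str, text: str) -> Optional[str]:
    """Return the CRS IRI carried (explicitly or implicitly) by a geometry."""
    encoding = encoding.lower()
    if encoding == "gml":
        root = ET.fromstring(text)
        # srsName may sit on the geometry element itself or on a descendant.
        for element in root.iter():
            if "srsName" in element.attrib:
                return element.attrib["srsName"]
        return None  # no CRS stated in this serialisation
    if encoding == "wkt":
        match = _WKT_CRS.match(text)
        return match.group(1) if match else CRS84
    if encoding == "geojson":
        return CRS84
    raise ValueError(f"unsupported encoding: {encoding}")
```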

GeertThijs commented 3 years ago

@andrea-perego Specifying spatial coverage with a bbox: This is implicitly covered by locn:Geometry. A bounding box would be a polygon, a centroid a point. So this is only necessary when in a data exchange it is important to single out bounding boxes or centroids from a bunch of generic geometries. Both can be derived ad hoc from any geometry and the use case is more as a query parameter than something that is actually exchanged between two parties.
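To illustrate that both can be derived ad hoc, here is a Python sketch computing a bounding box from any list of coordinates and an area-weighted centroid for a polygon (shoelace formula). It assumes planar coordinates (for longitude/latitude data this is only an approximation) and a simple, non-self-intersecting polygon; bounding_box and polygon_centroid are hypothetical helper names.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bounding_box(coords: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for any non-empty list of coordinates."""
    if not coords:
        raise ValueError("no coordinates")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon, via the shoelace formula."""
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three vertices")
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring
    twice_area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon (zero area)")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area)
    return cx / (3 * twice_area), cy / (3 * twice_area)
```

The result is itself a polygon (the box) and a point (the centroid), which matches the observation that both are ordinary geometries; note that the centroid of a non-convex polygon may fall outside it.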

EmidioStani commented 2 years ago

In agreement with the observations of @GeertThijs, this can be closed