CSIRO-enviro-informatics / addrcatch-linkset

Creative Commons Attribution 4.0 International

Current Addresses to Catchments Linkset

This code repository contains a Linkset - a specialised Dataset linking objects in two other Datasets.

This Linkset contains spatial associations between Address class objects in the latest version of the Geocoded National Address File (GNAF Current) and Catchment class objects in the Geofabric.

Addresses, in the GNAF 2016 May dataset, are represented spatially as points. Catchments, in the Geofabric, are represented spatially as polygons. Catchments do not overlap and cover all of Australia, so any GNAF Current Address will lie within one, and only one, Catchment.
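The "one, and only one, Catchment" property can be illustrated with a point-in-polygon test. The sketch below uses the standard ray-casting (even-odd) algorithm on plain coordinate tuples. The catchment IDs, shapes and coordinates are made up for illustration. This is not necessarily how the Linkset's Spatial Intersection links were actually generated, since real Geofabric catchments are complex polygons that would normally be handled with a spatial library.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # ring of (x, y) vertices; the first vertex is not repeated at the end


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting (even-odd) test: cast a ray towards +x and count edge crossings."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def containing_catchment(pt: Point, catchments: Dict[str, Polygon]) -> str:
    """Return the single catchment ID whose polygon contains pt.

    Relies on the stated property that catchments do not overlap and cover
    the whole area, so exactly one match is expected.
    """
    matches = [cid for cid, poly in catchments.items() if point_in_polygon(pt, poly)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one catchment, found {matches}")
    return matches[0]


# Hypothetical catchments: two adjacent unit squares side by side.
catchments = {
    "A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
print(containing_catchment((0.25, 0.5), catchments))  # A
print(containing_catchment((1.5, 0.5), catchments))   # B
```

Raising an error when zero or several polygons match turns the README's "one, and only one" guarantee into a check, which would flag gaps or overlaps in the input polygons.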

The formal definition of a Linkset is provided by the Location Index (LocI) project in its project ontology.


Figure 1: A geocoded address ('+', top layer) linked to a catchment polygon (bottom layer). Each link in this Linkset states a geocoded address ID, a catchment ID, the relationship type (always geo:sfWithin), a method used to make the link and the ID of the link itself.

Repository Contents

Purpose

This repository contains a Linkset. Linksets are specialised Linked Data datasets that link objects, such as Addresses or Catchments, in one Linked Data dataset to objects in another.

Publishing relationships between Datasets as distinct Linksets allows for the independent management of Dataset-to-Dataset relationships.

Linksets for Spatial Relationships

Where LocI objects across multiple datasets have spatial relationships that we wish to represent, we create Linksets with spatial (topological) relationships, such as touches, within and overlaps, using terms formalised in the GeoSPARQL Standard (https://www.opengeospatial.org/standards/geosparql).

Linksets for Dataset versions

Some LocI Datasets, such as the ASGS, have multiple, independently delivered versions (the ASGS is released as Linked Data Datasets in both 2011 and 2016 versions). Linksets can also link between versions of a Dataset. This allows information such as correspondence tables (links between ASGS versions, published by the Australian Bureau of Statistics) to be published as Linked Data independently of any other Dataset.

This Linkset

This Linkset - GNAF Current Addresses to Geofabric Catchments Linkset - is a spatial relations Linkset linking GNAF Current Addresses (points) to Geofabric Catchments (polygons) by indicating which Catchment each Address is within.

This Linkset states, per Address and with other details, something like this:

Address GAACT714845933
is within
Catchment 7155143

...and that this particular link was made on the 6th of February, 2015 using a Parcel Level matching method.

How is a Linkset’s data organised?

Linksets include the main facts of the relations between objects in two datasets (what the IDs of the two objects are and how they are related), and they also include information about how the links were created, such as which spatial intersection method was used to establish a topological relation. Because a Linkset may have been generated using several methods, it can describe multiple methods and record which one was used for each link.

Other per-link information may be recorded too: if the links within a Linkset were generated over a significant period of time, each link may record its creation time; if different people or organisations contributed different links, each link may reference its specific contributor.
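As a minimal sketch, the per-link information described above could be held in a Python data structure like the one below. The field names (link_id, subject, predicate, obj, method, created, contributor) are hypothetical and only mirror the concepts in this README; they are not the Linkset's actual RDF property names. Optional fields stay as None when they are not recorded, and the second link uses made-up IDs, method and date.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Link:
    """One object-to-object link plus per-link provenance (hypothetical field names)."""
    link_id: str
    subject: str     # ID of the object in Dataset A, e.g. a GNAF Address
    predicate: str   # relationship, e.g. "geo:sfWithin"
    obj: str         # ID of the object in Dataset B, e.g. a Geofabric Catchment
    method: str      # generation method used for this particular link
    created: Optional[date] = None     # only if links were made over a long period
    contributor: Optional[str] = None  # only if several parties contributed links


def methods_used(links: List[Link]) -> Counter:
    """Count how many links each generation method produced."""
    return Counter(link.method for link in links)


links = [
    Link("1", "GAACT714845933", "geo:sfWithin", "7155143", ":SpatialIntersection"),
    # Hypothetical second link made with a different, made-up method and date.
    Link("2", "EXAMPLE_ADDRESS", "geo:sfWithin", "EXAMPLE_CATCHMENT", ":OtherMethod",
         created=date(2020, 1, 1)),
]
print(methods_used(links))
```

Keeping the method on every link (rather than once per Linkset) is what allows a single Linkset to mix several generation methods.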

Linkset content sections

Linksets use a highly condensed, but still (sort of) human-readable, data format so that they can hold many (often millions of) links. Linkset data files contain:

Linksets hold all their information in one potentially very large file, but they also provide the header information in a stand-alone text file, header.ttl.

They also include a few (perhaps 10) example Statements in a stand-alone text file, example-data-… .ttl (numbered, as there may be many).
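One way an example file could be derived from the main data file is sketched below. It assumes, as in the contracted Statements shown in this README, that each Statement ends with a line containing only ".", and it skips blank lines, comments and @prefix declarations between Statements. The function name and the second Statement's IDs are hypothetical, and this is not necessarily how this repository's files were produced.

```python
from typing import Iterable, List


def first_statements(lines: Iterable[str], n: int = 10) -> List[str]:
    """Collect the first n Statement blocks, each terminated by a line that is just '.'."""
    if n <= 0:
        return []
    statements: List[str] = []
    current: List[str] = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines, comments and prefix declarations between Statements.
        if not current and (not stripped or stripped.startswith(("#", "@prefix"))):
            continue
        current.append(line.rstrip("\n"))
        if stripped == ".":
            statements.append("\n".join(current))
            current = []
            if len(statements) == n:
                break  # stop early: no need to read the rest of a very large file
    return statements


sample = """@prefix ex: <http://example.org/> .

:1
  s: g:GAACT714845933 ;
  p: w: ;
  o: b:7155143 ;
  m: :SpatialIntersection ;
.

:2
  s: g:EXAMPLE_ADDRESS ;
  p: w: ;
  o: b:EXAMPLE_CATCHMENT ;
  m: :SpatialIntersection ;
.
"""
examples = first_statements(sample.splitlines(keepends=True), n=10)
print(len(examples))
```

Because the function consumes lines lazily and stops after n Statements, it can be given an open file object directly (for example `first_statements(f)` inside a `with open(...) as f:` block) without loading the whole data file into memory.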

Linkset files

In addition to the main Linkset data file and the header.ttl and example-data.ttl files, there are usually several other files within a Linkset, including this README file. General Linkset files include:

This specific Linkset’s files are listed above under Repository Contents.

Linkset data format

In its long list of statements, this Linkset expresses each link like this:

In RDF code this link is expressed as:

# Prefixes (:, rdf:, geo:, loci:, address:, catchment:) must be declared before use.
:1 a rdf:Statement ;
  rdf:subject address:GAACT714845933 ;
  rdf:predicate geo:sfWithin ;
  rdf:object catchment:7155143 ;
  loci:hadGenerationMethod :SpatialIntersection
.

Using contractions to reduce data volume, the same link becomes:

:1
  s: g:GAACT714845933 ;
  p: w: ;
  o: b:7155143 ;
  m: :SpatialIntersection ;
.

See the file example-data.ttl for the first 10 Statements of the Linkset expressed this way, and the header.ttl file for definitions of all the contractions.
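The correspondence between the full and contracted forms of a link can be made explicit with a small expansion step. The mapping below is inferred only by comparing the two example forms in this README (for instance, that s: stands for rdf:subject and b: for the catchment namespace); header.ttl is the authoritative definition. The function names are hypothetical, and the sketch works on prefixed names rather than full IRIs.

```python
from typing import Dict

# Inferred by comparing the full and contracted examples; header.ttl is authoritative.
PREDICATES: Dict[str, str] = {
    "s:": "rdf:subject",
    "p:": "rdf:predicate",
    "o:": "rdf:object",
    "m:": "loci:hadGenerationMethod",
}
WHOLE_TERMS: Dict[str, str] = {"w:": "geo:sfWithin"}             # used as a complete term
NAMESPACES: Dict[str, str] = {"g:": "address:", "b:": "catchment:"}  # used as prefixes


def expand_term(term: str) -> str:
    """Expand one contracted value, e.g. 'b:7155143' -> 'catchment:7155143'."""
    if term in WHOLE_TERMS:
        return WHOLE_TERMS[term]
    for short, full in NAMESPACES.items():
        if term.startswith(short):
            return full + term[len(short):]
    return term  # not contracted, e.g. ':SpatialIntersection'


def expand_statement(block: str) -> Dict[str, str]:
    """Turn one contracted Statement block into {full predicate: full value}."""
    result: Dict[str, str] = {}
    for line in block.splitlines()[1:]:  # the first line holds the Statement ID
        line = line.strip().rstrip(";").strip()
        if not line or line == ".":
            continue
        pred, value = line.split(None, 1)
        result[PREDICATES.get(pred, pred)] = expand_term(value)
    return result


block = """:1
  s: g:GAACT714845933 ;
  p: w: ;
  o: b:7155143 ;
  m: :SpatialIntersection ;
."""
print(expand_statement(block))
```

Applied to the contracted example above, this yields the same subject, predicate, object and method as the full RDF form.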

Linkset metrics

Linksets always contain similar information - links between objects in datasets - and a standard set of metrics can be calculated for any Linkset. These metrics, set by the LocI project, are:

Metric calculation

A series of queries to calculate Linkset metrics is being prepared here: https://github.com/CSIRO-enviro-informatics/linkset-metrics
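The metrics listed for this Linkset are simple aggregates over its links. The sketch below computes them in Python from an in-memory list of (Dataset A item, Dataset B item, method) tuples and the two sets of item IDs. It is a hypothetical illustration with made-up IDs, not the queries being prepared in the linkset-metrics repository; a Linkset with millions of links would more likely be queried where the data is stored.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

LinkTuple = Tuple[str, str, str]  # (item in Dataset A, item in Dataset B, method)


def linkset_metrics(links: Iterable[LinkTuple],
                    items_a: Set[str], items_b: Set[str]) -> Dict[str, object]:
    """Compute the standard Linkset metrics from a list of links."""
    links = list(links)
    linked_a = {a for a, _, _ in links}
    linked_b = {b for _, b, _ in links}
    method_counts = Counter(m for _, _, m in links)
    return {
        "number_of_links": len(links),
        "unlinked_in_a": len(items_a - linked_a),   # Dataset A (from) items with no link
        "unlinked_in_b": len(items_b - linked_b),   # Dataset B (to) items with no link
        "number_of_methods": len(method_counts),
        "uses_per_method": dict(method_counts),
    }


# Toy data (hypothetical IDs): three addresses and two catchments; addr3 and c2 are unlinked.
links = [("addr1", "c1", "SpatialIntersection"),
         ("addr2", "c1", "SpatialIntersection")]
result = linkset_metrics(links, {"addr1", "addr2", "addr3"}, {"c1", "c2"})
print(result)
```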

This Linkset’s metrics

Metric                                          Value
Number of links                                 14502000
Number of items in Dataset A (from) not linked  not yet calculated
Number of items in Dataset B (to) not linked    not yet calculated
Number of link-creation methods used            1 (Spatial Intersection)
Number of uses of each link-creation method     14502000 (Spatial Intersection)

Rights & License

The content of this Linkset is licensed for use under the Creative Commons Attribution 4.0 International License. See the license deed for full details.

Contacts

LocI Project technical owner:
Nicholas Car
CSIRO Land & Water, Environmental Informatics Group
nicholas.car@csiro.au

Linkset creator:
Joseph Abhayaratna
CTO, PSMA Ltd.
joseph.abhayaratna@psma.com.au