CSIRO-enviro-informatics / loci-cache-scripts

A collection of tools to assist in building the loci-cache

Add GNAF Addr to Catchment to linkset generation scripts #25

Closed ashleysommer closed 4 years ago

ashleysommer commented 4 years ago

Adapt this script:

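#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import csv

def get_gnaf_addr_id(uri):
    # The address code is the last path segment of the G-NAF address URI.
    return uri.split('/')[-1]

def get_geof_cc_id(hydroid):
    # The hydroid is already the contracted catchment identifier.
    return hydroid

# One reified statement per CSV row. The short prefixes used here
# (s:, p:, o:, i:, m:, g:, c:, w:, l:, si:) are declared in the Turtle head.
mb_sf_within_template = """\
:gw{within_iter:d} s: g:{addr_code:s} ;
 p: w: ;
 o: c:{hydroid:s} ;
 i: l: ;
 m: si: .

"""

def main():
    # newline="" is the csv module's recommended mode for reader input files.
    with open("./gnaf201605_cc.csv", newline="") as csv1:
        rdr = csv.reader(csv1, delimiter=',')
        next(rdr)  # skip the header row
        # Only the statements are written; the head is not part of this output.
        with open("within_all_2016_05.ttl", "w") as outfile:
            for record in rdr:
                # Columns: statement number, G-NAF address URI, catchment hydroid.
                id1 = int(record[0])
                gnaf_addr = get_gnaf_addr_id(record[1])
                hydroid = get_geof_cc_id(record[2])
                next_chunk = mb_sf_within_template.format(addr_code=gnaf_addr, hydroid=hydroid, within_iter=id1)
                outfile.write(next_chunk)

if __name__ == "__main__":
    main()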
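To make the row-to-statement mapping concrete, here is a minimal sketch that renders a single statement from an in-memory CSV instead of the real source file. The sample row (its address code and hydroid) and the header names are made up for illustration, and `render_statements` is a hypothetical helper, not part of the script above; the column order (statement number, address URI, hydroid) is assumed from how the script indexes each record.

```python
import csv
import io

# Same statement template as in the script above.
STATEMENT_TEMPLATE = """\
:gw{within_iter:d} s: g:{addr_code:s} ;
 p: w: ;
 o: c:{hydroid:s} ;
 i: l: ;
 m: si: .

"""

def render_statements(csv_file):
    """Yield one reified Turtle statement per CSV data row (hypothetical helper)."""
    reader = csv.reader(csv_file, delimiter=",")
    next(reader, None)  # skip the header row, if any
    for record in reader:
        if not record:
            continue  # ignore blank lines
        yield STATEMENT_TEMPLATE.format(
            within_iter=int(record[0]),
            addr_code=record[1].split("/")[-1],  # last URI path segment
            hydroid=record[2],
        )

# Hypothetical sample data: header names and values are illustrative only.
SAMPLE_CSV = (
    "id,gnaf_uri,hydroid\n"
    "1,http://linked.data.gov.au/dataset/gnaf-2016-05/address/GAACT714845933,12101547\n"
)

statements = list(render_statements(io.StringIO(SAMPLE_CSV)))
print(statements[0], end="")
```

For the sample row this prints a statement whose subject is `:gw1`, with `g:GAACT714845933` as the reified subject and `c:12101547` as the reified object.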

ashleysommer commented 4 years ago

Use a head that looks like this:

@prefix loci: <http://linked.data.gov.au/def/loci#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .

@prefix : <http://linked.data.gov.au/dataset/addrcatch/statement/> .
@prefix l: <http://linked.data.gov.au/dataset/addrcatch> .
@prefix s: <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> .
@prefix p: <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> .
@prefix o: <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> .
@prefix m: <http://linked.data.gov.au/def/loci/hadGenerationMethod> .
@prefix g: <http://linked.data.gov.au/dataset/gnaf/address/> .
@prefix w: <http://www.opengis.net/ont/geosparql#sfWithin> .
@prefix i: <http://purl.org/dc/terms/isPartOf> .
@prefix c: <http://linked.data.gov.au/dataset/geofabric/contractedcatchment/> .
@prefix si: <http://linked.data.gov.au/dataset/addrcatch/SpatialIntersection> .

l: a loci:Linkset ;
  dct:title "Addresses Contracted-Catchments Linkset" ;
  dct:description """This LOC-I Project Linkset relates Address individuals in the G-NAF LOC-I Dataset to Contracted Catchment individuals in the Geofabric LOC-I Dataset. Every Address -> Catchment relation is geosparql:sfWithin, that is the Address is sfWithin the Catchment.
The Linkset triples (Address sfWithin Catchment) are reified so that each triple is contained within an RDF Statement class instance so that the triple is numbered and the method used to generate the triple is given by the loci:hadGenerationMethod.
The method used for all triples in this Linkset is the same and it is SpatialIntersection which is defined below.
The triples for the main data in this linkset - the Statements relating Addresses to Catchments - are valid RDF in the Turtle syntax but an unusual namespacing arrangement is used so all terms are indicated with as few letters as possible, mostly one letter then colon, e.g. s: for http://www.w3.org/1999/02/22-rdf-syntax-ns#subject, rather than the more common rdf:subject. This is simply to reduce file size."""@en ;
  dct:publisher <http://catalogue.linked.data.gov.au/org/psma> ;
  dcat:contactPoint _:jo ;
  dct:issued "2019-01-30"^^xsd:date ;
  dct:modified "2019-01-30"^^xsd:date ;
  dct:contributor <http://orcid.org/0000-0002-8742-7730> , <http://orcid.org/0000-0003-0590-0131> ;
  void:subjectsTarget <http://linked.data.gov.au/dataset/gnaf> ;
  void:objectsTarget <http://linked.data.gov.au/dataset/geofabric> ;
  void:linkPredicate w: ;
  m: si: .

_:jo a vcard:Individual ;
  vcard:fn "Joseph Abhayaratna" ;
  vcard:hasEmail <mailto:joseph.abhayaratna@psma.com.au> .

si: a prov:Plan ;
  rdfs:label "Spatial Intersection Method" ;
  rdfs:comment "This method uses the G-NAF LDAPI to page through the register, obtain the GeoSPARQL geometry for the address point, and then uses a OGC Simple Features Contains filter on the GeoFabric WFS Service"@en ;
  prov:value <https://github.com/jabhay/linkset_creator> ;
  prov:wasAttributedTo _:jo ;
  prov:generatedAtTime "2019-01-30"^^xsd:date .

#
# Statements
#

ashleysommer commented 4 years ago

And a head that looks like this for the GNAF_2016_05 to CC linkset:

@prefix loci: <http://linked.data.gov.au/def/loci#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .

@prefix : <http://linked.data.gov.au/dataset/addr201605catch/statement/> .
@prefix l: <http://linked.data.gov.au/dataset/addr201605catch> .
@prefix s: <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> .
@prefix p: <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> .
@prefix o: <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> .
@prefix m: <http://linked.data.gov.au/def/loci/hadGenerationMethod> .
@prefix g: <http://linked.data.gov.au/dataset/gnaf-2016-05/address/> .
@prefix w: <http://www.opengis.net/ont/geosparql#sfWithin> .
@prefix i: <http://purl.org/dc/terms/isPartOf> .
@prefix c: <http://linked.data.gov.au/dataset/geofabric/contractedcatchment/> .
@prefix si: <http://linked.data.gov.au/dataset/addr201605catch/SpatialIntersection> .

l: a loci:Linkset ;
  dct:title "Addresses Contracted-Catchments Linkset" ;
  dct:description """This LOC-I Project Linkset relates Address individuals in the G-NAF LOC-I Dataset to Contracted Catchment individuals in the Geofabric LOC-I Dataset. Every Address -> Catchment relation is geosparql:sfWithin, that is the Address is sfWithin the Catchment.
The Linkset triples (Address sfWithin Catchment) are reified so that each triple is contained within an RDF Statement class instance so that the triple is numbered and the method used to generate the triple is given by the loci:hadGenerationMethod.
The method used for all triples in this Linkset is the same and it is SpatialIntersection which is defined below.
The triples for the main data in this linkset - the Statements relating Addresses to Catchments - are valid RDF in the Turtle syntax but an unusual namespacing arrangement is used so all terms are indicated with as few letters as possible, mostly one letter then colon, e.g. s: for http://www.w3.org/1999/02/22-rdf-syntax-ns#subject, rather than the more common rdf:subject. This is simply to reduce file size."""@en ;
  dct:publisher <http://catalogue.linked.data.gov.au/org/psma> ;
  dcat:contactPoint _:jo ;
  dct:issued "2019-01-30"^^xsd:date ;
  dct:modified "2019-01-30"^^xsd:date ;
  dct:contributor <http://orcid.org/0000-0002-8742-7730> , <http://orcid.org/0000-0003-0590-0131> ;
  void:subjectsTarget <http://linked.data.gov.au/dataset/gnaf-2016-05> ;
  void:objectsTarget <http://linked.data.gov.au/dataset/geofabric> ;
  void:linkPredicate w: ;
  m: si: .

_:jo a vcard:Individual ;
  vcard:fn "Joseph Abhayaratna" ;
  vcard:hasEmail <mailto:joseph.abhayaratna@psma.com.au> .

si: a prov:Plan ;
  rdfs:label "Spatial Intersection Method" ;
  rdfs:comment "This method uses the G-NAF LDAPI to page through the register, obtain the GeoSPARQL geometry for the address point, and then uses a OGC Simple Features Contains filter on the GeoFabric WFS Service"@en ;
  prov:value <https://github.com/jabhay/linkset_creator> ;
  prov:wasAttributedTo _:jo ;
  prov:generatedAtTime "2019-01-30"^^xsd:date .

#
# Statements
#
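The two heads above differ in only five lines: the `:`, `l:`, `g:` and `si:` prefixes and the `void:subjectsTarget` value. As a rough sketch of how a script might generate both variants from one template, the function below builds just those lines from two slugs. `variant_lines` and its parameter names are hypothetical, and the rest of the head is assumed to be copied verbatim.

```python
BASE = "http://linked.data.gov.au/dataset/"

def variant_lines(linkset_slug, gnaf_slug):
    """Return the head lines that differ between the two linkset variants.

    linkset_slug: last path segment of the linkset URI (e.g. "addrcatch").
    gnaf_slug: last path segment of the G-NAF dataset URI (e.g. "gnaf").
    Both parameter names are illustrative assumptions.
    """
    return {
        ":": f"@prefix : <{BASE}{linkset_slug}/statement/> .",
        "l:": f"@prefix l: <{BASE}{linkset_slug}> .",
        "g:": f"@prefix g: <{BASE}{gnaf_slug}/address/> .",
        "si:": f"@prefix si: <{BASE}{linkset_slug}/SpatialIntersection> .",
        "subjectsTarget": f"  void:subjectsTarget <{BASE}{gnaf_slug}> ;",
    }

# The two variants shown in this thread.
current = variant_lines("addrcatch", "gnaf")
gnaf_2016_05 = variant_lines("addr201605catch", "gnaf-2016-05")

for key in current:
    print(current[key])
    print(gnaf_2016_05[key])
```

Keeping the differences in one place like this would make it harder for the two heads to drift apart, though a plain search-and-replace over a template file would work just as well.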

ashleysommer commented 4 years ago

Source CSVs can be found here: https://s3.console.aws.amazon.com/s3/buckets/loci-assets/source-data/gnaf201605-address-geofabric-cc-linkset_source/?region=ap-southeast-2&tab=overview

And pre-built linksets can be found here (for reference): https://s3.console.aws.amazon.com/s3/buckets/loci-assets/linksets/?region=ap-southeast-2&tab=overview

shaneseaton commented 4 years ago

I moved the source files to https://s3.console.aws.amazon.com/s3/buckets/loci-assets/source-data/gnaf-geofab-linkset-source due to a naming inconsistency (the folder contained both the 1811 and 1605 versions but was labelled 1605).

shaneseaton commented 4 years ago

I have integrated this script into a Docker-based workflow consistent with the other linkset creation tools.