CSIRO-enviro-informatics / gnaf-dataset

A Linked Data version of the Geocoded National Address File (G-NAF)
http://gnafld.net
GNU General Public License v3.0

Convert GNAF data to use simplified/aligned version of the ontology #11

dr-shorthair closed this issue 4 years ago

dr-shorthair commented 4 years ago

https://github.com/CSIRO-enviro-informatics/loci.cat/wiki/Simplifying-the-initial-ontologies describes a simplification of the GNAF datasets to match a more unified Loc-I ontology pattern. The goal is to simplify and harmonize the SPARQL queries.

The required transformations are illustrated by example below.

Original form:

<http://linked.data.gov.au/dataset/gnaf-2016-05/address/GAVIC411436309>
  rdf:type gnaf:Address ;
  gnaf:gnafType <http://gnafld.net/def/gnaf/code/AddressTypes#Rural> ;
  gnaf:hasAddressPrimary <http://linked.data.gov.au/dataset/gnaf-2016-05/address/GAVIC425683387> ;
  gnaf:hasAddressSite <http://linked.data.gov.au/dataset/gnaf-2016-05/addressSite/411591483> ;
  gnaf:hasDateCreated "2015-07-27"^^xsd:date ;
  gnaf:hasDateLastModified "2016-04-28"^^xsd:date ;
  gnaf:hasGnafConfidence <http://gnafld.net/def/gnaf/GnafConfidence_1> ;
  gnaf:hasLocality <http://linked.data.gov.au/dataset/gnaf-2016-05/locality/VIC943> ;
  gnaf:hasNumber [
      rdf:type gnaf:Number ;
      gnaf:gnafType <http://linked.data.gov.au/def/gnaf/code/NumberTypes#FirstStreet> ;
      prov:value 1 ;
    ] ;
  gnaf:hasPostcode 3921 ;
  gnaf:hasState <http://www.geonames.org/2145234> ;
  gnaf:hasStreetLocality <http://linked.data.gov.au/dataset/gnaf-2016-05/streetLocality/VIC2021622> ;
  geo:hasGeometry [
      rdf:type gnaf:Geocode ;
      gnaf:gnafType <http://gnafld.net/def/gnaf/code/GeocodeTypes#PropertyAccessPointSetback> ;
      geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4283> POINT(145.35714361 -38.34785008)"^^geo:wktLiteral ;
      rdfs:label "Property Access Point Setback" ;
    ] ;
  rdfs:comment "1 Mcleod Road, French Island, VIC 3921" ;
  rdfs:label "Address GAVIC411436309 of Rural type" ;
.

Preferred form:

  1. gnaf:gnafType → dcterms:type
  2. gnaf:hasDateCreated → dcterms:created and gnaf:hasDateLastModified → dcterms:modified
  3. add dcterms:identifier for the G-NAF identifier
  4. gnaf:Geocode → sf:Point
  5. add collection-membership triples

<http://linked.data.gov.au/dataset/gnaf-2016-05/address/GAVIC411436309>
  rdf:type gnaf:Address ;
  gnaf:hasAddressPrimary <http://linked.data.gov.au/dataset/gnaf-2016-05/address/GAVIC425683387> ;
  gnaf:hasAddressSite <http://linked.data.gov.au/dataset/gnaf-2016-05/addressSite/411591483> ;
  gnaf:hasGnafConfidence <http://gnafld.net/def/gnaf/GnafConfidence_1> ;
  gnaf:hasLocality <http://linked.data.gov.au/dataset/gnaf-2016-05/locality/VIC943> ;
  gnaf:hasNumber [
      rdf:type gnaf:Number ;
      gnaf:gnafType <http://linked.data.gov.au/def/gnaf/code/NumberTypes#FirstStreet> ;
      prov:value 1 ;
    ] ;
  gnaf:hasPostcode 3921 ;
  gnaf:hasState <http://www.geonames.org/2145234> ;
  gnaf:hasStreetLocality <http://linked.data.gov.au/dataset/gnaf-2016-05/streetLocality/VIC2021622> ;
  dcterms:created "2015-07-27"^^xsd:date ;
  dcterms:identifier "GAVIC411436309" ;
  dcterms:modified "2016-04-28"^^xsd:date ;
  dcterms:type <http://gnafld.net/def/gnaf/code/AddressTypes#Rural> ;
  geo:hasGeometry [
      rdf:type sf:Point ;
      gnaf:gnafType <http://gnafld.net/def/gnaf/code/GeocodeTypes#PropertyAccessPointSetback> ;
      geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4283> POINT(145.35714361 -38.34785008)"^^geo:wktLiteral ;
      rdfs:label "Property Access Point Setback" ;
    ] ;
  loci:isMemberOf <http://linked.data.gov.au/dataset/gnaf-2016-05/address/> ;
  rdfs:comment "1 Mcleod Road, French Island, VIC 3921" ;
  rdfs:label "Address GAVIC411436309 of Rural type" ;
.

In a later phase, we may also externalize the geometry.

dr-shorthair commented 4 years ago

Item 1:

INSERT { ?m dcterms:type ?c . }
WHERE { ?m gnaf:gnafType ?c . }

This keeps gnaf:gnafType in place alongside dcterms:type for now.
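For readers without a triple store at hand, the effect of this update can be sketched in plain Python. This is an illustration only, not part of the G-NAF tooling: triples are modelled as (subject, predicate, object) tuples of CURIE strings, and the function name `add_dcterms_type` and the shorthand prefixes `addr:`, `AddressTypes:` and `NumberTypes:` are hypothetical.

```python
# Illustrative only: a triple is a (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


def add_dcterms_type(triples: set[Triple]) -> set[Triple]:
    """For every (?m, gnaf:gnafType, ?c) also assert (?m, dcterms:type, ?c).

    The original gnaf:gnafType triples are kept, as with the INSERT-only update.
    """
    added = {(s, "dcterms:type", o) for (s, p, o) in triples if p == "gnaf:gnafType"}
    return triples | added


# Hypothetical shorthand CURIEs for the example address and its number blank node.
data = {
    ("addr:GAVIC411436309", "gnaf:gnafType", "AddressTypes:Rural"),
    ("addr:GAVIC411436309", "gnaf:hasNumber", "_:num1"),
    ("_:num1", "gnaf:gnafType", "NumberTypes:FirstStreet"),
    ("_:num1", "prov:value", "1"),
}

for triple in sorted(add_dcterms_type(data)):
    print(triple)
```

Because `?m` matches any subject, blank nodes such as the `gnaf:hasNumber` node also pick up `dcterms:type`, giving the same gnaf:gnafType/dcterms:type duplication that appears on the geometry blank node in the full test data linked later in this thread.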

dr-shorthair commented 4 years ago

Item 2:

INSERT {
    ?m dcterms:created ?dc .
    ?m dcterms:modified ?dm .
}
WHERE {
    # UNION rather than two top-level OPTIONALs, so that a resource with
    # only one of the two dates is still matched
    { ?m gnaf:hasDateCreated ?dc . } UNION { ?m gnaf:hasDateLastModified ?dm . }
} ;

DELETE {
    ?m gnaf:hasDateCreated ?dc .
    ?m gnaf:hasDateLastModified ?dm .
}
WHERE {
    { ?m gnaf:hasDateCreated ?dc . } UNION { ?m gnaf:hasDateLastModified ?dm . }
}
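A rough Python analogue of the rename, assuming the same tuple-of-CURIEs model; `DATE_RENAMES`, `rename_date_predicates` and the `locality:EXAMPLE` subject are hypothetical. Because each triple is rewritten independently, a resource carrying only one of the two dates is still converted.

```python
Triple = tuple[str, str, str]

# Old G-NAF date predicates mapped to their Dublin Core replacements.
DATE_RENAMES = {
    "gnaf:hasDateCreated": "dcterms:created",
    "gnaf:hasDateLastModified": "dcterms:modified",
}


def rename_date_predicates(triples: set[Triple]) -> set[Triple]:
    """Insert the dcterms date triples and drop the gnaf ones in one pass.

    Each triple is rewritten on its own, so a resource that has only one of
    the two dates is converted as well.
    """
    return {(s, DATE_RENAMES.get(p, p), o) for (s, p, o) in triples}


data = {
    ("addr:GAVIC411436309", "gnaf:hasDateCreated", '"2015-07-27"^^xsd:date'),
    ("addr:GAVIC411436309", "gnaf:hasDateLastModified", '"2016-04-28"^^xsd:date'),
    # Hypothetical resource that only has a creation date.
    ("locality:EXAMPLE", "gnaf:hasDateCreated", '"2015-07-27"^^xsd:date'),
}

print(sorted(rename_date_predicates(data)))
```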
dr-shorthair commented 4 years ago

Item 3:

INSERT { ?gf dcterms:identifier ?id . }
WHERE {
    { ?gf a gnaf:Address . } UNION { ?gf a gnaf:Locality . } UNION { ?gf a gnaf:Street . }
    BIND( STRDT( REPLACE( STR(?gf), '^.*(#|/)', "" ), gnaf:gnaf-2016-05 ) AS ?id )
}
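The regular expression can be checked outside SPARQL. The hypothetical helper `gnaf_identifier` below applies the same pattern with Python's `re` module (for this particular pattern, Python and XPath regex syntax behave the same); it returns only the lexical form, without the `gnaf:gnaf-2016-05` datatype that `STRDT` attaches.

```python
import re


def gnaf_identifier(iri: str) -> str:
    """Drop everything up to and including the last '#' or '/'.

    Same pattern as the Item 3 REPLACE; the greedy '.*' makes the match
    run to the final separator. Only the lexical form is returned.
    """
    return re.sub(r"^.*(#|/)", "", iri)


print(gnaf_identifier("http://linked.data.gov.au/dataset/gnaf-2016-05/address/GAVIC411436309"))
# prints GAVIC411436309
```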
dr-shorthair commented 4 years ago

Item 4:

# One DELETE/INSERT operation: the WHERE pattern is matched once,
# then the old type is removed and the new one added
DELETE { ?g a gnaf:Geocode . }
INSERT { ?g a sf:Point . }
WHERE { ?g a gnaf:Geocode . }
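Sketched in Python under the same tuple-of-CURIEs assumption (`retype_geocodes`, `_:geom1` and the `GeocodeTypes:` shorthand are hypothetical), the retyping touches only the `rdf:type gnaf:Geocode` triple and leaves the rest of the geometry node, including its `gnaf:gnafType`, unchanged.

```python
Triple = tuple[str, str, str]


def retype_geocodes(triples: set[Triple]) -> set[Triple]:
    """Replace (?g, rdf:type, gnaf:Geocode) with (?g, rdf:type, sf:Point).

    All other triples, including the node's gnaf:gnafType, pass through.
    """
    return {
        (s, p, "sf:Point") if (p == "rdf:type" and o == "gnaf:Geocode") else (s, p, o)
        for (s, p, o) in triples
    }


geometry = {
    ("_:geom1", "rdf:type", "gnaf:Geocode"),
    ("_:geom1", "gnaf:gnafType", "GeocodeTypes:PropertyAccessPointSetback"),
    ("_:geom1", "rdfs:label", '"Property Access Point Setback"'),
}

print(sorted(retype_geocodes(geometry)))
```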
dr-shorthair commented 4 years ago

Item 5:

INSERT {
    ?gf loci:isMemberOf ?reg .
    ?reg a rdf:Bag , loci:Dataset ; rdfs:member ?gf .
}
WHERE {
    { ?gf a gnaf:Address . } UNION { ?gf a gnaf:Locality . } UNION { ?gf a gnaf:Street . }
    BIND( IRI( REPLACE( STR(?gf), "(#|/)[^#/]*$", "$1" ) ) AS ?reg )
}
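A Python sketch of how the collection IRI is derived and which four triples result for each item. `collection_iri` and `membership_triples` are hypothetical names; note that Python writes the regex back-reference in the replacement as `\1`, where SPARQL's XPath-based `REPLACE` uses `$1`.

```python
import re


def collection_iri(iri: str) -> str:
    """Keep the IRI up to and including its last '#' or '/'.

    Same pattern as the Item 5 REPLACE; the replacement r"\1" re-inserts
    the captured separator (SPARQL spells this "$1").
    """
    return re.sub(r"(#|/)[^#/]*$", r"\1", iri)


def membership_triples(item: str) -> set[tuple[str, str, str]]:
    """The triples the Item 5 INSERT template produces for one item."""
    reg = collection_iri(item)
    return {
        (item, "loci:isMemberOf", reg),
        (reg, "rdf:type", "rdf:Bag"),
        (reg, "rdf:type", "loci:Dataset"),
        (reg, "rdfs:member", item),
    }


print(collection_iri("http://linked.data.gov.au/dataset/gnaf-2016-05/address/GAVIC411436309"))
# prints http://linked.data.gov.au/dataset/gnaf-2016-05/address/
```

Since every address, locality and street IRI in a register shares the same prefix, all items of one register end up as members of a single collection resource.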
ashleysommer commented 4 years ago

@dr-shorthair Clarification please. The preferred form above says to use dcterms:type rather than gnaf:gnafType; however, in the preferred-form example, these blank nodes still use gnaf:gnafType:

  gnaf:hasNumber [
      rdf:type gnaf:Number ;
      gnaf:gnafType <http://linked.data.gov.au/def/gnaf/code/NumberTypes#FirstStreet> ;
      prov:value 1 ;
    ] ;
  geo:hasGeometry [
      rdf:type sf:Point ;
      gnaf:gnafType <http://gnafld.net/def/gnaf/code/GeocodeTypes#PropertyAccessPointSetback> ;
      geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4283> POINT(145.35714361 -38.34785008)"^^geo:wktLiteral ;
      rdfs:label "Property Access Point Setback" ;
    ] ;

Should these also be changed to dcterms:type?

ashleysommer commented 4 years ago

Another question: the GNAF pyldapi implementation has a DCT profile view, which was started last year but not finished. Would it make sense to add these DCT properties to the DCT profile, rather than the GNAF profile, and leave the GNAF profile fully aligned with the GNAF ontology?

jyucsiro commented 4 years ago

@ashleysommer Simon has provided a full example of the test data I provided here: https://raw.githubusercontent.com/CSIRO-enviro-informatics/loci-testdata/simplify-1/loci-ld-dataset/loci-instances-1.ttl

It appears that dcterms:type is included as a minimum, but he has also kept gnaf:gnafType with the same value, e.g.:

geo:hasGeometry [
      a sf:Point ;
      gnaf:gnafType <http://gnafld.net/def/gnaf/code/GeocodeTypes#StreetLocality> ;
      dcterms:type <http://gnafld.net/def/gnaf/code/GeocodeTypes#StreetLocality> ;
      geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4283> POINT(145.31043640 -38.38502194)"^^geo:wktLiteral ;
      rdfs:label "Street Locality" ;
    ] ;

So it appears he's suggesting that dcterms:type be required, with gnaf:gnafType also included (I'm assuming for backwards compatibility?).

jyucsiro commented 4 years ago

@ashleysommer As mentioned on https://github.com/CSIRO-enviro-informatics/asgs-dataset/issues/13#issuecomment-571624367, I'm coming around to the view that having two profiles, loci and gnaf, would be the approach in this case.

dr-shorthair commented 4 years ago

I checked with Jo, and he was comfortable with the GNAF Ontology being updated, so I went ahead and did it here: https://github.com/AGLDWG/gnaf-ont/blob/master/gnaf.ttl