CSIRO-enviro-informatics / asgs-dataset

GNU General Public License v3.0

Convert ASGS data to use simplified/aligned version of the ontology #13

Closed dr-shorthair closed 4 years ago

dr-shorthair commented 4 years ago

https://github.com/CSIRO-enviro-informatics/loci.cat/wiki/Simplifying-the-initial-ontologies describes a simplification of the ASGS datasets to match a more unified Loc-I ontology pattern. The goal is to simplify/harmonize the SPARQL queries.

The transformations required are illustrated by-example as follows.

Original form:

<http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000>
  rdf:type asgs:MeshBlock ;
  rdf:type geo:Feature ;
  asgs:category "Primary Production" ;
  asgs:mbCode2016 "20663970000" ;
  geox:hasAreaM2 [
      data:value 58387600.000000007450580596923828125 ;
      ns21:crs <http://www.opengis.net/def/crs/EPSG/0/3577> ;
    ] ;
  geox:hasAreaM2 [
      data:value 95157257.606680378 ;
      ns21:crs <http://www.opengis.net/def/crs/EPSG/0/3857> ;
    ] ;
  reg:register <http://linked.data.gov.au/dataset/asgs2016/meshblock/> ;
  geo:hasGeometry [
      rdf:type geo:Geometry ;
      geo:asGML """<gml:MultiSurface ..."""^^geo:gmlLiteral ;
    ] ;
.

<http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/20503108801>
  rdf:type asgs:StatisticalAreaLevel1 ;
  rdf:type geo:Feature ;
  asgs:isStatisticalAreaLevel1Of <http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000> ;
  asgs:sa1Maincode2016 "20503108801" ;
  asgs:statisticalArea1Sa111DigitCode "20503108801" ;
.

Preferred form:

  1. asgs:category → dcterms:type whose object is a URI denoting a concept
  2. asgs:mbCode2016 etc → dcterms:identifier with a specific literal datatype
  3. asgs:isStatisticalAreaLevel1Of etc → geo:sfContains and add matching geo:sfWithin for the inverse case
  4. reg:register → loci:isMemberOf and inverse
  5. type of geometry is explicit or geometry is externalized
    
    <http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000>
    rdf:type asgs:MeshBlock ;
    rdf:type asgs:Feature ;
    rdf:type geo:Feature ;
    geox:hasAreaM2 [
      data:value 58387600.000000007450580596923828125 ;
      qb4st:crs <http://www.opengis.net/def/crs/EPSG/0/3577> ;
    ] ;
    geox:hasAreaM2 [
      data:value 95157257.606680378 ;
      qb4st:crs <http://www.opengis.net/def/crs/EPSG/0/3857> ;
    ] ;
    loci:isMemberOf <http://linked.data.gov.au/dataset/asgs2016/meshblock/> ;
    dcterms:identifier "20663970000"^^asgs-id:mbCode2016 ;
    dcterms:type asgs-cat:primary-production ;
    geo:hasGeometry [
      rdf:type sf:MultiSurface ;
      geo:asGML """<gml:MultiSurface ..."""^^geo:gmlLiteral ;
    ] ;
    geo:hasGeometry <http://gds.loci.cat/geometry/asgs16_mb/20663970000> ;
    geo:sfWithin <http://linked.data.gov.au/dataset/asgs2016/stateorterritory/2> ;
    geo:sfWithin <http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/20503108801> ;
    .

<http://linked.data.gov.au/dataset/asgs2016/meshblock/>
  a loci:Dataset , rdf:Bag ;
  rdfs:member <http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000> ;
.

<http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/20503108801>
  a asgs:Feature ;
  a asgs:StatisticalAreaLevel1 ;
  a geo:Feature ;
  loci:isMemberOf <http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/> ;
  dcterms:identifier "20503108801"^^asgs-id:sa1Maincode2016 ;
  dcterms:identifier "20503108801"^^asgs-id:statisticalArea1Sa111DigitCode ;
  geo:hasGeometry <http://gds.loci.cat/geometry/asgs16_sa1/20503108801> ;
  geo:sfContains <http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000> ;
  geo:sfWithin <http://linked.data.gov.au/dataset/asgs2016/stateorterritory/2> ;
  geo:sfWithin <http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel2/205031088> ;
.

<http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/>
  a loci:Dataset , rdf:Bag ;
  rdfs:member <http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/20503108801> ;
.

dr-shorthair commented 4 years ago

Item 1.:

# Mint a concept URI from each category label and attach it via dcterms:type.
INSERT { ?m dcterms:type ?c . }
WHERE {
    ?m a asgs:MeshBlock ; asgs:category ?C .
    BIND( IRI( CONCAT( "http://linked.data.gov.au/def/asgs-cat/", LCASE( REPLACE( ?C , "[ /]" , "-" )))) AS ?c )
} ;

# Then remove the original literal-valued category.
DELETE { ?m asgs:category ?C . }
WHERE { ?m a asgs:MeshBlock ; asgs:category ?C . }
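
To make the effect of the BIND expression concrete, here is a minimal Python sketch of the same string transformation. It is only an illustration, not part of the update itself; the function name `category_iri` is hypothetical, and the namespace is copied from the CONCAT call above.

```python
import re

# Namespace copied from the CONCAT() call in the SPARQL update above.
ASGS_CAT_NS = "http://linked.data.gov.au/def/asgs-cat/"


def category_iri(label: str) -> str:
    """Mirror LCASE(REPLACE(?C, "[ /]", "-")) and prepend the namespace."""
    # Replace every space or slash with a hyphen, then lower-case the result.
    slug = re.sub(r"[ /]", "-", label).lower()
    return ASGS_CAT_NS + slug


print(category_iri("Primary Production"))
# prints http://linked.data.gov.au/def/asgs-cat/primary-production
```

The result for "Primary Production" corresponds to the asgs-cat:primary-production concept used in the preferred form, assuming the asgs-cat: prefix maps to the namespace above.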

dr-shorthair commented 4 years ago

Item 2.:

# Codes: re-express each "...code..." property as a typed dcterms:identifier.
INSERT { ?m dcterms:identifier ?typedCode . }
WHERE {
    ?m a geo:Feature ; ?p ?code .
    FILTER ( CONTAINS ( lcase( str( ?p )) , "code" ) )
    BIND ( STRAFTER ( str( ?p ) , "http://linked.data.gov.au/def/asgs#" ) AS ?codeType )
    BIND ( IRI ( CONCAT ( "http://linked.data.gov.au/def/asgs/id#" , ?codeType ) ) AS ?codeDataType )
    BIND ( STRDT ( ?code , ?codeDataType ) AS ?typedCode )
} ;

# Then drop the original code properties.
DELETE { ?m ?p ?code . }
WHERE {
    ?m a geo:Feature ; ?p ?code .
    FILTER ( CONTAINS ( lcase( str( ?p )) , "code" ) )
} ;

# Names: the same pattern, written to dcterms:title.
INSERT { ?m dcterms:title ?typedName . }
WHERE {
    ?m a geo:Feature ; ?p ?name .
    FILTER ( CONTAINS ( lcase( str( ?p )) , "name" ) )
    BIND ( STRAFTER ( str( ?p ) , "http://linked.data.gov.au/def/asgs#" ) AS ?nameType )
    BIND ( IRI ( CONCAT ( "http://linked.data.gov.au/def/asgs/id#" , ?nameType ) ) AS ?nameDataType )
    BIND ( STRDT ( ?name , ?nameDataType ) AS ?typedName )
} ;

# Then drop the original name properties.
DELETE { ?m ?p ?name . }
WHERE {
    ?m a geo:Feature ; ?p ?name .
    FILTER ( CONTAINS ( lcase( str( ?p )) , "name" ) )
}
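
As a rough illustration of how the datatype IRI is derived from the predicate, the following Python sketch mirrors the FILTER, STRAFTER and CONCAT steps for codes. The function name `typed_identifier` is hypothetical, and the example.org predicate in the last call is made up purely to show the edge case.

```python
from typing import Optional, Tuple

# Both namespaces are copied from the SPARQL update above.
ASGS_NS = "http://linked.data.gov.au/def/asgs#"
ASGS_ID_NS = "http://linked.data.gov.au/def/asgs/id#"


def typed_identifier(predicate: str, code: str) -> Optional[Tuple[str, str]]:
    """Return (lexical value, datatype IRI), or None if the predicate is not a code."""
    # FILTER ( CONTAINS ( lcase( str( ?p )) , "code" ) )
    if "code" not in predicate.lower():
        return None
    # STRAFTER yields "" when the namespace is absent; str.partition does the same.
    code_type = predicate.partition(ASGS_NS)[2]
    # CONCAT of the datatype namespace and the predicate's local name.
    return code, ASGS_ID_NS + code_type


print(typed_identifier(ASGS_NS + "mbCode2016", "20663970000"))
print(typed_identifier(ASGS_NS + "category", "Primary Production"))
# Hypothetical predicate outside the asgs namespace:
print(typed_identifier("http://example.org/postcode", "3000"))
```

The last call shows a property of the pattern worth keeping in mind: a predicate that contains "code" but is not in the asgs namespace would get the bare datatype namespace as its datatype. Whether such predicates occur in the data is not something this thread establishes.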

dr-shorthair commented 4 years ago

Item 3.:

# Replace each asgs:is...Of link with geo:sfContains and its inverse geo:sfWithin.
INSERT { ?gf1 geo:sfContains ?gf2 . ?gf2 geo:sfWithin ?gf1 . }
WHERE {
    ?gf1 ?p ?gf2 .
    BIND ( STRAFTER ( str( ?p ) , "http://linked.data.gov.au/def/asgs#" ) AS ?pname )
    FILTER ( REGEX ( ?pname , "^is[a-zA-Z0-9]+Of" ) )
} ;

# Then drop the original asgs:is...Of links.
DELETE { ?gf1 ?p ?gf2 . }
WHERE {
    ?gf1 ?p ?gf2 .
    BIND ( STRAFTER ( str( ?p ) , "http://linked.data.gov.au/def/asgs#" ) AS ?pname )
    FILTER ( REGEX ( ?pname , "^is[a-zA-Z0-9]+Of" ) )
}
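
The predicate test and the pair of triples it produces can be sketched in Python as follows. This is an illustration only: the function name `containment_triples` is hypothetical, and the output predicates are written as prefixed names rather than full IRIs.

```python
import re
from typing import List, Tuple

# Namespace and pattern copied from the SPARQL update above.
ASGS_NS = "http://linked.data.gov.au/def/asgs#"
HIERARCHY_PREDICATE = re.compile(r"^is[a-zA-Z0-9]+Of")

Triple = Tuple[str, str, str]


def containment_triples(s: str, p: str, o: str) -> List[Triple]:
    """Return the sfContains/sfWithin pair for an asgs:is...Of triple, else []."""
    # STRAFTER: local name of the predicate, or "" outside the asgs namespace.
    local_name = p.partition(ASGS_NS)[2]
    # SPARQL REGEX is an unanchored search; the "^" in the pattern anchors the start.
    if not HIERARCHY_PREDICATE.search(local_name):
        return []
    return [(s, "geo:sfContains", o), (o, "geo:sfWithin", s)]


sa1 = "http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/20503108801"
mb = "http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000"
for triple in containment_triples(sa1, ASGS_NS + "isStatisticalAreaLevel1Of", mb):
    print(triple)
```

For the example data this gives the SA1 geo:sfContains the MeshBlock and the MeshBlock geo:sfWithin the SA1, which is the direction shown in the preferred form.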

dr-shorthair commented 4 years ago

Item 4.:

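# Record membership in both directions and type the register as a dataset.
INSERT {
    ?f loci:isMemberOf ?r .
    ?r a rdf:Bag , loci:Dataset ; rdfs:member ?f .
}
WHERE { ?f reg:register ?r . } ;

# Then drop the original reg:register links.
DELETE { ?f reg:register ?r . }
WHERE { ?f reg:register ?r . }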

dr-shorthair commented 4 years ago

Item 0.: (ensure that every individual feature is explicitly typed as a geo:Feature and asgs:Feature)

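# Type every instance of a (transitive) subclass of geo:Feature as geo:Feature.
INSERT { ?gf a geo:Feature . }
WHERE {
    ?gf a [ rdfs:subClassOf+ geo:Feature ; ] .
} ;

# Likewise for asgs:Feature.
INSERT { ?gf a asgs:Feature . }
WHERE {
    ?gf a [ rdfs:subClassOf+ asgs:Feature ; ] .
}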

dr-shorthair commented 4 years ago

Item 5.:

INSERT { ?gf geo:hasGeometry ?gg . }
WHERE { 
     ?gf a asgs:Feature .  
     OPTIONAL { ?gf a asgs:StateOrTerritory . BIND( IRI( CONCAT( "http://gds.loci.cat/geometry/asgs16_ste/" , REPLACE( str( ?gf), '^.*(#|/)', "" ))) AS ?gg ) }
     OPTIONAL { ?gf a asgs:StatisticalAreaLevel4 . BIND( IRI( CONCAT( "http://gds.loci.cat/geometry/asgs16_sa4/" , REPLACE( str( ?gf), '^.*(#|/)', "" ))) AS ?gg ) }
     OPTIONAL { ?gf a asgs:StatisticalAreaLevel3 . BIND( IRI( CONCAT( "http://gds.loci.cat/geometry/asgs16_sa3/" , REPLACE( str( ?gf), '^.*(#|/)', "" ))) AS ?gg ) }
     OPTIONAL { ?gf a asgs:StatisticalAreaLevel2 . BIND( IRI( CONCAT( "http://gds.loci.cat/geometry/asgs16_sa2/" , REPLACE( str( ?gf), '^.*(#|/)', "" ))) AS ?gg ) }
     OPTIONAL { ?gf a asgs:StatisticalAreaLevel1 . BIND( IRI( CONCAT( "http://gds.loci.cat/geometry/asgs16_sa1/" , REPLACE( str( ?gf), '^.*(#|/)', "" ))) AS ?gg ) }
     OPTIONAL { ?gf a asgs:MeshBlock . BIND( IRI( CONCAT( "http://gds.loci.cat/geometry/asgs16_mb/" , REPLACE( str( ?gf), '^.*(#|/)', "" ))) AS ?gg ) }
}
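
The geometry IRI construction can be sketched in Python like this. It is a hedged illustration of the CONCAT/REPLACE logic, not a replacement for the update; the function name `geometry_iri` and the `DATASET_SEGMENT` table are hypothetical, with the values copied from the OPTIONAL clauses above.

```python
import re
from typing import Optional

GEOMETRY_BASE = "http://gds.loci.cat/geometry/"

# ASGS class local name -> geometry dataset segment, as in the OPTIONAL clauses.
DATASET_SEGMENT = {
    "StateOrTerritory": "asgs16_ste",
    "StatisticalAreaLevel4": "asgs16_sa4",
    "StatisticalAreaLevel3": "asgs16_sa3",
    "StatisticalAreaLevel2": "asgs16_sa2",
    "StatisticalAreaLevel1": "asgs16_sa1",
    "MeshBlock": "asgs16_mb",
}


def geometry_iri(asgs_class: str, feature_iri: str) -> Optional[str]:
    """Build the external geometry IRI for a feature, or None for other classes."""
    segment = DATASET_SEGMENT.get(asgs_class)
    if segment is None:
        return None
    # REPLACE( str(?gf), '^.*(#|/)', "" ): keep what follows the last '#' or '/'.
    local_id = re.sub(r"^.*(#|/)", "", feature_iri)
    return f"{GEOMETRY_BASE}{segment}/{local_id}"


print(geometry_iri(
    "MeshBlock",
    "http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000",
))
# prints http://gds.loci.cat/geometry/asgs16_mb/20663970000
```

The printed IRI matches the externalized geo:hasGeometry object shown for the MeshBlock in the preferred form.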

ashleysommer commented 4 years ago

@dr-shorthair @jyucsiro Same question as I had on the GNAF repository... Removing the ASGS-ontology predicates such as asgs:category, asgs:mbCode2016, asgs:isStatisticalAreaLevel1Of, etc., and replacing them with simplified/harmonized predicates means the 'asgs' profile/view is no longer aligned with the ASGS ontology. Should these changes be implemented in a different profile? Should we keep the current representation as the 'asgs' profile and build a new 'loci' profile with these changes?

jyucsiro commented 4 years ago

@ashleysommer Simon made changes to the ASGS ontology that clarified the subProperty hierarchy for categories and codes. This allows backward compatibility of the current method mapping.

The simplified/harmonized approach uses fewer of the (now) specialised asgs: predicates. It is still aligned, but the "preferred" approach just uses the parent property rather than the specialised predicates...

> Should we keep the current representation as the 'asgs' profile and build a new 'loci' profile with these changes?

I think having both asgs and loci profiles would be a sensible approach at this time. We're still evaluating whether the asgs profile would be useful in an ongoing way or whether we just run with loci. Would we be able to have both for now?

jyucsiro commented 4 years ago

Crossreferencing https://github.com/CSIRO-enviro-informatics/asgs-dataset/issues/13#issuecomment-571406571

I can see what you mean now. I think having 2 profiles - loci and asgs would be the approach. Otherwise, the alternative is to duplicate or reason for the category/code predicates. The intention of the loci profile was to reuse predicates from well-known ontologies as much as possible.

ashleysommer commented 4 years ago

Another observation: @dr-shorthair I see you've taken the approach of turning category name into a codelist item. ie. category label "Primary Production" -> <asgs-cat/primary-production>. That may not be necessary.

In the raw ASGS data, the MeshBlocks have a category_name (string) and a category (integer). category_name is what we currently use for the category label in the ASGS RDF view; the category from the raw data is an integer and is currently not used in any RDF representation. So when simplifying this representation we could probably use the category integer as the identifier for the codelist item.

dr-shorthair commented 4 years ago

In general it is expected that all classifications and code-lists be published as web resources, so that (a) they can be used more broadly, and (b) the definitions can be obtained as needed by dereferencing the URI.

I don't care what token is used for the local name, but the general principle is that the category is denoted by a URI.

dr-shorthair commented 4 years ago

What is the " 'asgs' profile/view " ? As @jyucsiro notes, the goal is to have a loc-i view as primary, with more refined views as minor elaborations.

ashleysommer commented 4 years ago

@dr-shorthair When I talk about "view", I am referring to choosing a profile using the content-negotiation-by-profile feature in PyLDAPI. It means a single feature can have multiple different representations, depending on the profile you choose. You can see them listed in the 'alternates view' like here.

When resolving the resource URI, if you don't explicitly choose a profile (using the "?_view=" query param) then you will get a default view. In our ASGS pyldapi deployment the default view is called "asgs", in which the WFS feature is mapped to / aligned with the ASGS ontology. It looks like the "Original form" snippet you have above.
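
For illustration, selecting a profile explicitly amounts to appending the "_view" query parameter to the resource URI. The Python sketch below only builds such a URL; the function name `profile_url` is hypothetical, it assumes the resource URI has no query string already, and it does not check which view names a given deployment actually accepts.

```python
from urllib.parse import urlencode


def profile_url(resource_uri: str, view: str) -> str:
    """Append the _view query parameter used to pick a profile."""
    # Assumes resource_uri carries no existing query string.
    return f"{resource_uri}?{urlencode({'_view': view})}"


meshblock = "http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000"
print(profile_url(meshblock, "asgs"))
# prints http://linked.data.gov.au/dataset/asgs2016/meshblock/20663970000?_view=asgs
```

Leaving the parameter off would return whatever the deployment treats as its default view.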

dr-shorthair commented 4 years ago

The goal of the 'simplification' was to make querying across datasets easier by replacing some properties that had been created in new namespaces with properties from standard namespaces. The SPARQL queries should be more portable across datasets. I think I managed to do this without any loss of information. It did involve some additional datatypes and controlled vocabularies, but the similarities between the primary dataset structures are more obvious.

ashleysommer commented 4 years ago

@dr-shorthair I'm not disputing that, it is a good change. I'm saying in order to implement these changes, I'm introducing a 'loci' view, which will be the default profile, it will contain these changes, while leaving the 'asgs' view untouched and fully aligned with the ASGS ontology.

dr-shorthair commented 4 years ago

@ashleysommer note that "the ASGS ontology" has been modified and streamlined. It has been refactored into multiple graphs (files), with some of these tagged owl:deprecated true in particular

https://github.com/AGLDWG/asgs-ont/blob/master/asgs-path.ttl is not explicitly deprecated, but is maintained in a separate graph as its capabilities are not currently used in any data that we have access to.

ashleysommer commented 4 years ago

@dr-shorthair oh, I see what you mean now.

So the old original 'asgs' view (with the asgs:mbCode2016 and asgs:isStatisticalAreaLevel1Of etc.) is now not needed at all in our pyldapi deployment of the ASGS dataset?

Are the ABS guys (i.e., Laurent) across the ontology changes, and do they approve of them?

I was under the impression there were people in ABS using this pyldapi implementation (either our deployment, or their own instance) and relying on the original ontology predicates.

dr-shorthair commented 4 years ago

I contacted Laurent to verify if it was OK to make changes and he concurred. I'm being pretty careful to document them well, and not to throw anything away, just mark it 'deprecated'. AFAICT we are the only people maintaining an active deployment. The plan would be to hand it over to them, but there is nothing currently depending on it.

dr-shorthair commented 4 years ago

See https://github.com/AGLDWG/asgs-ont/blob/master/images/non-abs-structures.png

jyucsiro commented 4 years ago

@ashleysommer yep - I don't see a need for an asgs view given changes that @dr-shorthair made to simplify the asgs/loci view, unless there is something I'm missing or a feature others would want.

on the abs-structures vs non-abs-structures, we don't need to tackle the non-abs-structures yet, but if it is simple to do, then it is a nice-to-have. the abs structures are a must have for our next release - these are:

dr-shorthair commented 4 years ago

(All encoded in https://github.com/AGLDWG/asgs-ont/blob/master/asgs.ttl )