Open dr-shorthair opened 4 years ago
See
Excel spreadsheet, with additional sheet with Formulae to convert sheet 1 to 'turtle' https://github.com/GeoscienceAustralia/strat-ontology-graph-API/blob/master/strat_graph_data/STRAT_RELATIONS_extended_nogeom_ages_20200611.xlsx
RDF data, cleaned with subsequent SPARQL operations as explained in Ontology comment https://github.com/GeoscienceAustralia/strat-ontology-graph-API/blob/master/strat_graph_data/STRAT_RELATIONS_extended_nogeom_ages_20200611.ttl
For review @wjfrancis and @jyucsiro
@dr-shorthair Hi Simon, thanks for sharing the files. However, I'm getting a 404 on the xlsx. Perhaps erased with a pull request? If you could post again that would be much appreciated.
i moved it to https://github.com/GeoscienceAustralia/strat-ontology-graph-API/tree/master/source_data
/strat_graph_data
should only contain RDF data (otherwise graphDB complains)...
@jyucsiro Thanks!
@dr-shorthair @wjfrancis been looking at exploring the relationship in the output TTL file. it seems like the geometry URI included in the output TTL file uses the ID for stratnames.
e.g. STRATNAME | STRATNO | ORIGINATOR | LITHOLOGY_GROUP | STRAT_AGE_EVENT | TOPSTRAT_AGE_EVENT | MINAGENAME | MAXAGENAME | PROVINCE_ENO | RELATED STRATNAME | RELATED_STRATNO | RELATION_TYPE |
---|---|---|---|---|---|---|---|---|---|---|---|
Lady Loretta Formation | 10002 | R.J.Cavaney | carbonate sedimentary rock | deposition | deposition | Statherian | Statherian | 556523 | Shady Bore Quartzite | 16787 | overlies |
is converted into...
stratnames:u10002
a strat:Unit ;
strat:bottomAge isc:Statherian ;
strat:formationProcess proc:deposition ;
strat:lithology lith:carbonate_sedimentary_rock ;
strat:qualifiedRelation stratnames:u10002overliesu16787 ;
strat:qualifiedRelation stratnames:u10002overliesu24472 ;
strat:qualifiedRelation stratnames:u10002underliesu6282 ;
strat:relation stratnames:p467256 ;
strat:relation stratnames:p556523 ;
strat:relation stratnames:p559187 ;
strat:relation stratnames:p559201 ;
strat:terminationProcess proc:deposition ;
strat:topAge isc:Statherian ;
strat:wasDefinedBy <http://pid.geoscience.gov.au/agent/RJCavaney> ;
dcterms:identifier "10002" ;
dcterms:publisher <https://ror.org/04ge02x20> ;
geo:hasGeometry <https://gds.loci.cat/geometry/stratnames/u10002> ;
rdfs:label "Lady Loretta Formation" ;
.
Should we use the PROVINCE_ENO
value as the basis for the Geometry URI ?
@jyucsiro @dr-shorthair Yes, that's right, so in this case https://gds.loci.cat/geometry/stratnames/u10002 could be replaced with https://gds.loci.cat/geometry/province/556523
ok - i'm not sure how the ttl was converted. @dr-shorthair would you be able to run that conversion again with PROVINCE_ENO value as the basis for the Geometry URI ?
@jyucsiro @dr-shorthair the property hasGeometry is a fudge here though. The property would be more precisely to the the province entity and then the province would have the hasGeometry property (maybe something to address at a later stage).
@jyucsiro I could do this I think as a test run through of @dr-shorthair 's xlsx to rdf template
@wjfrancis the sparql queries applied were:
replaced agents/blank-nodes with named resources using
INSERT { ?pid a prov:Agent ; rdfs:label ?n . ?c strat:wasDefinedBy ?pid . }
WHERE {
?c a strat:Unit ;
strat:wasDefinedBy ?b .
?b rdfs:label ?n .
BIND( REPLACE( ?n , \"[., ()&]\", \"\") AS ?id )
BIND( IRI(CONCAT( \"http://pid.geoscience.gov.au/agent/\", ?id)) AS ?pid )
}
DELETE { ?c strat:wasDefinedBy ?b . }
WHERE {
?c a strat:Unit ;
strat:wasDefinedBy ?b .
FILTER isblank(?b)
}
Convert relations into non-blank-nodes
INSERT { ?u1u2 a strat:Relationship ;
strat:relation ?u2 ;
strat:relationshipType ?t .
?u1 strat:qualifiedRelation ?u1u2 . }
WHERE {
?u1 a strat:Unit ;
strat:qualifiedRelation ?b .
?b strat:relation ?u2 ;
strat:relationshipType ?t .
?u1 dcterms:identifier ?i1 .
?u2 dcterms:identifier ?i2 .
BIND( STRAFTER( STR(?t) , \"http://pid.geoscience.gov.au/def/stratname/relation/\" ) AS ?t12 )
BIND( IRI(CONCAT( \"http://pid.geoscience.gov.au/dataset/stratnames/\", \"u\" , ?i1 , ?t12 , \"u\" , ?i2 ) ) AS ?u1u2 )
}
DELETE { ?b ?p ?o . }
WHERE {
?b a strat:Relationship ;
?p ?o .
FILTER isblank (?b)
}
I consolidated all the rows relating to each unit
, so you see there are multiple relations
(to provinces
) and qualified-relations
(to other units
) in the descriptions. This means that each unit
is related to multiple provinces
.
OTOH each qualified-relation
is linked to a single province
, so perhaps that is the first-class item in the dataset, rather than the unit
?
And yes, it looks like the Province
needs to be added as a separate first-class object, which carries the geometry rather than the Unit
. That one would be is easy ...
I had not understood that the only geometries that we had were associated with the provinces. ;-)
Can you confirm what the nature of the relationship between Unit
and Province
and Relationship
is?
looks like
Re-process data as follows:
DELETE { ?u geo:hasGeometry ?g }
WHERE {
?u a strat:Unit; geo:hasGeometry ?g
}
INSERT { ?r strat:context ?c }
WHERE {
?c strat:qualifiedRelation ?r
}
INSERT { ?p a strat:Province ; dcterms:identifier ?pi ; geo:hasGeometry ?g . }
WHERE {
?u a strat:Unit ;
strat:relation ?p .
BIND( STRAFTER( STR(?p) , "http://pid.geoscience.gov.au/dataset/stratnames/p") AS ?pi )
BIND( IRI( CONCAT( "https://gds.loci.cat/geometry/province/p" , ?pi ) ) AS ?g )
}
But questions above still need answers.
Thanks @dr-shorthair. For this subset of data we have
Unit -> Province 1 to many Unit -> Relation 1 to many and the inverse No direct relationship between Relation and Province
In the broader set of province records in the GA database the Province to Unit will be Many to 0. However we can focus on this subset (many to 1) as its closer to the future data model we'll use later.
@dr-shorthair @jyucsiro Will need some further assistance with completing the RDF conversion - currently can follow the xlsx formulas and post processing is clear - however in the initial load into triple store I'm missing something
Attempt at conversion looks a bit like this
@prefix data: http://linked.data.gov.au/def/datatype/ . @prefix dcterms: http://purl.org/dc/terms/ . @prefix geo: http://www.opengis.net/ont/geosparql# . @prefix isc: http://resource.geosciml.org/classifier/ics/ischart/ . @prefix lith: http://pid.geoscience.ga.gov.au/def/stratname/lithology/ . @prefix owl: http://www.w3.org/2002/07/owl# . @prefix proc: http://pid.geoscience.ga.gov.au/def/stratname/process/ . @prefix prov: http://www.w3.org/ns/prov# . @prefix rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# . @prefix rdfs: http://www.w3.org/2000/01/rdf-schema# . @prefix rel: http://pid.geoscience.ga.gov.au/def/stratname/relation/ . @prefix skos: http://www.w3.org/2004/02/skos/core# . @prefix strat: http://pid.geoscience.ga.gov.au/def/stratname# . @prefix stratnames: http://pid.geoscience.ga.gov.au/dataset/stratnames/ . @prefix time: http://www.w3.org/2006/time# . @prefix xsd: http://www.w3.org/2001/XMLSchema# .
stratnames:u27732 a strat:Unit ; rdfs:label "Campaspe Formation" ; strat:wasDefinedBy [ rdfs:label "" ; ] ; strat:lithology lith:medium-coarse_grained_siliciclastic_rock ; strat:formationProcess proc:deposition ; strat:terminationProcess proc:deposition ; strat:topAge isc:Pleistocene ; strat:bottomAge isc:Pliocene ; strat:relation stratnames:p20430 ; strat:qualifiedRelation [ rdf:type strat:Relationship ; strat:relation stratnames:u24044 ; strat:relationshipType rel:underlies ; ] ; geo:hasGeometry https://gds.loci.cat/geometry/province/u20430 ; dcterms:identifier "27732" ; dcterms:publisher https://ror.org/04ge02x20 ; . stratnames:u27732 a strat:Unit ; rdfs:label "Campaspe Formation" ; strat:wasDefinedBy [ rdfs:label "" ; ] ; strat:lithology lith:medium-coarse_grained_siliciclastic_rock ; strat:formationProcess proc:deposition ; strat:terminationProcess proc:deposition ; strat:topAge isc:Pleistocene ; strat:bottomAge isc:Pliocene ; strat:relation stratnames:p20477 ; strat:qualifiedRelation [ rdf:type strat:Relationship ; strat:relation stratnames:u24044 ; strat:relationshipType rel:underlies ; ] ; geo:hasGeometry https://gds.loci.cat/geometry/province/u20477 ; dcterms:identifier "27732" ; dcterms:publisher https://ror.org/04ge02x20 ; . stratnames:u27732 a strat:Unit ; rdfs:label "Campaspe Formation" ; strat:wasDefinedBy [ rdfs:label "" ; ] ; strat:lithology lith:medium-coarse_grained_siliciclastic_rock ; strat:formationProcess proc:deposition ; strat:terminationProcess proc:deposition ; strat:topAge isc:Pleistocene ; strat:bottomAge isc:Pliocene ; strat:relation stratnames:p408346 ; strat:qualifiedRelation [ rdf:type strat:Relationship ; strat:relation stratnames:u24044 ; strat:relationshipType rel:underlies ; ] ; geo:hasGeometry https://gds.loci.cat/geometry/province/u408346 ; dcterms:identifier "27732" ; dcterms:publisher https://ror.org/04ge02x20 ; . stratnames:u13235 a strat:Unit ; rdfs:label "Mount Windsor Volcanics" ; strat:wasDefinedBy [ rdfs:label "" ; ] ; strat:lithology lith:acid_volcanic_rock ; strat:formationProcess proc:deposition ; strat:terminationProcess proc:deposition ; strat:topAge isc:EarlyOrdovician ; strat:bottomAge isc:LateCambrian ; strat:relation stratnames:p559188 ; strat:qualifiedRelation [ rdf:type strat:Relationship ; strat:relation stratnames:u24044 ; strat:relationshipType rel:overlies ; ] ; geo:hasGeometry https://gds.loci.cat/geometry/province/u559188 ; dcterms:identifier "13235" ; dcterms:publisher https://ror.org/04ge02x20 ; . stratnames:u13235 a strat:Unit ; rdfs:label "Mount Windsor Volcanics" ; strat:wasDefinedBy [ rdfs:label "" ; ] ; strat:lithology lith:acid_volcanic_rock ; strat:formationProcess proc:deposition ; strat:terminationProcess proc:deposition ; strat:topAge isc:EarlyOrdovician ; strat:bottomAge isc:LateCambrian ; strat:relation stratnames:p20497 ; strat:qualifiedRelation [ rdf:type strat:Relationship ; strat:relation stratnames:u24044 ; strat:relationshipType rel:overlies ; ] ; geo:hasGeometry https://gds.loci.cat/geometry/province/u20497 ; dcterms:identifier "13235" ; dcterms:publisher https://ror.org/04ge02x20 ; . stratnames:u23944 a strat:Unit ; rdfs:label "Rollston Range Formation" ; strat:wasDefinedBy [ rdfs:label "R.A.Henderson" ; ] ; strat:lithology lith:siliciclastic_sedimentary_rock ; strat:formationProcess proc:deposition ; strat:terminationProcess proc:deposition ; strat:topAge isc:EarlyOrdovician ; strat:bottomAge isc:LateCambrian ; strat:relation stratnames:p559188 ; strat:qualifiedRelation [ rdf:type strat:Relationship ; strat:relation stratnames:u24044 ; strat:relationshipType rel:underlies ; ] ; geo:hasGeometry https://gds.loci.cat/geometry/province/u559188 ; dcterms:identifier "23944" ; dcterms:publisher https://ror.org/04ge02x20 ; .
(This might be worth a twirl - https://tarql.github.io/ )
Are you getting a error message?
@dr-shorthair Thanks! TARQL looks interesting.
I had a few syntax errors and needed to get rid of some additional " etc that excel put in place. I think I've got it sorted now though.
@jyucsiro Updated strat relations ttl (exported from GraphDB post processed with SPARQL queries) available here: https://transfer-temp-scidata.s3-ap-southeast-2.amazonaws.com/STRAT_RELATIONS_POST_PROCEESSED_20200626.ttl
Let me know if that looks right. (It passed the ages sample SPARQL sent through by Simon previously)
@wjfrancis it doesn't look like your converting process worked. For example, the strat unit properties aren't listed amoung other things. Will use what's in the repository for now.
I did the conversion yesterday https://github.com/GeoscienceAustralia/strat-ontology-graph-API/issues/6#issuecomment-649321348 The revised TTL is in the repo - see https://github.com/GeoscienceAustralia/strat-ontology-graph-API/blob/master/strat_graph_data/STRAT_RELATIONS_extended_nogeom_ages_20200611.ttl Also a minor update to the ontology https://github.com/GeoscienceAustralia/strat-ontology-graph-API/blob/master/rdf/stratname.ttl
@dr-shorthair your ttl file works well. we've been using in tests. i had to also refactor some bits of the REST API but all good now.
Document the conversion of the Excel spreadsheet data to RDF
ping @wjfrancis