Open shaneseaton opened 5 years ago
Each "dataset" is considered to be a reg:Register, so that each "entity" (address, SLA, catchment etc) is associated with a dataset using the reg:register predicate, e.g.
<http://linked.data.gov.au/dataset/geofabric/contractedcatchment/12105364> rdf:type geofabric:ContractedCatchment ;
reg:register <http://linked.data.gov.au/dataset/geofabric/contractedcatchment/> .
So each entity is a member of both <http://linked.data.gov.au/def/geofabric#ContractedCatchment> (its class) and <http://linked.data.gov.au/dataset/geofabric/contractedcatchment/> (its register). These are different things.
Furthermore, lower-level datasets are also members of a higher-level register, e.g.
<http://linked.data.gov.au/dataset/geofabric/contractedcatchment/> rdf:type reg:Register ;
reg:register <http://linked.data.gov.au/dataset/geofabric> .
So if you want to constrain the query to the higher-level datasets, the query will have to include the property path reg:register+
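A minimal sketch of how the two kinds of membership can be combined in one query: the class constraint uses rdf:type, and the dataset constraint uses the reg:register+ property path. The geofabric: prefix is taken from the class IRI quoted above; whether every entity in the store actually carries both triples is an assumption.

```sparql
PREFIX reg: <http://purl.org/linked-data/registry#>
PREFIX geofabric: <http://linked.data.gov.au/def/geofabric#>

# Entities of one class, restricted to anything registered (directly or
# through intermediate registers) under the top-level Geofabric dataset.
SELECT ?entity WHERE {
  ?entity a geofabric:ContractedCatchment ;
          reg:register+ <http://linked.data.gov.au/dataset/geofabric> .
}
LIMIT 100
```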
Looking at what datasets are present:
PREFIX reg: <http://purl.org/linked-data/registry#>
select * where {
?d a reg:Register .
}
which results in
1 | http://linked.data.gov.au/dataset/geofabric/contractedcatchment/
2 | http://linked.data.gov.au/dataset/geofabric
3 | http://linked.data.gov.au/dataset/geofabric/drainagedivision/
4 | http://linked.data.gov.au/dataset/geofabric/riverregion/
5 | http://linked.data.gov.au/dataset/asgs2016/meshblock/
6 | http://linked.data.gov.au/dataset/asgs2016/stateorterritory/
7 | http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/
8 | http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel2/
9 | http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel3/
10 | http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel4/
11 | http://linked.data.gov.au/dataset/gnaf/address/
12 | http://linked.data.gov.au/dataset/gnaf/reg/
13 | http://linked.data.gov.au/dataset/gnaf/addressSite/
14 | http://linked.data.gov.au/dataset/gnaf/locality/
15 | http://linked.data.gov.au/dataset/gnaf/streetLocality/
16 | http://linked.data.gov.au/dataset/asgs2016/australia/
17 | http://linked.data.gov.au/dataset/asgs2016/reg/
i.e. the GNAF datasets are not dated.
Other examples:
find entities that are members of datasets that are subsets of a higher-level dataset:
PREFIX reg: <http://purl.org/linked-data/registry#>
select * where {
?s reg:register+ <http://linked.data.gov.au/dataset/geofabric> .
} limit 100
find entities that are members of classes that are sub-classes of a more general class:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select * where {
?d a ?c .
?c rdfs:subClassOf <http://linked.data.gov.au/def/geofabric#ReportingRegion> .
} limit 100
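The sub-class query above only follows a single rdfs:subClassOf step. If the class hierarchy turns out to be deeper than one level (an assumption, nothing here confirms it), a property path can be used in the same way as for reg:register. A sketch:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# rdfs:subClassOf* matches zero or more steps, so this also returns
# entities typed directly as ReportingRegion, not only as its sub-classes.
SELECT ?d ?c WHERE {
  ?d a ?c .
  ?c rdfs:subClassOf* <http://linked.data.gov.au/def/geofabric#ReportingRegion> .
}
LIMIT 100
```

Using + instead of * would exclude entities typed directly as ReportingRegion.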
OK. Totally agree this is the way to do it, thanks for confirming it @dr-shorthair. The big task before we can do this, of course, is getting all the datasets to conform to this approach... I will get into it.
I thought I would address this when looking into a refactor of the LDAPIs. Unfortunately the refactor didn't get far enough to replace all the LDAPIs, so it isn't the solution I was hoping for. Just flagging that this stuff is still an issue.
Mainly, it's an issue because of the inconsistent way registers are registered within each other. Each dataset has a different way of doing it; we need to make it consistent for a tool to be able to navigate it sensibly.
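One way to see the inconsistency is to list every register alongside whatever parent register it declares. This is only a diagnostic sketch and assumes nesting is expressed with reg:register, as in the Geofabric example earlier in the thread; datasets that nest registers some other way will simply show no parent.

```sparql
PREFIX reg: <http://purl.org/linked-data/registry#>

# Each register with its declared parent, if any. Rows with an unbound
# ?parent are either top-level registers or registers that are not linked
# to their parent in this way.
SELECT ?register ?parent WHERE {
  ?register a reg:Register .
  OPTIONAL { ?register reg:register ?parent . }
}
ORDER BY ?register
```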
Partly related: the problem Nick was trying to resolve was the absence of a standard property that is the inverse of rdfs:member. However, use of the predicate reg:register entails that the subject is a reg:RegisterItem and the object is a reg:Register, which is a little weird. 'Register' comes from the notion of 'registration' - i.e. submitting an item to be added to a list, and if it meets the acceptance criteria, getting issued a register-ID for it as evidence of having met the criteria. The definition of RegisterItem is "A metadata record for an entry in a register." which is not what we are managing here (see http://purl.org/linked-data/registry#). I'm all for using an existing class/predicate in preference to just making up a new one, but it looks like there is an overshoot here.
At the end of the day what I think we need is
(i) a class for datasets - this is loci:Dataset - I don't think we need reg:Register anywhere
(ii) a membership predicate - this could be rdfs:member - reg:register is just wrong
(iii) rules or constraints to say that
ereg:superregister
etc. I think we can do all the knitting we need with simple SPARQL. Just discussed this f2f with Jonno, and will attempt to rationalize all this in https://github.com/CSIRO-enviro-informatics/loci.cat/wiki/Rules-for-Loc-I-datasets
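As an illustration of the kind of "simple SPARQL" knitting this could involve, the sketch below derives rdfs:member triples from the existing reg:register triples. It is not taken from the rules page linked above; it only assumes the current data links items to containers with reg:register.

```sparql
PREFIX reg: <http://purl.org/linked-data/registry#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# For every "item reg:register container" triple, emit the reversed
# "container rdfs:member item" triple.
CONSTRUCT { ?container rdfs:member ?item . }
WHERE     { ?item reg:register ?container . }
```

Because nested datasets also use reg:register, the same query would produce membership triples between datasets as well as between datasets and entities.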
Note:
Strictly, class or set membership in RDF is handled by rdf:type, but that would require that the containers be defined as classes, e.g.
ASGS-2016 rdfs:subClassOf loci:Dataset .
rather than as individuals, like
ASGS-2016 rdf:type loci:Dataset .
which is the way we have done it and is fairly conventional. Meta-modelling often ends up with axle-wrapping ...
@dr-shorthair on "I don't think we need reg:Register anywhere"
The issue is that this is baked into the pyldapi library (see https://github.com/RDFLib/pyLDAPI/blob/master/pyldapi/register_renderer.py#L224)
For this to change, we'd need to change that bit of code which renders items as reg:Register.
Hmm. Well that indicates a modelling error in pyldapi IMHO. The Register ontology is clear - register items are metadata records, not data items.
Is there an alternative to the Register ontology that can be proposed? It would need to be generic (i.e. not just for Dataset items).
Yeah - that is the issue.
As discussed the other day, I think the membership predicate is easy - rdfs:member - though it would require the query pattern to be reversed.
For the container I think the options are dcat:Dataset, void:Dataset, or loci:Dataset.
loci:Dataset is project specific
void:Dataset is strictly 'A set of RDF triples that are published, maintained or aggregated by a single provider', and 'triples' are not really the same as 'Features'
dcat:Dataset leans the other direction - it may be a collection of discrete items, but often is not
But I think I'd be inclined to go with dcat:Dataset and rdfs:member unless and until we come up with anything better.
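To make the "reversed query pattern" concrete, here is a sketch of the earlier reg:register+ query rewritten for dcat:Dataset and rdfs:member. It assumes the data has been remodelled so that containers are typed dcat:Dataset and point at their members with rdfs:member; nothing in the current stores is known to look like this.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

# The dataset is now the subject and its members are the objects,
# so the path runs from the container down to the entities.
SELECT ?s WHERE {
  <http://linked.data.gov.au/dataset/geofabric> a dcat:Dataset ;
      rdfs:member+ ?s .
}
LIMIT 100
```

If keeping the entity as the subject is preferred, the inverse path form `?s ^rdfs:member+ <http://linked.data.gov.au/dataset/geofabric>` matches the same triples.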
For Loc-I, probably dcat:Dataset would be fine. However, the pyldapi library's scope is more general than that, I believe. It might be good to push some requirements to that library from Loc-I.
Currently the queries are not constrained to the specified dataset. This can result in cross-dataset issues: for example, GNAF and GNAF16 use the same URIs for base types, so if both datasets are in the cache, results could get confused.
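A sketch of what a dataset-constrained query could look like, using the GNAF address register from the listing earlier in the thread. It assumes GNAF items carry a reg:register link to that register and that GNAF16 items point at a different register IRI; neither is confirmed here.

```sparql
PREFIX reg: <http://purl.org/linked-data/registry#>

# Only items registered in the GNAF address register are returned,
# even if another cached dataset reuses the same class URIs.
SELECT ?s ?type WHERE {
  ?s reg:register <http://linked.data.gov.au/dataset/gnaf/address/> ;
     a ?type .
}
LIMIT 100
```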