Need to constrain iderdown queries to appropriate dataset

shaneseaton commented 5 years ago

Currently the queries are not constrained to the specified dataset. This can result in cross-dataset issues: for example, GNAF and GNAF16 use the same URIs for base types, so if both datasets are in the cache, results from the two could get mixed up.

dr-shorthair commented 5 years ago

Each "dataset" is considered to be a reg:Register, so that each "entity" (address, SLA, catchment etc) is associated with a dataset using the reg:register predicate. e.g.

<http://linked.data.gov.au/dataset/geofabric/contractedcatchment/12105364> rdf:type geofabric:ContractedCatchment ;
    reg:register <http://linked.data.gov.au/dataset/geofabric/contractedcatchment/> .

So each entity is a member

of a class such as <http://linked.data.gov.au/def/geofabric#ContractedCatchment> and
of a dataset such as http://linked.data.gov.au/dataset/geofabric/contractedcatchment/ .

These are different things.

Furthermore, lower level datasets are also a member of a higher level register, e.g.

<http://linked.data.gov.au/dataset/geofabric/contractedcatchment/> rdf:type reg:Register ;
    reg:register <http://linked.data.gov.au/dataset/geofabric> .

so if you want to constrain the query to the higher level datasets, then the query will have to include a property-path reg:register+
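To illustrate what the reg:register+ property path matches, here is a minimal sketch that walks the two reg:register links from the Turtle snippets above in plain Python. The in-memory triple set and the helper name registers_of are assumptions made for illustration; a real triple store evaluates the path itself.

```python
# Triples copied from the Turtle snippets above, as (subject, predicate, object).
REG_REGISTER = "http://purl.org/linked-data/registry#register"
GEOFABRIC = "http://linked.data.gov.au/dataset/geofabric"
CATCHMENTS = GEOFABRIC + "/contractedcatchment/"
CATCHMENT_12105364 = CATCHMENTS + "12105364"

TRIPLES = {
    (CATCHMENT_12105364, REG_REGISTER, CATCHMENTS),
    (CATCHMENTS, REG_REGISTER, GEOFABRIC),
}


def registers_of(subject, triples):
    """Hypothetical helper: return every register reachable from `subject`
    via one or more reg:register links, i.e. the bindings of ?r in
    `<subject> reg:register+ ?r`."""
    found = set()
    frontier = [subject]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == REG_REGISTER and o not in found:
                found.add(o)
                frontier.append(o)
    return found


if __name__ == "__main__":
    for register in sorted(registers_of(CATCHMENT_12105364, TRIPLES)):
        print(register)
```

The catchment is found in both the contractedcatchment register and the higher-level geofabric register, which is why a constraint on the higher-level dataset needs the + rather than a single reg:register hop.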

dr-shorthair commented 5 years ago

Looking at what datasets are present:

PREFIX reg: <http://purl.org/linked-data/registry#>
select * where { 
   ?d a reg:Register .
}

which results in

1 | http://linked.data.gov.au/dataset/geofabric/contractedcatchment/
2 | http://linked.data.gov.au/dataset/geofabric
3 | http://linked.data.gov.au/dataset/geofabric/drainagedivision/
4 | http://linked.data.gov.au/dataset/geofabric/riverregion/
5 | http://linked.data.gov.au/dataset/asgs2016/meshblock/
6 | http://linked.data.gov.au/dataset/asgs2016/stateorterritory/
7 | http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/
8 | http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel2/
9 | http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel3/
10 | http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel4/
11 | http://linked.data.gov.au/dataset/gnaf/address/
12 | http://linked.data.gov.au/dataset/gnaf/reg/
13 | http://linked.data.gov.au/dataset/gnaf/addressSite/
14 | http://linked.data.gov.au/dataset/gnaf/locality/
15 | http://linked.data.gov.au/dataset/gnaf/streetLocality/
16 | http://linked.data.gov.au/dataset/asgs2016/australia/
17 | http://linked.data.gov.au/dataset/asgs2016/reg/

i.e. the GNAF datasets are not dated.

dr-shorthair commented 5 years ago

Other examples:

find entities that are members of datasets that are subsets of a higher-level dataset:

PREFIX reg: <http://purl.org/linked-data/registry#>
select * where { 
?s reg:register+ <http://linked.data.gov.au/dataset/geofabric> .
} limit 100

find entities that are members of classes that are sub-classes of a more general class:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select * where { 
?d a ?c .
?c rdfs:subClassOf <http://linked.data.gov.au/def/geofabric#ReportingRegion> . 
} limit 100
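For the original problem (the same class URIs appearing in more than one dataset), the class pattern and the dataset pattern could be combined in one query. The sketch below builds such a query as a string; the function name build_constrained_query and the idea of generating the query in Python are assumptions for illustration, not how the excelerator currently builds its queries.

```python
def build_constrained_query(dataset_uri, class_uri, limit=100):
    """Hypothetical helper: build a SPARQL SELECT that keeps only entities of
    `class_uri` that are registered, directly or transitively, in `dataset_uri`."""
    for uri in (dataset_uri, class_uri):
        # Reject characters that cannot appear inside a SPARQL IRI reference,
        # so a malformed URI cannot break out of the <...> brackets.
        if any(ch in uri for ch in '<>"{}|^`\\') or any(ch.isspace() for ch in uri):
            raise ValueError(f"not a safe IRI: {uri!r}")
    return (
        "PREFIX reg: <http://purl.org/linked-data/registry#>\n"
        "SELECT ?s WHERE {\n"
        f"  ?s a <{class_uri}> ;\n"
        f"     reg:register+ <{dataset_uri}> .\n"
        f"}} LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_constrained_query(
        "http://linked.data.gov.au/dataset/geofabric",
        "http://linked.data.gov.au/def/geofabric#ContractedCatchment",
    ))
```

This only works once every dataset links its entities and sub-registers with reg:register in the same way.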

shaneseaton commented 5 years ago

OK. Totally agree this is the way to do it, thanks for confirming it @dr-shorthair. The big task before we can do this, of course, is getting all the datasets to conform to this approach... I will get into it.

shaneseaton commented 5 years ago

I thought I would address this when looking into a refactor of the LDAPIs. Unfortunately the refactor didn't get far enough to replace all the LDAPIs, so it isn't the solution I was hoping for. Just flagging that this is still an issue.

Mainly, it's an issue because of the inconsistent way registers are registered within each other. Each dataset has a different way of doing it; we need to make it consistent for a tool to be able to navigate it sensibly.

dr-shorthair commented 5 years ago

Partly related: The problem Nick was trying to resolve was the absence of a standard property that is the inverse of rdfs:member. However, use of the predicate reg:register entails that the subject is a reg:RegisterItem and the object is a reg:Register, which is a little weird. 'Register' comes from the notion of 'registration' - i.e. submitting an item to be added to a list, and, if it meets the acceptance criteria, getting issued a register-ID for it as evidence of having met the criteria. The definition of RegisterItem is "A metadata record for an entry in a register.", which is not what we are managing here (see http://purl.org/linked-data/registry#). I'm all for using an existing class/predicate in preference to just making up a new one, but it looks like there is an overshoot here.

At the end of the day what I think we need is (i) a class for datasets - this is loci:Dataset - I don't think we need reg:Register anywhere (ii) a membership predicate - this could be rdfs:member - reg:register is just wrong (iii) rules or constraints to say that

a Dataset can have either another Dataset or a Feature as members
Datasets and Features can be members of more than one Dataset

We don't need ereg:superregister etc. I think we can do all the knitting we need with simple SPARQL.

Just discussed this f2f with Jonno, and will attempt to rationalize all this in https://github.com/CSIRO-enviro-informatics/loci.cat/wiki/Rules-for-Loc-I-datasets

Note: Strictly class or set-membership in RDF is handled by rdf:type, but that would require that the containers be defined as classes, e.g.

ASGS-2016 rdfs:subClassOf loci:Dataset .

rather than as individuals, like

ASGS-2016 rdf:type loci:Dataset .

which is the way we have done it and is fairly conventional. Meta-modelling often ends up with axle-wrapping ...

jyucsiro commented 4 years ago

@dr-shorthair on "I don't think we need reg:Register anywhere"

The issue is that this is baked into the pyldapi library (see https://github.com/RDFLib/pyLDAPI/blob/master/pyldapi/register_renderer.py#L224)

For this to change, we'd need to change that bit of code which renders items as reg:Register.

dr-shorthair commented 4 years ago

Hmm. Well that indicates a modelling error in pyldapi IMHO. The Register ontology is clear - register items are metadata records, not data items.

jyucsiro commented 4 years ago

Is there an alternative to the Register ontology that can be proposed? It would need to be generic (like not just Dataset items)

dr-shorthair commented 4 years ago

Yeah - that is the issue.

As discussed the other day, I think the membership predicate is easy - rdfs:member - though it would require the query pattern to be reversed.

For the container I think the options are dcat:Dataset, void:Dataset, or loci:Dataset.

loci:Dataset is project specific
void:Dataset is strictly 'A set of RDF triples that are published, maintained or aggregated by a single provider', and 'triples' are not really the same as 'Features'
dcat:Dataset leans the other direction - it may be a collection of discrete items, but often is not

But I think I'd be inclined to go with dcat:Dataset and rdfs:member unless and until we come up with anything better.
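As a rough sketch of the reversed query pattern mentioned above: with rdfs:member the dataset becomes the subject and the entity the object. The function name build_member_query is hypothetical, and the sketch assumes membership triples are asserted from each dataset to its members, with nested datasets being members of their parent dataset.

```python
def build_member_query(dataset_uri, limit=100):
    """Hypothetical helper: build a SPARQL SELECT listing everything that is a
    member, directly or through nested datasets, of `dataset_uri`."""
    # Reject characters that cannot appear inside a SPARQL IRI reference.
    if any(ch in dataset_uri for ch in '<>"{}|^`\\') or any(
        ch.isspace() for ch in dataset_uri
    ):
        raise ValueError(f"not a safe IRI: {dataset_uri!r}")
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?s WHERE {\n"
        # Dataset is now the subject; compare `?s reg:register+ <dataset>`.
        f"  <{dataset_uri}> rdfs:member+ ?s .\n"
        f"}} LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_member_query("http://linked.data.gov.au/dataset/geofabric"))
```

Note that the path also returns the nested datasets themselves, so a class constraint on ?s would still be needed to get only features.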

jyucsiro commented 4 years ago

For Loc-I, dcat:Dataset would probably be fine. However, I believe the pyldapi library's scope is more general than that. It might be good to push some requirements to that library from Loc-I.

CSIRO-enviro-informatics / loci-excelerator

Need to constrain iderdown queries to appropriate dataset #21