Add equivalent of reg:containedItemClass to Collections

nicholascar commented 4 years ago

@dr-shorthair, @ashleysommer, @rob-metalinkage, @jyucsiro: note this changed semantics for pyLDAPI use in LocI

Due to Issue #17, I’ve recoded pyLDAPI to deliver Containers (rdf:Bag), not reg:Registers. See the GSQ FoI API Container of Orogens: https://gsq.cat/foi/orogen/.

This is essentially the same except for one thing: where the Register Ontology has a property we used to express the type of object in the register, reg:containedItemClass, rdf:Container (rdf:Bag) doesn’t.

I’d like to be able to indicate, at the container level, what the class(es) of the container's members are, so that clients can find out whether a Container contains things of interest to them without having to sample the classes of individuals within it.

How best to model this? Should we be considering the Bag something with dimensions of member types and thus use Data Cube to express the equivalent of reg:containedItemClass, perhaps like this for the Container of Orogens:

@prefix geof: <http://linked.data.gov.au/def/sweetgeofeatures#> .

:Container_X
    a rdf:Bag ;
    qb:structure [
        a qb:DataStructureDefinition ;
        qb:componentProperty [
            a qb:DimensionProperty ;
            rdfs:label "Member Classes" ;
            qb:concept geof:Orogen ; # class Orogen, could be multiple
        ] ;
    ] ;
.

It's a bit of a fudge with the Orogen class being used where QB expects a skos:Concept.

@dr-shorthair: is this useful for DCAT3? Should dcat:Datasets that contain feature members describe the types of members in this way?

ashleysommer commented 4 years ago

@nicholascar this is a pretty important question. We use reg:containedItemClass in LOC-I in a few different ways, so definitely need an equivalent.

Just thinking aloud here, I think it would be useful to have an itemCount property on the Bag too. The total number of items in a register is known to pyLDAPI in order to do the paging calculations, so exposing that number in the Bag definition seems an easy additional step.
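As a rough illustration of why the count is already to hand: a paged listing needs the total to work out how many pages exist, so the same number could be written into the Bag description. The sketch below is not pyLDAPI's actual code. The function names, the `per_page` value of 100 and the `ex:itemCount` property are all assumptions made for illustration; the total of 358009 is the Meshblock figure used elsewhere in this thread.

```python
def page_count(total_items: int, per_page: int) -> int:
    """Number of pages needed to list total_items, per_page at a time."""
    if total_items < 0:
        raise ValueError("total_items must not be negative")
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # Integer ceiling division, avoids float rounding on large counts.
    return (total_items + per_page - 1) // per_page


def item_count_triple(container: str, total_items: int) -> str:
    """Render the count as one Turtle triple.

    ex:itemCount is a hypothetical property name, not an existing vocabulary term.
    """
    return f'{container} ex:itemCount "{total_items}"^^xsd:nonNegativeInteger .'


if __name__ == "__main__":
    total = 358009  # known to the API because paging needs it
    print(page_count(total, per_page=100))
    print(item_count_triple("my:MeshblocksBag", total))
```

The point is only that no extra query is needed: whatever value feeds the paging calculation can be serialised alongside the container.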

nicholascar commented 4 years ago

@ashleysommer I think that an item count could be another QB dimension, but I'd like to work out how such a property could be a QUDT thing, sort of like the various area properties are QUDT things for geo:Feature instances.

rob-metalinkage commented 4 years ago

Instead of a blank node, you need to define which property of the container is the membership predicate.

so if you used rdfs:member then it would look like:

rdfs:member a qb:CodedDimensionProperty ; rdfs:range geof:Orogen;

and you could bind it to a particular set :

qb:codelist <some concept scheme - i guess it could be self-referential if the subject was the collection which was also a Concept Scheme>

Note that this is "graph closure sensitive" - two different collections might be saying different things about the property rdfs:member.

I would like to see an extension to QB that allows the dimension description to be decoupled from the element name, and to reference the element name instead:

eg:mydim qb:property rdfs:member ;

and support object notations something like: qb:xpath "rdfs:member" qb:jsonpath "member.@id"

etc.

rob-metalinkage commented 4 years ago

@nicholascar an item count wouldn't be a dimension ... it's just a property of the object. QB dimensions, measures and attributes relate to the instances described by the data structure. It would be a measure property for a set of collections, however.

dr-shorthair commented 4 years ago

This information is effectively schema-level. So it could be expressed using OWL axioms 'at run time':

my:Bag987 a [   
    rdfs:subClassOf rdf:Bag ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:allValuesFrom my:MClass_1 ;
        owl:onProperty rdfs:member ;
    ] ;
] ;
.

Else the equivalent in SHACL. (one of the 'features' of RDFS/OWL is that schema-level information can be expressed within the data)

This is stronger than a mere annotation to indicate the type of the contained item. But it requires four triples instead of one (one involving a blank node). But this is marginally fewer than the QB version, and doesn't require introducing another RDF namespace, and is not a fudge.

ashleysommer commented 4 years ago

@dr-shorthair Would you suggest a similar thing for indicating the item count in the bag?

my:MeshblocksBag a [   
    rdfs:subClassOf rdf:Bag ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:allValuesFrom asgs:Meshblock ;
        owl:onProperty rdfs:member ;
    ] ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:cardinality "358009"^^xsd:nonNegativeInteger ;
        owl:onProperty rdfs:member ;
    ] ;
] ;
.

ashleysommer commented 4 years ago

Looks like this can also be done with rdfs:range, but with more triples:

my:meshblockMember a rdf:Property ;
    rdfs:subPropertyOf rdfs:member ;
    rdfs:range asgs:Meshblock .

my:MeshblocksBag a [   
    rdfs:subClassOf rdf:Bag ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:cardinality "358009"^^xsd:nonNegativeInteger ;
        owl:onProperty my:meshblockMember ;
    ] ;
] ;
.

dr-shorthair commented 4 years ago

Indeed. I'm trying not to add any new names (URIs) explicitly. This comes at a cost of having a few blank-nodes, which some people dislike. But it means that the SPARQL is consistent - you only need to use rdfs:member.

I'm trying to think whether this is all too cute by half. It removes the need for additional namespaces and doesn't actually blow out the triple count. However, the model and syntax are a bit circuitous.
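To make the "you only need to use rdfs:member" point concrete, here is a minimal sketch of the lookup a client would do against the OWL restriction pattern. It assumes plain Python tuples standing in for a triple store, prefixed names kept as strings, and made-up blank node labels (`_:c`, `_:r`); the function name is hypothetical. A real client would more likely run the equivalent graph pattern as a SPARQL query.

```python
def member_classes(triples, container):
    """Return the classes named by owl:allValuesFrom restrictions on rdfs:member
    attached to the (anonymous) class that `container` is an instance of."""
    triples = set(triples)

    def objects(subject, predicate):
        # All o such that (subject, predicate, o) is in the graph.
        return {o for (s, p, o) in triples if s == subject and p == predicate}

    found = set()
    for cls in objects(container, "rdf:type"):
        for restriction in objects(cls, "rdfs:subClassOf"):
            # Only restrictions on the standard membership property count.
            if "rdfs:member" in objects(restriction, "owl:onProperty"):
                found |= objects(restriction, "owl:allValuesFrom")
    return found


# The my:Bag987 example from this thread, written out as triples.
GRAPH = [
    ("my:Bag987", "rdf:type", "_:c"),
    ("_:c", "rdfs:subClassOf", "rdf:Bag"),
    ("_:c", "rdfs:subClassOf", "_:r"),
    ("_:r", "rdf:type", "owl:Restriction"),
    ("_:r", "owl:onProperty", "rdfs:member"),
    ("_:r", "owl:allValuesFrom", "my:MClass_1"),
]

if __name__ == "__main__":
    print(member_classes(GRAPH, "my:Bag987"))
```

The traversal is the same for every container because the membership property never changes; with a per-container sub-property the client would first have to discover which property to look for.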

rob-metalinkage commented 4 years ago

Without making a strong statement about a preferred solution, just noting that QB has a couple of extra things: 1) whether it's a dimension, measure or attribute in nature; 2) naming the concept scheme, i.e. a binding to the source of the allowed values.

For a self-contained graph with a collection and members, finding the source isn't an issue - it requires implicit (magic) knowledge that the members are local - but it's not a mechanism useful for describing other aspects.

Note you don't need to use QB with a full data structure definition - just declaring the component - so it's fewer triples and least complicated in structure - modulo making statements about properties that are more general.

Using pure RDFS might be best - it seems that for every semantic asset there needs to be some contract around the rules for graph closure and the reasoning that is required. With things like large vocabularies (taxons, geographic areas, people, devices) that cannot reasonably be handled by graph closure, there is an additional contract required about identifier dereferencing. So IMHO I'd lay out exactly what these expectations are, then choose the encoding that best supports that.

nicholascar commented 4 years ago

@rob-metalinkage: "it requires implicit (magic) knowledge that the members are local"

For a deployment of pyLDAPI, it's true that the equivalent of :Register_X reg:containedItemClass ex:Some_Class_Y . is using local knowledge. So to solve my original question, we don't need to solve the container/member class coupling in general, only specifically for local knowledge (closed) scenarios.

So perhaps we just do two things for now:

Provide a Python class constructor argument slot for Container classes
- and use these to create:

    rdfs:subClassOf [
        a owl:Restriction ;
        owl:allValuesFrom asgs:Meshblock ;
        owl:onProperty rdfs:member ;
    ] ;

For any pyLDAPI Container.

Express member total count in Container RDF
- as Ashley suggests
- total count is indeed actually known by pyLDAPI (to support member list paging)
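The two steps above could look roughly like the following. This is a sketch only: the `Container` class, its `member_class` and `total_count` arguments and the `to_turtle` method are hypothetical names, not the existing pyLDAPI API, and the output leaves out prefix declarations. It takes a single member class; several `owl:allValuesFrom` restrictions side by side would require every member to belong to all of the classes, so expressing alternatives would need something like `owl:unionOf`, which is not shown.

```python
class Container:
    """Hypothetical stand-in for a pyLDAPI Container (not the real class)."""

    def __init__(self, iri, member_class=None, total_count=None):
        if total_count is not None and total_count < 0:
            raise ValueError("total_count must not be negative")
        self.iri = iri                    # prefixed name of the container
        self.member_class = member_class  # e.g. "asgs:Meshblock"
        self.total_count = total_count    # already known for paging

    def to_turtle(self) -> str:
        """Render the container's type description as a Turtle fragment."""
        lines = [f"{self.iri} a [", "    rdfs:subClassOf rdf:Bag ;"]
        if self.member_class is not None:
            lines += [
                "    rdfs:subClassOf [",
                "        a owl:Restriction ;",
                f"        owl:allValuesFrom {self.member_class} ;",
                "        owl:onProperty rdfs:member ;",
                "    ] ;",
            ]
        if self.total_count is not None:
            lines += [
                "    rdfs:subClassOf [",
                "        a owl:Restriction ;",
                f'        owl:cardinality "{self.total_count}"^^xsd:nonNegativeInteger ;',
                "        owl:onProperty rdfs:member ;",
                "    ] ;",
            ]
        lines += ["] ;", "."]
        return "\n".join(lines)


if __name__ == "__main__":
    bag = Container("my:MeshblocksBag", "asgs:Meshblock", 358009)
    print(bag.to_turtle())
```

Both arguments are optional, so existing Containers that declare neither would still render as a plain rdf:Bag subclass.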

For Container dimensions in general: we might need a motivating scenario to solve this in general with something like QB. (Rob & I have ideas about this for DCAT profiling.)

RDFLib / pyLDAPI

Add equivalent of reg:containedItemClass to Collections #23