Consider the impact of the use of SKOS in PODs' data on the indexes

FabienGandon commented 10 months ago

Some PODs may include data that uses SKOS schemata. Such data would not be covered by the current type indexes, which focus on rdfs:Class/owl:Class and not on skos:Concept. We could consider and discuss that case if needed (e.g. in cultural data scenarios where thesauri are used).

lecoqlibre commented 10 months ago

We are using SKOS vocabularies in the DFC project to reference:

* product types (like vegetables, meat, fish...)
* measures and units (6 pack, box, kg...)
* product facets like labels (e.g. Organic EU), product origin, etc.

We will need to index these, for instance to find PODs that contain tomatoes.

lecoqlibre commented 10 months ago

Based on the TypeIndex logic, could we not just add a solid:forSkosConcept property? A solid:TypeRegistration referring to a SKOS concept could then be written like this:

@base <https://example.pod/username/datafoodconsortium>.
@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix dfc-pt: <https://www.datafoodconsortium.org/productTypes#>.

<#ab09fd> a solid:TypeRegistration;
    solid:forSkosConcept dfc-pt:artichoke;
    solid:instance </catalogs/default/catalog-items/artichoke.ttl>.

(Replace solid:instance with solid:instanceContainer to target an entire container of resources.)

@FabienGandon @pchampin

pchampin commented 10 months ago

Your example above does not work. Per the semantics of solid:TypeRegistration, it would mean that the following triple is true:

  </catalogs/default/catalog-items/artichoke.ttl> rdf:type dfc-pt:artichoke.
  #                                               ^^^^^^^^

which is not what you want to convey.

We need a different kind of registration here, one which means that the given SKOS concept is either:

* mentioned anywhere in the graph of the indexed resource(s)
* mentioned in a triple involving the indexed resource(s)
* mentioned in a triple having a specified predicate (using, for example, the same `solid:forProperty` as in #13)
* ...

By the way, does this kind of registration need to be specific to SKOS? We could replace "the given SKOS concept" above with "the given resource"...
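To make the three readings above concrete, here is a minimal Python sketch, assuming a graph is just a list of (subject, predicate, object) string tuples. The function names and the `ex:hasType` / `ex:relatedTo` predicates are hypothetical and only serve the illustration; nothing here is part of any Solid specification.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), IRIs as plain strings


def mentioned_anywhere(graph: Iterable[Triple], target: str) -> bool:
    """Reading 1: the target occurs in any position of any triple."""
    return any(target in triple for triple in graph)


def mentioned_with_resource(graph: Iterable[Triple], resource: str, target: str) -> bool:
    """Reading 2: the target occurs in a triple that also involves the resource."""
    return any(target in triple and resource in triple for triple in graph)


def mentioned_via_predicate(graph: Iterable[Triple], predicate: str, target: str) -> bool:
    """Reading 3: the target is subject or object of a triple with the given predicate."""
    return any(p == predicate and target in (s, o) for s, p, o in graph)


# Hypothetical data: ex:hasType and ex:relatedTo are made-up predicates.
ITEM = "https://example.pod/catalogs/default/catalog-items/artichoke.ttl"
ARTICHOKE = "https://www.datafoodconsortium.org/productTypes#artichoke"

loose_graph = [("ex:note", "ex:relatedTo", ARTICHOKE)]   # concept appears, item does not
direct_graph = [(ITEM, "ex:hasType", ARTICHOKE)]         # item points at the concept
```

The readings get progressively stricter: `loose_graph` satisfies only the first one, while `direct_graph` satisfies all three. Since the target is just an IRI string, the same checks work for any resource, not only SKOS concepts.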

FabienGandon commented 10 months ago

I was about to make the same remark as Pierre-Antoine. I was thinking of a more general way of indicating that a resource is mentioned, something along the lines of:

</catalogs/default/catalog-items/artichoke.ttl> index:mentions dfc-pt:artichoke .

lecoqlibre commented 10 months ago

I agree with you @pchampin and @FabienGandon.

So what about introducing a new kind of index and a new kind of registration like in the following example?

As Fabien used it, I will use the index prefix so we will be able to distinguish additional elements.

File typeIndex.ttl:

@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix dfc-b: <https://www.datafoodconsortium.org#>.
@prefix index: <tbd>.

<>
    a solid:TypeIndex;
    a solid:ListedDocument.

<#ab09fd> a solid:TypeRegistration;
    solid:forClass index:Index;
    solid:instance <index.ttl>.

File index.ttl:

@base <https://example.pod/username/datafoodconsortium>.
@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix dfc-pt: <https://www.datafoodconsortium.org/productTypes#>.
@prefix index: <tbd>.

<>
    a index:Index;
    a solid:ListedDocument.

<#ab09fd> a index:MentionRegistration;
    solid:instance </catalogs/default/catalog-items/artichoke.ttl>;
    index:mentions dfc-pt:artichoke.
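The two files above imply a two-step lookup: first find the index document through the type index, then find the instances through the mention registrations. A minimal Python sketch of that lookup, assuming the Turtle has already been parsed into plain dictionaries (the dictionary layout and the `find_instances` name are assumptions made for illustration):

```python
# Parsed view of typeIndex.ttl: one registration pointing at the index document.
type_index = [
    {"type": "solid:TypeRegistration", "forClass": "index:Index", "instance": "index.ttl"},
]

# Parsed view of the documents reachable from the type index.
documents = {
    "index.ttl": [
        {
            "type": "index:MentionRegistration",
            "instance": "/catalogs/default/catalog-items/artichoke.ttl",
            "mentions": "dfc-pt:artichoke",
        },
    ],
}


def find_instances(type_index, documents, concept):
    """Follow typeIndex -> index document -> mention registrations for one concept."""
    results = []
    for registration in type_index:
        if registration["forClass"] != "index:Index":
            continue  # not a pointer to a mention index
        for mention in documents.get(registration["instance"], []):
            if mention["mentions"] == concept:
                results.append(mention["instance"])
    return results
```

With this data, looking up `dfc-pt:artichoke` yields the artichoke catalog item, and looking up a concept that is not registered yields an empty list.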

lecoqlibre commented 9 months ago

Consider the following index:Registration, which only mentions an heirloom tomato:

@base <http://localhost:8000/lecoqlibre/datafoodconsortium/>.
@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix dfc-pt: <https://www.datafoodconsortium.org/product-types#>.
@prefix index: <TBD>.
@prefix catalog-items: <catalogs/default/catalog#>.
@prefix : <catalogs/default/index0#>.

<catalogs/default/index0>
    a index:Index;
    a solid:ListedDocument.

:heirloom-tomato a index:Registration;
    index:mentions dfc-pt:heirloom-tomato;
    solid:instance 
        catalog-items:tomato-heirloom.

As an heirloom tomato is a tomato, which is a vegetable, the previous registration could be enriched to produce one of the following forms:

Form 1: leaf registrations (for SKOS concepts with no children) mention the whole category (parent) hierarchy.

@base <http://localhost:8000/lecoqlibre/datafoodconsortium/>.
@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix dfc-pt: <https://www.datafoodconsortium.org/product-types#>.
@prefix index: <TBD>.
@prefix catalog-items: <catalogs/default/catalog#>.
@prefix : <catalogs/default/index0#>.

<catalogs/default/index0>
    a index:Index;
    a solid:ListedDocument.

:heirloom-tomato a index:Registration;
    index:mentions dfc-pt:vegetable, dfc-pt:tomato, dfc-pt:heirloom-tomato;
    solid:instance 
        catalog-items:tomato-heirloom.

Form 2: registrations are restricted to only one index:mentions.

@base <http://localhost:8000/lecoqlibre/datafoodconsortium/>.
@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix dfc-pt: <https://www.datafoodconsortium.org/product-types#>.
@prefix index: <TBD>.
@prefix catalog-items: <catalogs/default/catalog#>.
@prefix : <catalogs/default/index0#>.

<catalogs/default/index0>
    a index:Index;
    a solid:ListedDocument.

:vegetables a index:Registration;
    index:mentions dfc-pt:vegetable;
    solid:instance 
        catalog-items:tomato-heirloom.

:tomatoes a index:Registration;
    index:mentions dfc-pt:tomato;
    solid:instance 
        catalog-items:tomato-heirloom.

:heirloom-tomato a index:Registration;
    index:mentions dfc-pt:heirloom-tomato;
    solid:instance 
        catalog-items:tomato-heirloom.

Form 3: registrations contain the whole category (parent) hierarchy and are not restricted to only one index:mentions.

@base <http://localhost:8000/lecoqlibre/datafoodconsortium/>.
@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix dfc-pt: <https://www.datafoodconsortium.org/product-types#>.
@prefix index: <TBD>.
@prefix catalog-items: <catalogs/default/catalog#>.
@prefix : <catalogs/default/index0#>.

<catalogs/default/index0>
    a index:Index;
    a solid:ListedDocument.

:vegetables a index:Registration;
    index:mentions dfc-pt:vegetable;
    solid:instance 
        catalog-items:tomato-heirloom.

:tomatoes a index:Registration;
    index:mentions dfc-pt:vegetable, dfc-pt:tomato;
    solid:instance 
        catalog-items:tomato-heirloom.

:heirloom-tomato a index:Registration;
    index:mentions dfc-pt:vegetable, dfc-pt:tomato, dfc-pt:heirloom-tomato;
    solid:instance 
        catalog-items:tomato-heirloom.

I'm wondering whether one form should be preferred over another for building a SKOS index, for reasons such as querying or indexing.

| Action | Form 1 | Form 3 |
| --- | --- | --- |
| When a category is queried, multiple registrations will be returned. `SELECT ?instance WHERE { ?url index:mentions dfc-pt:vegetable; solid:instance ?instance }`. | X | X |
| Leaf registrations (`:heirloom-tomato`) must be updated when they are added to or removed from a parent category. | X | X |
| Querying a category will return duplicate instances. `SELECT ?instance WHERE { ?url index:mentions dfc-pt:vegetable; solid:instance ?instance }`. | | X |
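The duplicate-instance difference between Form 1 and Form 3 can be reproduced with a small Python sketch, assuming each registration is a dictionary holding its `index:mentions` and `solid:instance` values. The helper names are hypothetical, and `select_instances` only imitates the `SELECT` above (one row per matching registration and instance, no `DISTINCT`).

```python
VEGETABLE = "dfc-pt:vegetable"
TOMATO = "dfc-pt:tomato"
HEIRLOOM = "dfc-pt:heirloom-tomato"
INSTANCE = "catalog-items:tomato-heirloom"

# Form 1: a single leaf registration carrying the whole hierarchy.
form1 = [
    {"id": ":heirloom-tomato", "mentions": [VEGETABLE, TOMATO, HEIRLOOM], "instances": [INSTANCE]},
]

# Form 3: one registration per level, each carrying its own ancestors.
form3 = [
    {"id": ":vegetables", "mentions": [VEGETABLE], "instances": [INSTANCE]},
    {"id": ":tomatoes", "mentions": [VEGETABLE, TOMATO], "instances": [INSTANCE]},
    {"id": ":heirloom-tomato", "mentions": [VEGETABLE, TOMATO, HEIRLOOM], "instances": [INSTANCE]},
]


def select_instances(registrations, concept):
    """One row per (registration, instance) pair whose registration mentions the concept."""
    return [
        instance
        for registration in registrations
        if concept in registration["mentions"]
        for instance in registration["instances"]
    ]


def select_distinct_instances(registrations, concept):
    """Same rows with duplicates removed, keeping first-seen order (like SELECT DISTINCT)."""
    return list(dict.fromkeys(select_instances(registrations, concept)))
```

Querying `dfc-pt:vegetable` gives one row for Form 1 and three identical rows for Form 3, so a Form 3 consumer would need `DISTINCT` or an equivalent deduplication step.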

FabienGandon commented 9 months ago

Hello,

Three reactions:

(1) I don't see the motivation for options 2 and 3. Option 1 is the natural one, as what users declare will be the most precise concepts they need/provide, and the rest will be inferred. Someone providing only Heirloom Tomato will say so, while someone providing an open variety of tomatoes may use only the Tomato concept. Then an "indexer agent" or a "vocabulary enricher agent" could enrich the declaration with inference.

(2) In your code you write:

@base <http://localhost:8000/lecoqlibre/datafoodconsortium/>.
@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix dfc-pt: <https://www.datafoodconsortium.org/product-types#>.
@prefix index: <TBD>.
@prefix catalog-items: <catalogs/default/catalog#>.
@prefix : <catalogs/default/index0>.

<catalogs/default/index0>
    a index:Index;
    a solid:ListedDocument.

:heirloom-tomato a index:Registration;
    index:mentions dfc-pt:heirloom-tomato;
    solid:instance 
        catalog-items:tomato-heirloom.

The three resources :heirloom-tomato, dfc-pt:heirloom-tomato and catalog-items:tomato-heirloom have ambiguous names to me, and they make your example hard to read if you later want to use it as tutorial material. My understanding of what you are writing is that :heirloom-tomato should be something like :MyTomatoRegistration. The difference between dfc-pt:heirloom-tomato and catalog-items:tomato-heirloom, and its motivation, escapes me here. Are they both SKOS concepts? If so, what is the difference between them?

(3) One could imagine separating asserted registrations from inferred ones, e.g. with index:implicitly-mentions links added by inferences, which would help applications process them properly and in particular know what is explicitly found in the file:

:MyTomatoRegistration a index:Registration;
    index:mentions dfc-pt:heirloom-tomato;
        index:implicitly-mentions dfc-pt:vegetable, dfc-pt:tomato .
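A minimal Python sketch of how an enricher agent might derive the implicit mentions, assuming the concept hierarchy is available as a map from each concept to its direct parents (in SKOS terms, its skos:broader concepts). The `BROADER` map, the dictionary layout and the function names are assumptions for illustration only.

```python
# Assumed hierarchy, following the thread: heirloom tomato -> tomato -> vegetable.
BROADER = {
    "dfc-pt:heirloom-tomato": ["dfc-pt:tomato"],
    "dfc-pt:tomato": ["dfc-pt:vegetable"],
}


def broader_closure(concept, broader):
    """All ancestors of a concept, following parent links transitively."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in broader.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def enrich(registration, broader):
    """Return a copy of the registration with inferred ancestors kept separate."""
    asserted = set(registration["mentions"])
    implicit = set()
    for concept in asserted:
        implicit |= broader_closure(concept, broader)
    # Never list a concept as implicit if it was explicitly asserted.
    return {**registration, "implicitly_mentions": sorted(implicit - asserted)}


registration = {"id": ":MyTomatoRegistration", "mentions": ["dfc-pt:heirloom-tomato"]}
enriched = enrich(registration, BROADER)
```

Here `enriched["mentions"]` is left untouched and `enriched["implicitly_mentions"]` holds `dfc-pt:tomato` and `dfc-pt:vegetable`, which mirrors the Turtle above.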

lecoqlibre commented 9 months ago

> (1) I don't see the motivation for options 2 and 3. Option 1 is the natural one, as what users declare will be the most precise concepts they need/provide, and the rest will be inferred. Someone providing only Heirloom Tomato will say so, while someone providing an open variety of tomatoes may use only the Tomato concept. Then an "indexer agent" or a "vocabulary enricher agent" could enrich the declaration with inference.

Yes, I was asking whether one representation should be preferred for saving the inferred index.

> The three resources :heirloom-tomato, dfc-pt:heirloom-tomato and catalog-items:tomato-heirloom have ambiguous names to me, and they make your example hard to read if you later want to use it as tutorial material. My understanding of what you are writing is that :heirloom-tomato should be something like :MyTomatoRegistration. The difference between dfc-pt:heirloom-tomato and catalog-items:tomato-heirloom, and its motivation, escapes me here. Are they both SKOS concepts? If so, what is the difference between them?

Yes, the registration could be renamed to :MyTomatoRegistration or anything else. My example contained an error: a # was missing in the prefix : <catalogs/default/catalog>. so I changed it to prefix : <catalogs/default/catalog#>.

* dfc-pt:heirloom-tomato is the value of the index:mentions property, so it is a skos:Concept.
* catalog-items:tomato-heirloom is the value of the solid:instance property, so it is not a skos:Concept but an instance that mentions a skos:Concept.

> (3) One could imagine separating asserted registrations from inferred ones, e.g. with index:implicitly-mentions links added by inferences, which would help applications process them properly and in particular know what is explicitly found in the file:

I agree that using index:mentions to express inferred values is not very clear, but do we really need to separate asserted registrations from inferred ones? Do applications need to process inferred values in a particular way?

Any thoughts @pchampin ?

FabienGandon commented 9 months ago

> > (3) One could imagine separating asserted registrations from inferred ones, e.g. with index:implicitly-mentions links added by inferences, which would help applications process them properly and in particular know what is explicitly found in the file:
>
> I agree that using index:mentions to express inferred values is not very clear, but do we really need to separate asserted registrations from inferred ones? Do applications need to process inferred values in a particular way?

My question here is about the fact that an application that finds index:mentions in an index can expect to find those mentions explicitly in the indexed source, whereas if they were inferred they will not be found in the source. It can be useful for an application to easily tell the difference between TomatoXYZ, which will actually be found in the source, and Vegetable, which will never be found in the source if you consult it but is implicitly there if you perform the inferences.

lecoqlibre commented 9 months ago

> My question here is about the fact that an application that finds index:mentions in an index can expect to find those mentions explicitly in the indexed source, whereas if they were inferred they will not be found in the source. It can be useful for an application to easily tell the difference between TomatoXYZ, which will actually be found in the source, and Vegetable, which will never be found in the source if you consult it but is implicitly there if you perform the inferences.

Yes, I got that @FabienGandon. I was asking for an example of a use case where an app wants to distinguish asserted from inferred values.

FabienGandon commented 9 months ago

> Yes, I got that @FabienGandon. I was asking for an example of a use case where an app wants to distinguish asserted from inferred values.

Take these cases:

* a search engine looking for Vegetables: if you select a source where Vegetable is inferred, you know you will have to perform inference on the source first, otherwise you will not find them directly.
* a browsing application: the link to Vegetable is implicit and you will display zero Vegetable if it is only inferred.
* for statistics on vocabulary usage: you can tell apart the concepts actually used and the concepts only inferred.
* more generally, it bothers me to say that XYZ is mentioned when in fact it is not.
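As a rough illustration of the search-engine and browsing cases, here is a Python sketch of how a client might use the asserted/inferred split, assuming registrations are dictionaries with separate `mentions` and `implicitly_mentions` lists. Skipping inferred-only sources when the client cannot reason is just one possible policy, and all names below are hypothetical.

```python
def match_kind(registration, concept):
    """Classify how a registration relates to a concept."""
    if concept in registration.get("mentions", ()):
        return "asserted"   # the concept can be found as-is in the source
    if concept in registration.get("implicitly_mentions", ()):
        return "inferred"   # the concept only appears after inference
    return "absent"


def sources_for(registrations, concept, can_infer):
    """Instances worth fetching: inferred-only matches need a reasoning client."""
    selected = []
    for registration in registrations:
        kind = match_kind(registration, concept)
        if kind == "asserted" or (kind == "inferred" and can_infer):
            selected.extend(registration["instances"])
    return selected


my_tomato_registration = {
    "id": ":MyTomatoRegistration",
    "mentions": ["dfc-pt:heirloom-tomato"],
    "implicitly_mentions": ["dfc-pt:vegetable", "dfc-pt:tomato"],
    "instances": ["catalog-items:tomato-heirloom"],
}
```

A client without a reasoner that looks for `dfc-pt:vegetable` gets no source from this registration, whereas a reasoning client gets `catalog-items:tomato-heirloom`. Both get it when looking for `dfc-pt:heirloom-tomato`.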

lecoqlibre commented 9 months ago

> option 1 is the natural one, as what users declare will be the most precise concepts they need/provide, and the rest will be inferred

It is not obvious to me, as an app can do the inference in place of the user. Indeed, the user creates a new product using the app and selects the "heirloom tomato" product type. When the app saves the resource on the POD, it can add inferred product types like "vegetable" and "tomato" directly in the resource and/or in some indexes.

> a search engine looking for Vegetables: if you select a source where Vegetable is inferred, you know you will have to perform inference on the source first, otherwise you will not find them directly.

Won't the search engine do inference anyway, to be sure to find everything?

> a browsing application: the link to Vegetable is implicit and you will display zero Vegetable if it is only inferred.

A Solid app is supposed to comply with the client-to-client standard, so it will use the indexes defined in the c2cs and find vegetables. As for non-complying apps, are they not supposed to do inference by themselves?

> for statistics on vocabulary usage: you can tell apart the concepts actually used and the concepts only inferred.

Is this really useful?

> more generally, it bothers me to say that XYZ is mentioned when in fact it is not.

Yes, I agree, but a broader term could be used instead to cover both inferred and non-inferred resources, like linkedTo for instance.

FabienGandon commented 9 months ago

> > option 1 is the natural one, as what users declare will be the most precise concepts they need/provide, and the rest will be inferred
>
> It is not obvious to me, as an app can do the inference in place of the user. Indeed, the user creates a new product using the app and selects the "heirloom tomato" product type. When the app saves the resource on the POD, it can add inferred product types like "vegetable" and "tomato" directly in the resource and/or in some indexes.

I am not sure why this contradicts what I said: "what users declare will be the most precise concepts (...) the rest will be inferred". Where I could disagree is that it is not necessarily the user app that will perform and materialise inferences: this can be done by different agents, using or specialized in different vocabularies/ontologies, and in many different ways.

> > * a search engine looking for Vegetables: if you select a source where Vegetable is inferred, you know you will have to perform inference on the source first, otherwise you will not find them directly.
>
> Won't the search engine do inference anyway, to be sure to find everything?

Not all search engines do or will: only the ones that have a reasoning engine and can afford the cost, and only the ones that know the ontology/vocabulary that is used.

> > * a browsing application: the link to Vegetable is implicit and you will display zero Vegetable if it is only inferred.
>
> A Solid app is supposed to comply with the client-to-client standard, so it will use the indexes defined in the c2cs and find vegetables. As for non-complying apps, are they not supposed to do inference by themselves?

I don't see the link with c2cs here, in the sense that if an index says that pod121 has Vegetables and I go there and find only Lettuces, I have to be able to infer that Lettuces are Vegetables to close the loop. Even if I am a very lightweight JavaScript application, I have the burden of implementing an inference engine, which is not a small thing.

> > * for statistics on vocabulary usage: you can tell apart the concepts actually used and the concepts only inferred.
>
> Is this really useful?

I don't know :) Is it useless? Can we know at this stage? And more generally, which option is the more inclusive / supports more usages? Any application that can afford inferences will perform them, but what about the other ones? Should we support the possibility of enriching for them?

Wimmics / solid-start

Consider the impact of the use of SKOS in PODs' data on the indexes #11