Wimmics / solid-start

Inria SOLID project - Startin'blox
MIT License

Implement index discovery for skills and cities #30

Open lecoqlibre opened 6 months ago

lecoqlibre commented 6 months ago

Implement new strategies that use an entry-point index to discover the other indexes.

Ex:

<> a ex:Index.

<#1> a ex:SkillIndexRegistration;
  ex:forSkill ex:skill12;
  ex:instance </path/to/index>.

<#2> a ex:CityIndexRegistration;
  ex:forCity ex:paris;
  ex:instance <path/to/index>.

The strategies will query this kind of index to discover the links to the specific indexes (such as the skill and city indexes).
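A minimal Python sketch of such a strategy, assuming the entry-point index has already been parsed (for example with an RDF library) into (subject, predicate, object) tuples whose terms are kept as plain prefixed-name strings. `discover_index` is a hypothetical helper, not an existing Startin'blox or Solid API.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def discover_index(triples: list[Triple], registration_class: str,
                   value_property: str, value: str) -> list[str]:
    """Return the ex:instance targets of every registration of
    `registration_class` whose `value_property` equals `value`."""
    # Group triples by subject so each registration can be inspected as a whole.
    by_subject: dict = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        by_subject[s][p].append(o)

    found: list[str] = []
    for props in by_subject.values():
        if registration_class in props.get("a", []) and value in props.get(value_property, []):
            found.extend(props.get("ex:instance", []))
    return found


# The entry-point index from the example above, as (subject, predicate, object).
entry_point = [
    ("<>", "a", "ex:Index"),
    ("<#1>", "a", "ex:SkillIndexRegistration"),
    ("<#1>", "ex:forSkill", "ex:skill12"),
    ("<#1>", "ex:instance", "</path/to/index>"),
    ("<#2>", "a", "ex:CityIndexRegistration"),
    ("<#2>", "ex:forCity", "ex:paris"),
    ("<#2>", "ex:instance", "<path/to/index>"),
]

print(discover_index(entry_point, "ex:SkillIndexRegistration", "ex:forSkill", "ex:skill12"))
# ['</path/to/index>']
```

Note that each registration class needs its own value property (ex:forSkill, ex:forCity), so the caller must know both names in advance.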

pchampin commented 6 months ago

NB: this could be made even more generic:

<> a ex:Index.

<#1> a ex:PropertyIndexRegistration;
  ex:forProperty ex:hasSkill ;
  ex:forValue ex:skill12;
  ex:instance </path/to/index>.

<#2> a ex:PropertyIndexRegistration;
  ex:forProperty ex:location ;
  ex:forValue ex:paris;
  ex:instance <path/to/index>.
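With the generic form, a single lookup keyed by (property, value) serves skills, cities, or any other property. A minimal sketch, assuming the index is already parsed into (subject, predicate, object) tuples of prefixed-name strings; `build_lookup` is a hypothetical name.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def build_lookup(triples: list[Triple]) -> dict[tuple[str, str], list[str]]:
    """Map each (forProperty, forValue) pair to the ex:instance targets of the
    ex:PropertyIndexRegistration entries that declare it."""
    entries: dict = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        entries[s][p].append(o)

    lookup: dict = defaultdict(list)
    for props in entries.values():
        if "ex:PropertyIndexRegistration" not in props.get("a", []):
            continue  # skip <> a ex:Index and any other non-entry subject
        for prop in props.get("ex:forProperty", []):
            for value in props.get("ex:forValue", []):
                lookup[(prop, value)].extend(props.get("ex:instance", []))
    return dict(lookup)


generic_index = [
    ("<>", "a", "ex:Index"),
    ("<#1>", "a", "ex:PropertyIndexRegistration"),
    ("<#1>", "ex:forProperty", "ex:hasSkill"),
    ("<#1>", "ex:forValue", "ex:skill12"),
    ("<#1>", "ex:instance", "</path/to/index>"),
    ("<#2>", "a", "ex:PropertyIndexRegistration"),
    ("<#2>", "ex:forProperty", "ex:location"),
    ("<#2>", "ex:forValue", "ex:paris"),
    ("<#2>", "ex:instance", "<path/to/index>"),
]

lookup = build_lookup(generic_index)
print(lookup[("ex:location", "ex:paris")])  # ['<path/to/index>']
```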

pchampin commented 6 months ago

A few more ideas:

  1. In addition to ex:instance, index entries could use an ex:instancesIn property, which would not point to the instance IRI directly but to an RDF resource containing one (or several) matching instances.

  2. rdfs:seeAlso could be used (as it already is in WebID profiles) to indicate that more information about the subject can be found in an additional resource. This is useful for moving long lists of instances into a separate resource.

  3. You could then use the same vocabulary for indexing indexes and for indexing users:

First level

rootIndex.ttl:

<> a ex:Index. # Indexing indexes by their property

<#1> a ex:PropertyIndexRegistration;
  ex:forProperty ex:forProperty;
  ex:forValue ex:hasSkill;
  ex:instancesIn <skillIndex.ttl>.

<#2> a ex:PropertyIndexRegistration;
  ex:forProperty ex:forProperty;
  ex:forValue ex:location;
  ex:instancesIn <cityIndex.ttl>.

This way, you don't need to load the index entries related to cities when you are only interested in skill indexes.
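A minimal sketch of that first step, assuming rootIndex.ttl is already parsed into (subject, predicate, object) tuples of prefixed-name strings; `indexes_for` is a hypothetical name. It returns only the second-level documents registered for the requested property.

```python
Triple = tuple[str, str, str]

# rootIndex.ttl from the example above, with prefixed names kept as strings.
root_index: list[Triple] = [
    ("<>", "a", "ex:Index"),
    ("<#1>", "a", "ex:PropertyIndexRegistration"),
    ("<#1>", "ex:forProperty", "ex:forProperty"),
    ("<#1>", "ex:forValue", "ex:hasSkill"),
    ("<#1>", "ex:instancesIn", "<skillIndex.ttl>"),
    ("<#2>", "a", "ex:PropertyIndexRegistration"),
    ("<#2>", "ex:forProperty", "ex:forProperty"),
    ("<#2>", "ex:forValue", "ex:location"),
    ("<#2>", "ex:instancesIn", "<cityIndex.ttl>"),
]


def indexes_for(root: list[Triple], prop: str) -> list[str]:
    """Return the second-level documents indexing users by `prop`: the
    ex:instancesIn targets of entries with forProperty ex:forProperty and
    forValue `prop`."""
    indexes_of_indexes = {s for s, p, o in root if p == "ex:forProperty" and o == "ex:forProperty"}
    matching = {s for s, p, o in root
                if s in indexes_of_indexes and p == "ex:forValue" and o == prop}
    return [o for s, p, o in root if s in matching and p == "ex:instancesIn"]


print(indexes_for(root_index, "ex:hasSkill"))  # ['<skillIndex.ttl>']
```

Only `<skillIndex.ttl>` needs to be fetched next; `<cityIndex.ttl>` is never downloaded for a skill search.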

Second level

skillIndex.ttl:

<> a ex:Index. # Indexing users by their skills

<#1> a ex:PropertyIndexRegistration;
  ex:forProperty ex:hasSkill;
  ex:forValue ex:skill1;
  rdfs:seeAlso <skill1.ttl>. # because the list of instances may be big

<#2> a ex:PropertyIndexRegistration;
  ex:forProperty ex:hasSkill;
  ex:forValue ex:skill2;
  rdfs:seeAlso <skill2.ttl>. # because the list of instances may be big
# ...

cityIndex.ttl:

<> a ex:Index. # Indexing users by their city

<#1> a ex:PropertyIndexRegistration;
  ex:forProperty ex:location;
  ex:forValue ex:toulouse;
  rdfs:seeAlso <toulouse.ttl>. # because the list of instances may be big

<#2> a ex:PropertyIndexRegistration;
  ex:forProperty ex:location;
  ex:forValue ex:lyon;
  rdfs:seeAlso <lyon.ttl>. # because the list of instances may be big
# ...

Third level

skill1.ttl:

# additional triples about the <skillIndex.ttl#1> entry defined in skillIndex.ttl
<skillIndex.ttl#1> ex:instance
    <https://localhost:8001/users/user1#me>,
    <https://localhost:8002/users/user2#me>,
    #...

skill2.ttl:

# additional triples about the <skillIndex.ttl#2> entry defined in skillIndex.ttl
<skillIndex.ttl#2> ex:instance
    <https://localhost:8002/users/user2#me>,
    <https://localhost:8003/users/user3#me>,
    #...

lyon.ttl:

# additional triples about the <cityIndex.ttl#2> entry defined in cityIndex.ttl
<cityIndex.ttl#2> ex:instance
    <https://localhost:8001/users/user1#me>,
    <https://localhost:8003/users/user3#me>,
    #...

etc...

NB: the third level is not strictly required. The ex:instance properties could be included directly in the second level (especially for entries that have only a few values).
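A minimal sketch of a client walking the three levels, assuming each document is already parsed into (subject, predicate, object) tuples. The in-memory `DOCS` dict stands in for HTTP fetches, relative IRIs are written already resolved against their document (`<#1>` in skillIndex.ttl becomes `skillIndex.ttl#1`), and the ex:skill2 entry is assumed to carry its single instance inline, as the NB allows. `find_instances` and its helpers are hypothetical names.

```python
Triple = tuple[str, str, str]

# Hypothetical in-memory stand-in for the index documents (type triples omitted).
DOCS: dict[str, list[Triple]] = {
    "rootIndex.ttl": [
        ("rootIndex.ttl#1", "ex:forProperty", "ex:forProperty"),
        ("rootIndex.ttl#1", "ex:forValue", "ex:hasSkill"),
        ("rootIndex.ttl#1", "ex:instancesIn", "skillIndex.ttl"),
        ("rootIndex.ttl#2", "ex:forProperty", "ex:forProperty"),
        ("rootIndex.ttl#2", "ex:forValue", "ex:location"),
        ("rootIndex.ttl#2", "ex:instancesIn", "cityIndex.ttl"),
    ],
    "skillIndex.ttl": [
        ("skillIndex.ttl#1", "ex:forProperty", "ex:hasSkill"),
        ("skillIndex.ttl#1", "ex:forValue", "ex:skill1"),
        ("skillIndex.ttl#1", "rdfs:seeAlso", "skill1.ttl"),  # long list: third level
        ("skillIndex.ttl#2", "ex:forProperty", "ex:hasSkill"),
        ("skillIndex.ttl#2", "ex:forValue", "ex:skill2"),
        ("skillIndex.ttl#2", "ex:instance", "https://localhost:8003/users/user3#me"),  # inline
    ],
    "skill1.ttl": [
        ("skillIndex.ttl#1", "ex:instance", "https://localhost:8001/users/user1#me"),
        ("skillIndex.ttl#1", "ex:instance", "https://localhost:8002/users/user2#me"),
    ],
}


def objects(triples: list[Triple], subject: str, predicate: str) -> list[str]:
    return [o for s, p, o in triples if s == subject and p == predicate]


def entries_for(triples: list[Triple], prop: str, value: str) -> list[str]:
    """Subjects declaring ex:forProperty `prop` and ex:forValue `value`, in document order."""
    with_prop = {s for s, p, o in triples if p == "ex:forProperty" and o == prop}
    found: list[str] = []
    for s, p, o in triples:
        if s in with_prop and p == "ex:forValue" and o == value and s not in found:
            found.append(s)
    return found


def find_instances(root: str, prop: str, value: str) -> tuple[list[str], list[str]]:
    """Walk the three levels; return (instances, documents fetched)."""
    fetched: list[str] = []

    def fetch(doc: str) -> list[Triple]:
        fetched.append(doc)  # record each load, to show which documents are skipped
        return DOCS.get(doc, [])

    instances: list[str] = []
    root_triples = fetch(root)
    # Level 1: entries indexing indexes (forProperty ex:forProperty, forValue `prop`).
    for meta in entries_for(root_triples, "ex:forProperty", prop):
        for level2 in objects(root_triples, meta, "ex:instancesIn"):
            level2_triples = fetch(level2)
            # Level 2: entries indexing users by `prop` = `value`.
            for entry in entries_for(level2_triples, prop, value):
                # ex:instance may be stated inline (the third level is optional)...
                instances += objects(level2_triples, entry, "ex:instance")
                # ...or moved to another document reached through rdfs:seeAlso.
                for extra in objects(level2_triples, entry, "rdfs:seeAlso"):
                    instances += objects(fetch(extra), entry, "ex:instance")
    return instances, fetched


print(find_instances("rootIndex.ttl", "ex:hasSkill", "ex:skill1"))
```

Looking up ex:skill1 fetches rootIndex.ttl, skillIndex.ttl and skill1.ttl; looking up ex:skill2 stops after skillIndex.ttl, and cityIndex.ttl is never fetched in either case.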

lecoqlibre commented 5 months ago

From PA: the SPARQL query, using named graphs, that queries the indexes according to the proposal:

PREFIX ex: <http://example.org#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?user ?givenName ?familyName ?city ?skill WHERE {
  [] ex:forProperty ex:forProperty ;
      ex:forValue ex:hasSkill ;
      ex:instancesIn ?skillIndex.

  GRAPH ?skillIndex {
    ?entry ex:forProperty ex:hasSkill ;
      ex:forValue "${skill}" ;
      rdfs:seeAlso ?skillSubIndex.
  }

  GRAPH ?skillSubIndex {
    ?entry ex:instance ?user.
  }

  BIND ( ... ?user ... AS ?userProfile) # remove the trailing fragment

  GRAPH ?userProfile {
    ?user foaf:givenName ?givenName ;
        foaf:familyName ?familyName ;
        ex:city ?city ;
        ex:skill ?skill.
  }
}
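The BIND step is meant to drop the fragment from ?user to obtain the profile document used as a graph name. A client-side equivalent using the standard `urllib.parse.urldefrag` is sketched below; `profile_document` is a hypothetical name. If this is done inside SPARQL 1.1 instead, note that `STRBEFORE(STR(?user), "#")` returns an empty string when the IRI has no `#`, so IRIs without a fragment would need separate handling.

```python
from urllib.parse import urldefrag


def profile_document(user_iri: str) -> str:
    """Return the IRI of the document to query for `user_iri`: the same IRI
    without its trailing fragment (IRIs without a fragment are unchanged)."""
    return urldefrag(user_iri).url


print(profile_document("https://localhost:8001/users/user1#me"))
# https://localhost:8001/users/user1
print(profile_document("http://localhost:8000/users/balessan/"))
# http://localhost:8000/users/balessan/
```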

balessan commented 4 months ago

So I forked our Skill package to generate an index whenever skill information is saved. The index I generate looks as follows:

@prefix ns1: <http://cdn.startinblox.com/owl/ttl/vocab.ttl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://localhost:8000/skills/1/> a <hd:skill> ;
    rdfs:label "" ;
    ns1:hasSkill <http://localhost:8000/users/balessan/>,
        <http://localhost:8000/users/balessanvir/>,
        <http://localhost:8000/users/benito/> .

<http://localhost:8000/skills/2/> a <hd:skill> ;
    rdfs:label "" ;
    ns1:hasSkill <http://localhost:8000/users/balessan/>,
        <http://localhost:8000/users/benito/> .

What should I change, @lecoqlibre?

In this first version, every instance on which the Skill package is installed provides both an indexes/skills.ttl and an indexes/skills.jsonld file, which are updated every time a Skill is saved.
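The actual implementation lives in the linked merge request (Django with RDFLib). As a rough, stdlib-only sketch of the "regenerate on every save" idea: a hypothetical save hook updates an in-memory skill-to-users store and re-serializes the whole file in the shape shown above (type and label triples omitted). `on_skill_saved` and `serialize_skill_index` are hypothetical names.

```python
Skills = dict[str, set[str]]  # skill IRI -> user IRIs (hypothetical in-memory store)


def serialize_skill_index(skills: Skills) -> str:
    """Render the skill -> users mapping as Turtle, in the shape of the index above."""
    lines = [
        "@prefix ns1: <http://cdn.startinblox.com/owl/ttl/vocab.ttl#> .",
        "",
    ]
    for skill in sorted(skills):
        users = sorted(skills[skill])
        if not users:
            continue  # an empty object list would not be valid Turtle
        objs = ",\n    ".join(f"<{u}>" for u in users)
        lines.append(f"<{skill}> ns1:hasSkill {objs} .")
        lines.append("")
    return "\n".join(lines)


def on_skill_saved(store: Skills, skill: str, users: set[str]) -> str:
    """Hypothetical save hook: update the store, then regenerate the whole file."""
    store[skill] = set(users)
    return serialize_skill_index(store)


store: Skills = {}
on_skill_saved(store, "http://localhost:8000/skills/2/",
               {"http://localhost:8000/users/benito/", "http://localhost:8000/users/balessan/"})
print(on_skill_saved(store, "http://localhost:8000/skills/1/", {"http://localhost:8000/users/benito/"}))
```

Regenerating the whole file keeps the hook trivial, but its cost grows with the size of the index on every save.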

Work branch is here: https://git.startinblox.com/djangoldp-packages/djangoldp-skill/-/merge_requests/19

On reflection, it costs nothing to use TTL specifically for index management if it performs better. RDFLib handles the parsing easily.

balessan commented 4 months ago

I did the same for our user profile package, so a draft TTL index is now generated on every profile save. It looks as follows:

@prefix ns1: <https://cdn.startinblox.com/owl/ttl/vocab.ttl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ns1:berlin a ns1:Place ;
    ns1:hasMember <http://localhost:8000/users/benito/> .

ns1:chambéry a ns1:Place ;
    ns1:hasMember <http://localhost:8000/users/balessanvir/> .

ns1:evian a ns1:Place ;
    ns1:hasMember <http://localhost:8000/users/admin/> .

<https://cdn.startinblox.com/owl/ttl/vocab.ttl#indexes/cities> a ns1:Index ;
    rdfs:comment "Indexing users by their city" .

ns1:paris a ns1:Place ;
    ns1:hasMember <http://localhost:8000/users/balessan/>,
        <http://localhost:8000/users/balessanter/> .

ns1:venise a ns1:Place ;
    ns1:hasMember <http://localhost:8000/users/benito2/> .

I arbitrarily decided to lowercase the city names, since the city is only a literal in our user profile. I am unsure whether this index file makes sense, though.
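A minimal sketch of that normalization, assuming profiles are available as a user-IRI-to-city-literal mapping; `city_index` is a hypothetical name. Users are grouped under the trimmed, lowercased city string.

```python
from collections import defaultdict


def city_index(profiles: dict[str, str]) -> dict[str, list[str]]:
    """Group user IRIs by city, using the lowercased, trimmed city literal as key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for user, city in profiles.items():
        key = city.strip().lower()
        if key:  # users without a city are left out of the index
            groups[key].append(user)
    return {key: sorted(users) for key, users in sorted(groups.items())}


print(city_index({
    "http://localhost:8000/users/balessan/": "Paris",
    "http://localhost:8000/users/balessanter/": "paris ",
    "http://localhost:8000/users/benito/": "Berlin",
}))
```

Lowercasing only merges case variants: accented and unaccented spellings such as "Chambéry" and "Chambery" still produce different keys.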

Draft MR available here: https://git.startinblox.com/djangoldp-packages/djangoldp-profile/-/merge_requests/22

Ping @lecoqlibre

lecoqlibre commented 4 months ago

Good news @balessan.

So you generated distributed indexes. In their current state, these would give medium to poor performance. One case where they could be used efficiently is searching within one particular instance (this is not currently offered in Hubl). Example: I search for users with skills 1, 2 and 3 on instance 1 only.

To respond in less than a second for the existing use cases, we should use federated indexes: indexes that live on the federation instance. We can generate these federated indexes from the notifications sent by the distributed instances whenever a user is modified. Another option is to periodically fetch the distributed indexes and create or update the federated index according to what has been fetched.
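Both options can be sketched with one in-memory structure on the federation side, assuming an index keyed by (property, value). `FederatedIndex` and its methods are hypothetical; in particular, attributing users to an instance by IRI prefix in the pull option is an assumption made only for this sketch.

```python
from collections import defaultdict


class FederatedIndex:
    """Hypothetical federation-side index: (property, value) -> user IRIs."""

    def __init__(self) -> None:
        self.entries: dict[tuple[str, str], set[str]] = defaultdict(set)

    def on_user_modified(self, user: str, prop: str, old: set[str], new: set[str]) -> None:
        """Push option: apply a notification sent by an instance when a user changes."""
        for value in old - new:
            self.entries[(prop, value)].discard(user)
        for value in new - old:
            self.entries[(prop, value)].add(user)

    def refresh_from_instance(self, origin: str, prop: str,
                              fetched: dict[str, set[str]]) -> None:
        """Pull option: replace what is indexed for `prop` about users of one
        instance (identified by the IRI prefix `origin`) with a fetched index."""
        for (p, _value), users in self.entries.items():
            if p == prop:
                users.difference_update({u for u in users if u.startswith(origin)})
        for value, users in fetched.items():
            self.entries[(prop, value)].update(users)

    def lookup(self, prop: str, value: str) -> set[str]:
        return set(self.entries.get((prop, value), set()))


index = FederatedIndex()
user1 = "https://localhost:8001/users/user1#me"
index.on_user_modified(user1, "ex:hasSkill", set(), {"ex:skill1", "ex:skill2"})
index.on_user_modified(user1, "ex:hasSkill", {"ex:skill1", "ex:skill2"}, {"ex:skill2"})
index.refresh_from_instance("https://localhost:8002/", "ex:hasSkill",
                            {"ex:skill1": {"https://localhost:8002/users/user2#me"}})
print(sorted(index.lookup("ex:hasSkill", "ex:skill1")))  # ['https://localhost:8002/users/user2#me']
```

Either way, a search only reads the federation-side structure, which is what keeps response times independent of how many instances there are.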

Another remark: the skill and city indexes you generated are too tightly bound to your business domain, because they use your domain vocabulary as index predicates. It would be great to make them more generic, following PA's previous comment https://github.com/Wimmics/solid-start/issues/30#issuecomment-1862285460. We should also split the indexes into smaller indexes, depending on the size of the data.
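One way to split by size is a simple threshold: entries with few instances keep ex:instance inline, larger ones get rdfs:seeAlso to their own document. A minimal sketch producing document-name-to-triples layouts; the threshold of 3, the file naming scheme (value local name plus `.ttl`) and the `layout_entries` name are all hypothetical and would need tuning against real data sizes.

```python
Triple = tuple[str, str, str]


def layout_entries(index_doc: str, entries: dict[tuple[str, str], list[str]],
                   limit: int = 3) -> dict[str, list[Triple]]:
    """Return document name -> triples for one index. Entries with at most
    `limit` instances keep ex:instance inline; bigger ones point with
    rdfs:seeAlso to their own document (named after the value, an assumption)."""
    docs: dict[str, list[Triple]] = {index_doc: []}
    for n, ((prop, value), users) in enumerate(sorted(entries.items()), start=1):
        entry = f"{index_doc}#{n}"
        docs[index_doc] += [
            (entry, "a", "ex:PropertyIndexRegistration"),
            (entry, "ex:forProperty", prop),
            (entry, "ex:forValue", value),
        ]
        if len(users) <= limit:
            docs[index_doc] += [(entry, "ex:instance", u) for u in users]
        else:
            extra = value.split(":")[-1] + ".ttl"  # e.g. ex:skill1 -> skill1.ttl
            docs[index_doc].append((entry, "rdfs:seeAlso", extra))
            docs[extra] = [(entry, "ex:instance", u) for u in users]
    return docs


users = [f"https://localhost:800{i}/users/user{i}#me" for i in range(1, 6)]
docs = layout_entries("skillIndex.ttl", {
    ("ex:hasSkill", "ex:skill1"): users,      # 5 instances: moved to skill1.ttl
    ("ex:hasSkill", "ex:skill2"): users[:2],  # 2 instances: kept inline
})
print(sorted(docs))  # ['skill1.ttl', 'skillIndex.ttl']
```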

Ideas:

So, on the federation instance you would have one "meta" index:

@prefix ex: <https://example.org#>.
@prefix ns1: <https://cdn.startinblox.com/owl/ttl/vocab.ttl#>.

<> a ex:Index. # Indexing indexes by their property

<#1> a ex:PropertyIndexRegistration;
  ex:forProperty ex:forProperty;
  ex:forValue ns1:hasSkill;
  ex:instancesIn <skillIndex.ttl>.

<#2> a ex:PropertyIndexRegistration;
  ex:forProperty ex:forProperty;
  ex:forValue ns1:hasPlace; # replace hasPlace by the existing predicate in your ontology
  ex:instancesIn <cityIndex.ttl>.

On this same federation instance you would have a "meta" skill index:

@prefix ex: <https://example.org#>.
@prefix ns1: <https://cdn.startinblox.com/owl/ttl/vocab.ttl#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.

<> a ex:Index. # Indexing users by their skills

<#1> a ex:PropertyIndexRegistration;
  ex:forProperty ns1:hasSkill;
  ex:forValue ns1:skill1;
  rdfs:seeAlso <skill1.ttl>.

<#2> a ex:PropertyIndexRegistration;
  ex:forProperty ns1:hasSkill;
  ex:forValue ns1:skill2;
  rdfs:seeAlso <skill2.ttl>.
# ...

And also a "meta" city index:

@prefix ex: <https://example.org#>.
@prefix ns1: <https://cdn.startinblox.com/owl/ttl/vocab.ttl#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.

<> a ex:Index. # Indexing users by their city

<#1> a ex:PropertyIndexRegistration;
  ex:forProperty ns1:hasPlace; # replace hasPlace by the existing predicate in your ontology
  ex:forValue "toulouse";
  rdfs:seeAlso <toulouse.ttl>.

<#2> a ex:PropertyIndexRegistration;
  ex:forProperty ns1:hasPlace; # replace hasPlace by the existing predicate in your ontology
  ex:forValue "lyon";
  rdfs:seeAlso <lyon.ttl>.
# ...

Then, on the same federation instance, you would have per-skill files such as skill1.ttl:

@prefix ex: <https://example.org#>.

<skillIndex.ttl#1> ex:instance
    <https://localhost:8001/users/user1#me>,
    <https://localhost:8002/users/user2#me>,
    #...

And per-city files such as lyon.ttl:

@prefix ex: <https://example.org#>.

<cityIndex.ttl#2> ex:instance
    <https://localhost:8001/users/user1#me>,
    <https://localhost:8003/users/user3#me>,
    #...
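The per-value files above end with `#...`, so as written they are not complete Turtle: a generator has to close the object list with a final `.`. A minimal sketch, assuming the entry IRI is written relative to the file's location (for Lyon, entry #2 of cityIndex.ttl); `instance_file_turtle` is a hypothetical name.

```python
def instance_file_turtle(entry: str, users: list[str]) -> str:
    """Serialize a per-value file (such as lyon.ttl) as complete Turtle: one
    subject, one ex:instance object per user, closed by a final '.'."""
    header = "@prefix ex: <https://example.org#>.\n"
    if not users:
        return header  # nothing to state; an empty object list is not valid Turtle
    objects = ",\n    ".join(f"<{u}>" for u in users)
    return f"{header}\n<{entry}> ex:instance\n    {objects} .\n"


print(instance_file_turtle("cityIndex.ttl#2", [
    "https://localhost:8001/users/user1#me",
    "https://localhost:8003/users/user3#me",
]))
```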