Wimmics / solid-start

SOLID Inria project - Startin'blox
MIT License

Index strings starting with some letters #39

Open lecoqlibre opened 2 months ago

lecoqlibre commented 2 months ago

The meta index says where to find the indexes covering the values of a property that start with given letters:

<> a ex:StartsWithIndex;
    ex:forProperty ex:firstName.

:a a ex:StartsWithIndexRegistration;
    ex:forProperty ex:firstName; # optional
    ex:forValue "a";
    rdfs:seeAlso <path/to/first-letter-index.ttl>.

:b a ex:StartsWithIndexRegistration;
    ex:forProperty ex:firstName; # optional
    ex:forValue "b";
    rdfs:seeAlso <path/to/first-letter-index.ttl>.

:aa a ex:StartsWithIndexRegistration;
    ex:forProperty ex:firstName; # optional
    ex:forValue "aa";
    rdfs:seeAlso <path/to/first2-letters-index.ttl>.

:aaa a ex:StartsWithIndexRegistration;
    ex:forProperty ex:firstName; # optional
    ex:forValue "aaa";
    rdfs:seeAlso <path/to/first3-letters-index.ttl>.

...
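To illustrate how a client might use such a meta index, here is a minimal Python sketch. It assumes the registrations have already been parsed into a plain dictionary mapping each `ex:forValue` to its `rdfs:seeAlso` target; the `META_INDEX` dictionary, the `find_index` function, and the choice to lowercase the query and prefer the longest registered prefix are all illustrative assumptions, not something the proposal specifies.

```python
# Hypothetical in-memory view of the meta index above:
# ex:forValue -> rdfs:seeAlso target (paths copied from the example).
META_INDEX = {
    "a": "path/to/first-letter-index.ttl",
    "b": "path/to/first-letter-index.ttl",
    "aa": "path/to/first2-letters-index.ttl",
    "aaa": "path/to/first3-letters-index.ttl",
}


def find_index(query, meta_index):
    """Return (prefix, index file) for the longest registered prefix of
    `query`, or None when no registration covers it."""
    q = query.lower()  # assumption: indexed values are lowercased
    for length in range(len(q), 0, -1):
        prefix = q[:length]
        if prefix in meta_index:
            return prefix, meta_index[prefix]
    return None


if __name__ == "__main__":
    print(find_index("aab", META_INDEX))
    print(find_index("zoe", META_INDEX))
```

Preferring the longest prefix means the client fetches the most selective index available and falls back to shallower ones only when needed.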

Here is an example of an instance index (first-letter-index):

<> a ex:StartsWithIndex;
    ex:forProperty ex:firstName.

:a a ex:StartsWithIndexRegistration;
    ex:forProperty ex:firstName; # optional
    ex:forValue "a";
    ex:instance <user1>, <user2>, <userN>.

:b a ex:StartsWithIndexRegistration;
    ex:forProperty ex:firstName; # optional
    ex:forValue "b";
    ex:instance <user12>, <user22>, <userNN>.

:z a ex:StartsWithIndexRegistration;
    ex:forProperty ex:firstName; # optional
    ex:forValue "z";
    ex:instance <user13>, <user23>, <userNNN>.
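As a rough sketch of how an instance index like this could be generated, the Python below groups users by the first `depth` letters of a property value and prints registrations in the shape shown above. The function names, the lowercasing, and the decision to skip values shorter than `depth` are assumptions made for illustration; this is not the script that produced the test files.

```python
from collections import defaultdict


def build_starts_with_index(users, depth):
    """Map each `depth`-letter prefix to the users whose value starts with it.

    `users` maps a user IRI to the indexed value (e.g. a first name).
    Assumption: values shorter than `depth` are left out of this index.
    """
    index = defaultdict(list)
    for user, value in users.items():
        if len(value) >= depth:
            index[value[:depth].lower()].append(user)
    return dict(index)


def to_turtle(index, prop="ex:firstName"):
    """Serialize the index as registrations shaped like the example above.

    Assumption: every prefix is a valid Turtle local name (plain letters).
    """
    lines = ["<> a ex:StartsWithIndex;", f"    ex:forProperty {prop}.", ""]
    for prefix in sorted(index):
        instances = ", ".join(f"<{user}>" for user in index[prefix])
        lines += [
            f":{prefix} a ex:StartsWithIndexRegistration;",
            f"    ex:forProperty {prop};",
            f'    ex:forValue "{prefix}";',
            f"    ex:instance {instances}.",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {"user1": "Alice", "user2": "Adam", "user12": "Bob"}
    print(to_turtle(build_starts_with_index(sample, 1)))
```

Calling `build_starts_with_index` once per depth from 1 to 10 would give one index per file.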

I tried this on our test file of 5,000 users. With a depth of 10 letters, here is the ls output showing the size of the index files:

201K 17 avril 09:34 'firstNameIndex1$.ttl' # index on the first letter of the first name
15K  17 avril 09:34 'firstNameIndex10$.ttl' # index on the first ten letters of the first name
205K 17 avril 09:34 'firstNameIndex2$.ttl'
213K 17 avril 09:34 'firstNameIndex3$.ttl'
216K 17 avril 09:34 'firstNameIndex4$.ttl'
201K 17 avril 09:34 'firstNameIndex5$.ttl'
161K 17 avril 09:34 'firstNameIndex6$.ttl'
111K 17 avril 09:34 'firstNameIndex7$.ttl'
62K  17 avril 09:34 'firstNameIndex8$.ttl'
30K  17 avril 09:34 'firstNameIndex9$.ttl'
201K 17 avril 09:34 'lastNameIndex1$.ttl' # index on the first letter of the last name
4,3K 17 avril 09:34 'lastNameIndex10$.ttl' # index on the first ten letters of the last name
206K 17 avril 09:34 'lastNameIndex2$.ttl'
221K 17 avril 09:34 'lastNameIndex3$.ttl'
231K 17 avril 09:34 'lastNameIndex4$.ttl'
218K 17 avril 09:34 'lastNameIndex5$.ttl'
171K 17 avril 09:34 'lastNameIndex6$.ttl'
105K 17 avril 09:34 'lastNameIndex7$.ttl'
51K  17 avril 09:34 'lastNameIndex8$.ttl'
21K  17 avril 09:34 'lastNameIndex9$.ttl'

FabienGandon commented 2 months ago

Quick question, why not annotate indices directly?

<> a ex:StartsWithMetaIndex;
    ex:forProperty ex:firstName.

<path/to/first-letter-index.ttl> a ex:StartsWithIndex;
    ex:forProperty ex:firstName;
    ex:forValue "a".

FabienGandon commented 2 months ago

And we could also use SHACL:

<path/to/first-letter-index.ttl> a ex:StartsWithIndex;
    ex:hasShape [
        sh:path ex:firstName;
        sh:pattern "^a"
    ] .
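
One point to keep in mind with the SHACL option: `sh:pattern` is evaluated with SPARQL REGEX semantics, so a pattern matches anywhere in the value unless it is anchored with `^`. The Python sketch below illustrates the difference using `re.search`, which behaves the same way for simple patterns like these; the `matches_pattern` helper is hypothetical and only stands in for a SHACL validator.

```python
import re


def matches_pattern(pattern, value):
    """Return True if `pattern` matches anywhere in `value`.

    re.search is used because, like SPARQL REGEX, it is unanchored.
    """
    return re.search(pattern, value) is not None


if __name__ == "__main__":
    for pattern in ("a.*", "^a"):
        for name in ("alice", "maria"):
            print(pattern, name, matches_pattern(pattern, name))
```

An unanchored "a.*" accepts any value that merely contains an "a", whereas "^a" accepts only values that start with it.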