Closed lecoqlibre closed 10 months ago
First, a few general remarks:
General remark G1: several of these options rely on out-of-band knowledge. In other words, part of the semantics of the index is left implicit ("which property are we indexing on?", "are we indexing on the whole value, the first letter, the first two letters?..."), which in my view is not good. Caveat: things could be part of the client-to-client protocol (C2CP), in which case they do not need to be explicit in the graphs -- but the more out-of-band knowledge you defer to C2CP, the less flexible your implementations will be.
General remark G2: encoding information in IRIs is generally frowned upon in RDF. Part of it is related to G1 above: this is putting information inside the IRI (which requires specific knowledge to decode) rather than as triples around that IRI (which is standard RDF). Part of it is because it causes a combinatorial explosion of the terms in your vocabulary (cf. comment by @FabienGandon this morning).
General remark G3: I don't think that C2CP should mandate specific filenames or container names. I think that specific predicates should be used to indicate the location of specific containers/files.
E.g.: even though the public type index is typically stored in $MY_POD/profile/publicTypeIndex.ttl, this is not required by the C2CP. Instead, it is discovered by following the solid:publicTypeIndex property in my WebID.
From these remarks, let me state a few requirements for indexing strategies:
R1: the property on which we are indexing (here, dfc-b:familyName) should be explicit (i.e. expressed in triples)
R2: the part of the property value we are indexing (here, the first letter) should be explicit (idem)
R3: the indexed value (here "A", "B"...) should be explicit
R4: do not encode values in IRIs
R5: do not mandate file/container names
R1 | R2 | R3 | R4 | R5 | |
---|---|---|---|---|---|
Option 1 | ✓ | ✓ | |||
Option 2 | (1) | (1) | ✓ | ||
Option 3 | ✓ | ✓ | ✓ | (2) | ✓ |
Option 4.a (3) | |||||
Option 4.b (3) | ✓ | ✓ | ✓ | ||
Option 4.c (3) | ✓ | ✓ | ✓ | ✓ | ✓ |
Option 5 | ✓ | ✓ | ✓ | ✓ | ✓ |
Option 6 | ✓ | ✓ | ✓ | ✓ | ✓ |
(1) arguably, the semantics of "first letter" and "A" can be considered part of the built-in semantics of the vocabulary term dfc-b:startingWithA, but that conflicts with R4 anyway.
(2) I know that I proposed something like that this morning, but I now realize that a predicate such as dfc-b:familyNameStartsWith suffers (to some extent) from the "combinatorial explosion" problem raised by @FabienGandon. (Consider dfc-b:familyNameEndsWith, dfc-b:familyNameContains, dfc-b:givenNameStartsWith, dfc-b:givenNameEndsWith...).
(3) I call Option 4.a the variant where "naming and location of the index files could be directly defined by the client-to-client standard", so everything is implicit, including file names. Not a fan :smiling_imp:
I call Options 4.b and 4.c the two variants where solid:FirstLetterIndex is used in the type index.
I like, in options 4.c, 5 and 6, that the standard type index is used to make our new kinds of index discoverable. That's indeed a nice to have.
I also like, in options 4.c and 6, that the type index entries are extended with additional properties (solid:forLetter, solid:forProperty), while remaining backward compatible (i.e. ignoring these properties does not lead to a wrong interpretation, only a less discriminating one).
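To make the backward-compatibility point concrete, here is a minimal Python sketch. It models type index registrations as plain dictionaries, which is an assumption made purely for illustration (a real client would parse the Turtle), and the `givenNameIndexA.ttl` entry is hypothetical. A client that only understands `solid:forClass` still gets a correct, if larger, set of candidate indexes; a client that also reads `solid:forProperty` and `solid:forLetter` narrows it down.

```python
# Hypothetical in-memory view of type index registrations; a real client
# would obtain these by parsing typeIndex.ttl.
REGISTRATIONS = [
    {"forClass": "solid:FirstLetterIndex", "forProperty": "dfc-b:familyName",
     "forLetter": {"a"}, "instance": "indexA.ttl"},
    {"forClass": "solid:FirstLetterIndex", "forProperty": "dfc-b:familyName",
     "forLetter": {"b"}, "instance": "indexB.ttl"},
    # Assumed extra entry (not in the thread) to show filtering on the property.
    {"forClass": "solid:FirstLetterIndex", "forProperty": "dfc-b:givenName",
     "forLetter": {"a"}, "instance": "givenNameIndexA.ttl"},
]


def legacy_lookup(registrations, wanted_class):
    """Client that only understands solid:forClass: it gets every index."""
    return [r["instance"] for r in registrations if r["forClass"] == wanted_class]


def aware_lookup(registrations, wanted_class, wanted_property, letter):
    """Client that also reads solid:forProperty and solid:forLetter."""
    return [
        r["instance"]
        for r in registrations
        if r["forClass"] == wanted_class
        and r["forProperty"] == wanted_property
        and letter in r["forLetter"]
    ]


everything = legacy_lookup(REGISTRATIONS, "solid:FirstLetterIndex")
narrowed = aware_lookup(
    REGISTRATIONS, "solid:FirstLetterIndex", "dfc-b:familyName", "a"
)
print(everything)
print(narrowed)
```

The narrowed result is always a subset of what the legacy client sees, which is what "less discriminating, but not wrong" means here.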
I don't think having multiple small indexes is a good idea. So all things considered, Option 6 is my favorite.
(except that we should not coin new terms in the solid: namespace -- we do not own that namespace... nitpicking)
Thank you @pchampin, I also prefer Option 6.
About the prefix naming, I would use solid as long as we don't have something else to propose.
An enhancement of Option 6 could be to make it more generic, with Option 7 below:
- types ValueIndex and ValueRegistration
- forLetter would become forValue
- a forPosition property, to express where the value should be encountered
File typeIndex.ttl:
@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix dfc-b: <https://www.datafoodconsortium.org#>.
<>
a solid:TypeIndex;
a solid:ListedDocument.
<#ab09fd> a solid:TypeRegistration;
solid:forClass solid:ValueIndex;
solid:forValue "a", "b", "z";
solid:forPosition 0;
solid:forProperty dfc-b:familyName;
solid:instance <indexValue.ttl>.
File indexValue.ttl:
@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix dfc-b: <https://www.datafoodconsortium.org#>.
<>
a solid:ValueIndex;
a solid:ListedDocument;
solid:forPosition 0;
solid:forProperty dfc-b:familyName.
# This is indexing persons with a family name starting with the letter "a".
<#ab09fd> a solid:ValueRegistration;
solid:forValue "a";
solid:instance </agents/persons/person32.ttl>, </agents/persons/person12.ttl>.
# This is indexing persons with a family name starting with the letter "b".
<#zx45yh> a solid:ValueRegistration;
solid:forValue "b";
solid:instance </agents/persons/person56.ttl>, </agents/persons/person78.ttl>.
# This is indexing persons with a family name starting with the letter "z".
<#sk17vb> a solid:ValueRegistration;
solid:forValue "z";
solid:instance </agents/persons/person2.ttl>, </agents/persons/person63.ttl>.
Any thoughts @FabienGandon?
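For what it's worth, here is a rough Python sketch of how a client might build and query such a ValueIndex. The family names are made up (the thread only gives document paths), the function names are hypothetical, and lowercasing the indexed character is an assumption suggested by the lowercase "a", "b", "z" values above.

```python
from collections import defaultdict

# Made-up family names for illustration; the thread only gives document paths.
FAMILY_NAMES = {
    "/agents/persons/person32.ttl": "Abel",
    "/agents/persons/person12.ttl": "Arnaud",
    "/agents/persons/person56.ttl": "Bernard",
    "/agents/persons/person2.ttl": "Zola",
}


def build_value_index(family_names, position=0):
    """Group documents by the character found at `position` (solid:forPosition)."""
    index = defaultdict(list)
    for document, name in family_names.items():
        if position < len(name):  # skip values too short for this position
            index[name[position].lower()].append(document)
    return dict(index)


def lookup(index, name, position=0):
    """Return the documents registered for the character of `name` at `position`."""
    if position >= len(name):
        return []
    return index.get(name[position].lower(), [])


index = build_value_index(FAMILY_NAMES)
print(index["a"])
print(lookup(index, "BERNARD"))
```

With `position=1` the same code indexes on the second character, which is the extra genericity that forPosition is meant to bring over a first-letter index.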
File typeIndex.ttl:
@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix dfc-b: <https://www.datafoodconsortium.org#>.
<>
a solid:TypeIndex;
a solid:ListedDocument.
<#ab09fd> a solid:TypeRegistration;
solid:forClass solid:ValueIndex;
solid:forRegex "/^[abz]/i";
solid:forProperty dfc-b:familyName;
solid:instance <indexValue.ttl>.
File indexValue.ttl:
@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix dfc-b: <https://www.datafoodconsortium.org#>.
<>
a solid:ValueIndex;
a solid:ListedDocument;
solid:forProperty dfc-b:familyName.
# This is indexing persons with a family name starting with the letter "a".
<#ab09fd> a solid:ValueRegistration;
solid:forRegex "/^[a]/i";
solid:instance </agents/persons/person32.ttl>, </agents/persons/person12.ttl>.
# This is indexing persons with a family name starting with the letter "b".
<#zx45yh> a solid:ValueRegistration;
solid:forRegex "/^[b]/i";
solid:instance </agents/persons/person56.ttl>, </agents/persons/person78.ttl>.
# This is indexing persons with a family name starting with the letter "z".
<#sk17vb> a solid:ValueRegistration;
solid:forRegex "/^[z]/i";
solid:instance </agents/persons/person2.ttl>, </agents/persons/person63.ttl>.
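One detail to watch when writing such patterns: inside a character class, `|` is not an alternation operator but a literal character, so a class written as `[a|b|z]` also accepts names starting with `|`. The sketch below checks this with Python's `re` module; the names are made up, and other regex flavors should be checked separately before relying on this.

```python
import re

# "|" is a literal character inside [...], not an "or".
loose = re.compile(r"^[a|b|z]", re.IGNORECASE)
strict = re.compile(r"^[abz]", re.IGNORECASE)

# Made-up values, including one that starts with a pipe character.
names = ["Abel", "zola", "|pipe", "Champin"]

loose_matches = [n for n in names if loose.match(n)]
strict_matches = [n for n in names if strict.match(n)]
print(loose_matches)
print(strict_matches)
```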
(… sh:pattern)
I like the idea of using a regexp (using SPARQL regular expressions like SHACL does, but not reusing the property sh:pattern, because it is a property of shapes, and value registrations are not shapes).
However, I foresee a performance vs. genericity tradeoff here. Assume I am looking for someone named Champin:
- if I have a FirstLetterIndex, I can easily find the relevant resources by following a path that looks like: "c" <-( solid:forLetter )- ?x -( solid:instance )-> ?y, which can be quite fast.
- if I have a ValueIndex using regexps, I need to inspect all registrations in the index, and check whether "Champin" satisfies the given regular expression. This will likely be much slower.
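The difference between the two lookups can be sketched in Python as follows. The data structures are assumptions for illustration only (a dictionary standing in for the first-letter index, a list of compiled patterns standing in for regexp registrations), and the "c" entry with `person99.ttl` is hypothetical.

```python
import re

# Documents from the thread, plus one assumed entry for "c".
DOCS = {
    "a": ["/agents/persons/person32.ttl", "/agents/persons/person12.ttl"],
    "b": ["/agents/persons/person56.ttl", "/agents/persons/person78.ttl"],
    "c": ["/agents/persons/person99.ttl"],  # hypothetical
}

# First-letter index: the letter is a key, so one direct lookup is enough.
FIRST_LETTER_INDEX = DOCS

# Regexp registrations: each one carries a pattern that must be evaluated.
REGEX_REGISTRATIONS = [
    (re.compile("^" + letter, re.IGNORECASE), docs) for letter, docs in DOCS.items()
]


def find_by_first_letter(index, family_name):
    """Derive the key from the name and jump straight to the registration."""
    return index.get(family_name[:1].lower(), [])


def find_by_regex(registrations, family_name):
    """Test the name against every registration: cost grows with the index size."""
    matches = []
    for pattern, docs in registrations:
        if pattern.search(family_name):
            matches.extend(docs)
    return matches


print(find_by_first_letter(FIRST_LETTER_INDEX, "Champin"))
print(find_by_regex(REGEX_REGISTRATIONS, "Champin"))
```

Both calls return the same documents; the first does a single key lookup, while the second evaluates one pattern per registration, because a client cannot in general derive from "Champin" which patterns would match it.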
I concur with the nice analysis of @pchampin and I would suggest reusing known function names / properties to help adoption.
For instance SPARQL has a function strstarts, so we could have:
@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix dfc-b: <https://www.datafoodconsortium.org#>.
<>
a solid:ListedDocument;
solid:forProperty dfc-b:familyName.
# This is indexing persons with a family name starting with the letter "a".
<#ab09fd> a solid:ValueRegistration;
solid:strstarts "a";
solid:instance </agents/persons/person32.ttl>, </agents/persons/person12.ttl>.
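A possible upside of a strstarts-style property, sketched below in Python under stated assumptions: because the registered value is a plain prefix, a client can enumerate the prefixes of the name it is looking for and do one direct lookup per prefix, instead of testing every registration. The dictionary layout, the "ch" registration and `person99.ttl` are hypothetical, and lowercasing is an assumption (SPARQL's STRSTARTS itself compares strings case-sensitively).

```python
def prefixes(name):
    """All non-empty prefixes of `name`, shortest first."""
    return [name[:i] for i in range(1, len(name) + 1)]


def find_by_prefix(registrations, family_name):
    """`registrations` maps a registered prefix to the documents it indexes."""
    found = []
    for prefix in prefixes(family_name.lower()):
        found.extend(registrations.get(prefix, []))
    return found


# Hypothetical registrations: one single-letter prefix, one two-letter prefix.
REGISTRATIONS = {
    "a": ["/agents/persons/person32.ttl", "/agents/persons/person12.ttl"],
    "ch": ["/agents/persons/person99.ttl"],
}

print(prefixes("abc"))
print(find_by_prefix(REGISTRATIONS, "Champin"))
```

The number of lookups depends on the length of the name rather than on the number of registrations, so prefixes of any length can be registered without scanning the whole index.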
How can we find persons with a family name starting with a certain letter?

Option 1: using anchors
Here is a kind of hack to index persons by the first letter of their family name using anchors:
To find all persons with a family name starting with the letter "z" we can get the location /path/to/the/index.ttl#z.

Option 2: using subjects
Another option would be to make letters subjects, like:

Option 3: using properties
Another option would be to use a familyNameStartsWith property:

Option 4: using one file per letter
We could have one index file for the letter "a", one for the letter "b" and one for the letter "z".
File indexA.ttl:
File indexB.ttl:
File indexZ.ttl:
The naming and location of the index files could be directly defined by the client-to-client standard.
Or we could define a new type like solid:FirstLetterIndex that could be indexed in the TypeIndex for instance.
File typeIndex.ttl:
Adding solid:forLetter and solid:forProperty properties could tell us directly where to find the appropriate index:
File typeIndex.ttl:

Option 5: using one file per letter, TypeIndex style
To replace our custom indexes indexA.ttl, indexB.ttl and indexZ.ttl by regular TypeIndex we could introduce a new kind of registration: solid:FirstLetterRegistration:
File indexA.ttl:
Consider the same modifications for files indexB.ttl and indexZ.ttl.
We can avoid the repetition of solid:forLetter and solid:forProperty in each of the registrations if we define them in the solid:FirstLetterIndex directly:
File indexA.ttl:

Option 6: one FirstLetterIndex
File typeIndex.ttl:
File indexFirstLetter.ttl: