Open barakmich opened 6 years ago
This will be very helpful, since the "name" case is very typical for GraphQL (most properties has one value), Gizmo (Has, Save), etc.
For query optimizer, if it knows that a predicate has exactly one value, it's always better to do lookup/check for that predicate upfront. It can then re-run optimization again to properly estimate fan-out after computing specific nodes in the resulting set.
Even better, since HLL supports merge/intersection, we may try to merge multiple filters at runtime to estimate fan-out for intersections of multiple LinksTo.
The only problem as I see now - it does not support deletion, and modification of such filter might require some tricks with transactions in backend implementation.
HyperLogLog allows you to, in a fixed number of bits, keep track of the cardinality of a set. It works a little like a Bloom filter...
The proposal is this: As a new index -- but only for nodes which appear in the predicate field -- we keep some new data
And when a new quad comes in:
We add the subject and object to the respective HLLs in the index.
Now, currently, we assume a fanout of 20 * underlying when trying to estimate the Size of a LinksTo. Instead, we can do much better, if we know the predicate:
Or the reciprocal, depending on the direction of the link.
This means predicates like "name" will have a fan-out of 1 -- people generally only have one name. However, a given name will have a slightly larger fan-in -- maybe 1.5 -- as some names have more than one person (the "John Smith" effect)