Closed Sophietje closed 2 weeks ago
In GitLab by @barry.nouwt.tno.nl on Mar 29, 2023, 10:30
We need to investigate whether CombiMatch
es can also combine two matches from the same candidate. Imagine we try to match the following two graph patterns:
?s1 <p> ?s2 .
and
?s1 <p> ?s2 . ?s2 <p> ?s3 .
The first graph pattern matches on the second in two ways (on the first triple pattern and the second triple pattern). So, this results in two Match
objects. In the current algorithm for the reasoner, these two Match
objects from the same graph pattern remain separate (just like Match
objects from different graph patterns). Because we want to be sure that data of a single knowledge base can be combined with itself (as is, for example, used in transitive relations), we merge a TripleVarBindingSet
with itself (bs.merge(bs)
)
at strategic moments in the algorithm. This merge()
operation can be very time consuming if the TripleVarBindingSet
is big (which happens a lot, because there are many 'unfruitful' matches). So, if we would allow these CombiMatch
objects to also combine Match
objects from the same graph pattern, we can remove these bs.merge(bs)
operations from the code and improve performance even further.
Take this into consideration when implementing this feature.
In GitLab by @barry.nouwt.tno.nl on Apr 28, 2023, 07:56
marked this issue as related to #392
In GitLab by @han.kruiger.tno.nl on Aug 10, 2023, 09:11
@barry.nouwt.tno.nl Perhaps you can use the text in this issue when trying to improve the performance.
The matching algorithm seems to work now. Create a unit test to demonstrate its performance gain: eu.knowledge.engine.smartconnector.api.TestComplexGraphPatternMatching
Things that still need to be done:
SerialMatchingProcessor
and should also be included in the ReasonerProcessor
.
In GitLab by @han.kruiger.tno.nl on Feb 10, 2023, 17:00
Based on today's discussions.
For the reasoner to work, it needs to combine graph patterns that represent different 'pieces' of knowledge.
A match can be found by looking at one or more triple patterns from two graph patterns, and observing that they are compatible. In essence, this means that these parts of knowledge can 'flow into eachother'.
Currently, we already have a
Match
data structure, which describes how some triple patterns match to other triple patterns, including how any variables in the first triple pattern map to variables/resources in the second.In the matching algorithm, these matches are linked (or linkable) to rule nodes (which have a antecedent and consequent graph pattern) on both sides.
Example of a match
Input:
This is the match (using the
toString
implementation of theMatch
):This match conveys that the first triple pattern of the first graph pattern matches with the first triple pattern of the second graph pattern with two mappings:
?s
maps to?b
and?t
maps to<Sensor>
.A more involved example of a match
Input:
These are the matches (using the
toString
implementation of theMatch
):(Same as the previous example)
This match conveys that the second triple pattern of the first graph pattern matches with the second triple pattern of the second graph pattern with two mappings:
?s
maps to?b
and?d
maps to?v
.(This final match is not elementary; it is the combination of the first two.)
This match conveys that the first triple pattern of the first graph pattern matches with the first triple pattern of the second graph pattern with two mappings:
?s
maps to?b
and?t
maps to<Sensor>
.Unfruitful matches
It turned out that, in the current implementation, a lot of matches are being generated that do not contribute to a result that fully covers a goal graph pattern. Let's call these matches unfruitful.
In many use cases of the reasoner, we know that the goal is to produce full bindings for a graph pattern, so these unfruitful matches are not that useful.
So to improve performance, unfruitful matches can be disregarded at some convenient moment in the algorithm.
To detect these unfruitful matches, we think the data structure
CombiMatch
will be useful.Match
is only for the matches between a pair of graph patterns (let's call the LHS "goal" and the RHS "candidate"), whereas aCombiMatch
is for a (non-empty) collection of "candidate" graph patterns with respect to a "goal" graph pattern.A possible moment in the algorithm to do use this is after having merged matches between the graph pattern pairs.
All of these
Match
objects can be trivially converted into aCombiMatch
with only a single candidate graph pattern, and then theCombiMatch
es from different candidate graph patterns can be attempted to be merged together.Then, after exhaustively merging the
CombiMatch
es, we can discard allCombiMatch
es that do not fully cover the goal graph pattern, and only keep theMatch
es that contributed to theCombiMatch
es that remained.We expect that this will result in significantly fewer matches.