TNO / knowledge-engine

Improves interoperability between systems (i.e. devices, platforms, apps, databases) by exchanging data based on their semantics
https://knowledge-engine.eu
Apache License 2.0
30 stars 4 forks source link

Implement CombiMatch to improve reasoner performance #411

Closed Sophietje closed 2 weeks ago

Sophietje commented 1 year ago

In GitLab by @han.kruiger.tno.nl on Feb 10, 2023, 17:00

Based on today's discussions.

For the reasoner to work, it needs to combine graph patterns that represent different 'pieces' of knowledge.

A match can be found by looking at one or more triple patterns from two graph patterns, and observing that they are compatible. In essence, this means that these parts of knowledge can 'flow into eachother'.

Currently, we already have a Match data structure, which describes how some triple patterns match to other triple patterns, including how any variables in the first triple pattern map to variables/resources in the second.

In the matching algorithm, these matches are linked (or linkable) to rule nodes (which have a antecedent and consequent graph pattern) on both sides.

Example of a match

Input:

?s <type> ?t .
?b <type> <Sensor> .

This is the match (using the toString implementation of the Match):

|?s| type ?t=|?b| type Sensor,
?s type |?t|=?b type |Sensor|

This match conveys that the first triple pattern of the first graph pattern matches with the first triple pattern of the second graph pattern with two mappings: ?s maps to ?b and ?t maps to <Sensor>.

A more involved example of a match

Input:

?s <type> ?t .
?s <hasVal> ?d .
?b <type> <Sensor> .
?b <hasVal> ?v .

These are the matches (using the toString implementation of the Match):

|?s| type ?t=|?b| type Sensor,
?s type |?t|=?b type |Sensor|

(Same as the previous example)

|?s| hasVal ?d=|?b| hasVal ?v,
?s hasVal |?d|=?b hasVal |?v|

This match conveys that the second triple pattern of the first graph pattern matches with the second triple pattern of the second graph pattern with two mappings: ?s maps to ?b and ?d maps to ?v.

|?s| hasVal ?d=|?b| hasVal ?v,
?s hasVal |?d|=?b hasVal |?v|,
|?s| type ?t=|?b| type Sensor,
?s type |?t|=?b type |Sensor|

(This final match is not elementary; it is the combination of the first two.)

This match conveys that the first triple pattern of the first graph pattern matches with the first triple pattern of the second graph pattern with two mappings: ?s maps to ?b and ?t maps to <Sensor>.

Unfruitful matches

It turned out that, in the current implementation, a lot of matches are being generated that do not contribute to a result that fully covers a goal graph pattern. Let's call these matches unfruitful.

In many use cases of the reasoner, we know that the goal is to produce full bindings for a graph pattern, so these unfruitful matches are not that useful.

So to improve performance, unfruitful matches can be disregarded at some convenient moment in the algorithm.

To detect these unfruitful matches, we think the data structure CombiMatch will be useful.

Match is only for the matches between a pair of graph patterns (let's call the LHS "goal" and the RHS "candidate"), whereas a CombiMatch is for a (non-empty) collection of "candidate" graph patterns with respect to a "goal" graph pattern.

A possible moment in the algorithm to do use this is after having merged matches between the graph pattern pairs.

All of these Match objects can be trivially converted into a CombiMatch with only a single candidate graph pattern, and then the CombiMatches from different candidate graph patterns can be attempted to be merged together.

Then, after exhaustively merging the CombiMatches, we can discard all CombiMatches that do not fully cover the goal graph pattern, and only keep the Matches that contributed to the CombiMatches that remained.

We expect that this will result in significantly fewer matches.

Sophietje commented 1 year ago

In GitLab by @barry.nouwt.tno.nl on Mar 29, 2023, 10:30

We need to investigate whether CombiMatches can also combine two matches from the same candidate. Imagine we try to match the following two graph patterns:

?s1 <p> ?s2 .

and

?s1 <p> ?s2 . ?s2 <p> ?s3 .

The first graph pattern matches on the second in two ways (on the first triple pattern and the second triple pattern). So, this results in two Match objects. In the current algorithm for the reasoner, these two Match objects from the same graph pattern remain separate (just like Match objects from different graph patterns). Because we want to be sure that data of a single knowledge base can be combined with itself (as is, for example, used in transitive relations), we merge a TripleVarBindingSet with itself (bs.merge(bs)) at strategic moments in the algorithm. This merge() operation can be very time consuming if the TripleVarBindingSet is big (which happens a lot, because there are many 'unfruitful' matches). So, if we would allow these CombiMatch objects to also combine Match objects from the same graph pattern, we can remove these bs.merge(bs) operations from the code and improve performance even further.

Take this into consideration when implementing this feature.

Sophietje commented 1 year ago

In GitLab by @barry.nouwt.tno.nl on Apr 28, 2023, 07:56

marked this issue as related to #392

Sophietje commented 1 year ago

In GitLab by @han.kruiger.tno.nl on Aug 10, 2023, 09:11

@barry.nouwt.tno.nl Perhaps you can use the text in this issue when trying to improve the performance.

bnouwt commented 3 weeks ago

The matching algorithm seems to work now. Create a unit test to demonstrate its performance gain: eu.knowledge.engine.smartconnector.api.TestComplexGraphPatternMatching

Things that still need to be done: