TopQuadrant / shacl

SHACL API in Java based on Apache Jena
Apache License 2.0

Support for iterative inference? #162

Closed gtfierro closed 7 months ago

gtfierro commented 7 months ago

Hello!

I've found myself wrapping the inference script so that I can expand the graph iteratively until it either reaches a fixed point or hits a maximum number of iterations. Doing this outside your inference script isn't ideal: the graph is serialized to disk between inference runs, so blank node identities are lost, and triples produced by rules that create blank nodes can fail to be deduplicated.

Would you be open to changing the inference script so that it could provide iterative inference? I have gotten something basic working locally:

--- a/src/main/java/org/topbraid/shacl/tools/Infer.java
+++ b/src/main/java/org/topbraid/shacl/tools/Infer.java
@@ -51,7 +51,32 @@ public class Infer extends AbstractTool {
                if(shapesModel == null) {
                        shapesModel = dataModel;
                }
-               Model results = RuleUtil.executeRules(dataModel, shapesModel, null, null);
-               results.write(System.out, FileUtils.langTurtle);
-       }
+
+        // execute the rules over and over until there are no new results
+        // or until a maximum number of iterations has been reached
+        int maxIterations = 100;
+        int iteration = 0;
+        while (iteration < maxIterations) {
+            //System.err.println("Iteration " + iteration);
+            long currentSize = dataModel.size();
+            Model results = RuleUtil.executeRules(dataModel, shapesModel, null, null);
+            // if no results, break
+            if (results.size() == 0) {
+                break;
+            }
+            // add the results to the data model
+            dataModel.add(results);
+            // if no new results, break
+            if (dataModel.size() == currentSize) {
+                break;
+            }
+            iteration++;
+        }
+        //if (iteration == maxIterations) {
+        //    System.err.println("Maximum number of iterations reached");
+        //}
+        //System.err.println("Finished after " + iteration + " iterations");
+        // write the result to standard out
+        dataModel.write(System.out, FileUtils.langTurtle);
+    }
 }

No worries if you already have something planned, but I would be happy to fix this up and add some CLI flags to customize its usage in a way that you find acceptable. Let me know!


I should point out that I know the code above is a bit buggy and probably doesn't do exactly what we want. For example, it is an open question whether the tool should return only the materialized triples or the full model.
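
The choice between returning only the materialized triples and returning the full model can be illustrated with a library-free sketch. This is not the Infer tool or the Jena API: triples are modeled as plain strings, the rule engine as a `Function`, and the names `FixpointSketch`, `Result`, and `successorRule` are hypothetical and exist only for this illustration. The loop keeps the newly inferred triples in their own set, so either output can be written at the end, and it records whether it stopped at a fixed point or at the iteration cap.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical, library-free stand-in for iterating a rule engine; not part of the SHACL API. */
final class FixpointSketch {

    /** Outcome of a run: the inferred triples alone, the full set, and how the loop ended. */
    static final class Result {
        final Set<String> inferred;     // only what the rules added
        final Set<String> full;         // asserted data plus inferred triples
        final int passes;               // number of rule executions
        final boolean reachedFixpoint;  // false if the iteration cap stopped the loop

        Result(Set<String> inferred, Set<String> full, int passes, boolean reachedFixpoint) {
            this.inferred = inferred;
            this.full = full;
            this.passes = passes;
            this.reachedFixpoint = reachedFixpoint;
        }
    }

    static Result infer(Set<String> data,
                        Function<Set<String>, Set<String>> rules,
                        int maxIterations) {
        Set<String> full = new HashSet<>(data);
        Set<String> inferred = new HashSet<>();
        int passes = 0;
        boolean reachedFixpoint = false;
        while (passes < maxIterations) {
            Set<String> fresh = new HashSet<>(rules.apply(full));
            passes++;
            fresh.removeAll(full);          // keep only triples not known yet
            if (fresh.isEmpty()) {          // nothing new: fixed point reached
                reachedFixpoint = true;
                break;
            }
            full.addAll(fresh);
            inferred.addAll(fresh);
        }
        return new Result(inferred, full, passes, reachedFixpoint);
    }

    /** Toy rule (assumption for the demo): every number below 4 implies its successor. */
    static Set<String> successorRule(Set<String> known) {
        Set<String> out = new HashSet<>();
        for (String s : known) {
            int n = Integer.parseInt(s);
            if (n < 4) {
                out.add(Integer.toString(n + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Result r = infer(Set.of("1"), FixpointSketch::successorRule, 100);
        System.out.println("inferred only: " + new TreeSet<>(r.inferred));
        System.out.println("full model:    " + new TreeSet<>(r.full));
        System.out.println("passes: " + r.passes + ", fixpoint: " + r.reachedFixpoint);
    }
}
```

With the toy rule and the single starting fact "1", the loop needs four passes: three that each add one new fact and a final one that confirms nothing new was produced. In the real tool, the two sets would presumably correspond to a separate model of inferences and the data model they are merged into.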

HolgerKnublauch commented 7 months ago

Hi Gabe, yes, we have this option in our product, and it is implemented pretty much like yours. It can iterate until it reaches a fixpoint, but rules that infer blank nodes are a stopper. And yes, if you want this in the open source project, I would need a PR with a flag and/or max iteration counter.
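
A rough sketch of what such a flag and counter could look like, using only the Java standard library. The flag name `-maxiterations`, the default of one pass, and the class `IterationCapSketch` are all assumptions made for illustration; nothing here is an existing option of the tool. The second method models, in a deliberately simplified way, why a cap matters when a rule mints a fresh node on every pass: each pass adds something "new", so the loop ends only when the counter runs out.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of an iteration-cap flag; not an existing option of the Infer tool. */
final class IterationCapSketch {

    static final String MAX_ITERATIONS_FLAG = "-maxiterations"; // assumed flag name
    static final int DEFAULT_MAX_ITERATIONS = 1;                // assumed default: a single pass

    /** Scans the arguments for the flag and returns its value, or the default if absent. */
    static int parseMaxIterations(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (MAX_ITERATIONS_FLAG.equals(args[i])) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException(MAX_ITERATIONS_FLAG + " needs a value");
                }
                // Integer.parseInt throws NumberFormatException on non-numeric input
                int value = Integer.parseInt(args[i + 1]);
                if (value < 1) {
                    throw new IllegalArgumentException(MAX_ITERATIONS_FLAG + " must be at least 1");
                }
                return value;
            }
        }
        return DEFAULT_MAX_ITERATIONS;
    }

    /**
     * Toy model of a rule that creates a fresh node label on every pass.
     * Because each label is new, the "nothing new" check never fires and
     * the loop always runs for exactly maxIterations passes.
     */
    static int passesWithFreshNodeRule(int maxIterations) {
        Set<String> known = new HashSet<>();
        int nextLabel = 0;
        int passes = 0;
        while (passes < maxIterations) {
            passes++;
            String fresh = "_:b" + nextLabel++;   // a new label each time
            if (!known.add(fresh)) {
                break;                            // would signal a fixed point; never reached here
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        int cap = parseMaxIterations(args);
        System.out.println("cap: " + cap + ", passes used: " + passesWithFreshNodeRule(cap));
    }
}
```

Defaulting to a single pass would keep the current behavior for callers who do not pass the flag, which is one reasonable design choice among several.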