fhircat / jena

Apache Jena - copy for jena-shex development
https://jena.apache.org/
Apache License 2.0
0 stars 0 forks source link

import closure just after parsing #8

Open iovka opened 1 year ago

iovka commented 1 year ago

Change

Currently on b553fa2, import closure of a schema is computed immediately before validation.

The only place where import closure is problematic is for printing a schema as it was defined (in particular for testing the parser).

We can consider that there are two 'aspects' of a schema: syntactic and semantic.

The syntactic aspect is the one being parsed, and contains list of shape declarations, imports, start action, prefixes and base URI. Used for serializing the schema as it was defined.

The semantic aspect is defined by the reference maps Map<Node, ShapeDecl> and Map<Node, TripleExpr> and is used for validation, and for all the rest.

Proposal : Distinguish these two aspects of the schema

In practice a ShexSchema class would still have the same attributes, but we will make sure that the right subset of attributes are used in the right place (the syntactic attributes for printing, the semantic ones for validation).

This would allow to import the schema closure just after parsing it, and still pass the printer and parser tests.

Are you interested in contributing a pull request for this task?

None

ericprud commented 1 year ago

I tested patch that made keeping track of those the users' problem:

modified   jena-shex/src/main/java/org/apache/jena/shex/sys/ShexValidatorImpl.java
@@ -74,10 +74,9 @@ class ShexValidatorImpl implements ShexValidator{
         Objects.requireNonNull(shapes);
         Objects.requireNonNull(dataGraph);
         ShexRecord entry = new ShexRecord(focus, shapeRef);
-        ShexSchema schemaWithImports = shapes.importsClosure();
-        ValidationContext vCxt = new ValidationContext(dataGraph, schemaWithImports, semanticActionPluginIndex);
+        ValidationContext vCxt = new ValidationContext(dataGraph, shapes, semanticActionPluginIndex);

-        boolean isValid = vCxt.dispatchStartSemanticAction(schemaWithImports, vCxt);
+        boolean isValid = vCxt.dispatchStartSemanticAction(shapes, vCxt);
         if ( !isValid )
             report(vCxt, entry, focus, ShexStatus.nonconformant, null);
         else
modified   jena-shex/src/test/java/org/apache/jena/shex/runner/ShexValidationTest.java
@@ -117,7 +117,7 @@ public class ShexValidationTest implements Runnable {
         this.shapeMap = (shapeMapRef == null)
                 ? null
                 : Shex.readShapeMapJson(shapeMapRef);
-        this.shapes = Shex.readSchema(schema.getURI(), base);
+        this.shapes = Shex.readSchema(schema.getURI(), base).importsClosure();
         this.positiveTest = entry.getTestType().equals(ShexT.cValidationTest);
         this.traits = ShexTests.extractTraits(entry);
         this.extensionResults = ShexTests.extractExtensionResults(entry);

, but that pre-dated me enabling issues so I never discussed it. The main impact was that users have to call .importsClosure() before validation if they have imports.

  1. should parsers call .isValid() or should it be left to users? a. should be able to call .validate() if you think the validation won't touch e.g. a missing referent or a negated cycle?
  2. should parsers call .importsClosure() or should it be left to users?
  3. should the schema keep track of whether it's been imported or validated?

In part for backward-compatibility with the current API and in part to give users control over when they perform expensive tasks, I'm leaning towards our current implementation which requires .importsClosure().isValid().