RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

XSLT function current(): Difference in v5 vs. v6 behavior #193

aljoshakoecher closed this issue 1 year ago

aljoshakoecher commented 1 year ago

Hey guys,

I just realized there is a difference between v5 and v6 in whether the XSLT function current() can be used in an XPath expression of an RML mapping. In case you don't know current(), see more info about this function here. I would like to know whether this change in behavior was intended. For me, the new behavior is worse than the old one, and it keeps me from upgrading to v6.

I'm afraid that without current(), there is no way to refer back to the iterator element from inside a deeply nested XPath expression in the subjectMap or predicateObjectMap. I often have cases with n iterator matches and m matches for (a part of) the XPath expression inside the maps. In such "n*m cases", I really need current(). I hope an example makes this clear...

Please consider the following XML:

```xml
<?xml version="1.0" encoding="UTF-8"?>

<InstanceHierarchy Name="InstanceHierarchy">
    <InternalElement Name="B101" ID="a1c825c6-cc6e-4170-b63a-ee4eb4f105c1" RefBaseSystemUnitPath="SystemUnitClassLib/Hardware/Tank">
        <ExternalInterface Name="Input" ID="7b0f" RefBaseClassPath="InterfaceClassLib/ProductConnectionInterface" />
        <ExternalInterface Name="Output" ID="520b" RefBaseClassPath="InterfaceClassLib/ProductConnectionInterface" />
        <RoleRequirements RefBaseRoleClassPath="RoleClassLib/Components/Tank" />
    </InternalElement>
    <InternalElement Name="V101" ID="4406750f-f0be-4005-a711-b08815f397fa" RefBaseSystemUnitPath="SystemUnitClassLib/Hardware/Valve">
        <ExternalInterface Name="Input" ID="f0e1" RefBaseClassPath="InterfaceClassLib/ProductConnectionInterface" />
        <ExternalInterface Name="Output" ID="8608" RefBaseClassPath="InterfaceClassLib/ProductConnectionInterface" />
    </InternalElement>
    <InternalElement Name="B201" ID="da0f2436-eea6-4973-9413-7f914e9cd3b4" RefBaseSystemUnitPath="SystemUnitClassLib/Hardware/Tank">
        <ExternalInterface Name="Input" ID="3adf" RefBaseClassPath="InterfaceClassLib/ProductConnectionInterface" />
        <ExternalInterface Name="Output" ID="f0d4" RefBaseClassPath="InterfaceClassLib/ProductConnectionInterface" />
        <RoleRequirements RefBaseRoleClassPath="RoleClassLib/Components/Tank" />
    </InternalElement>

    <InternalLink RefPartnerSideA="520b" RefPartnerSideB="f0e1" Name="Link1" />
    <InternalLink RefPartnerSideA="8608" RefPartnerSideB="3adf" Name="Link2" />
</InstanceHierarchy>
```

In this example XML schema, you connect elements (InternalElements) by adding an ExternalInterface to each element and then creating the connection via an InternalLink. An InternalLink has two "sides" (RefPartnerSideA and RefPartnerSideB); you can think of them as the source and target of the connection. These sides reference the IDs of the ExternalInterfaces of the elements to connect. So in the example above, we have the following connections:

B101 --> (Link1) --> V101
V101 --> (Link2) --> B201
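To make the link structure concrete, here is a minimal Java sketch (not RMLMapper code) that resolves the same two connections with only the JDK's DOM and javax.xml.xpath APIs. It indexes every ExternalInterface ID by the Name of its parent InternalElement, then walks the nodes selected by the iterator /InstanceHierarchy/InternalLink. The class name LinkResolver and the trimmed inline copy of the XML are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch (hypothetical class, not RMLMapper code): resolves each InternalLink
// into a "source --> (link) --> target" string using the JDK's DOM and XPath 1.0 APIs.
class LinkResolver {
    // Trimmed copy of the issue's XML; only the attributes used below are kept.
    static final String XML =
        "<InstanceHierarchy>"
        + "<InternalElement Name='B101'><ExternalInterface ID='7b0f'/><ExternalInterface ID='520b'/></InternalElement>"
        + "<InternalElement Name='V101'><ExternalInterface ID='f0e1'/><ExternalInterface ID='8608'/></InternalElement>"
        + "<InternalElement Name='B201'><ExternalInterface ID='3adf'/><ExternalInterface ID='f0d4'/></InternalElement>"
        + "<InternalLink RefPartnerSideA='520b' RefPartnerSideB='f0e1' Name='Link1'/>"
        + "<InternalLink RefPartnerSideA='8608' RefPartnerSideB='3adf' Name='Link2'/>"
        + "</InstanceHierarchy>";

    static List<String> resolveLinks(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();

            // Index every interface ID by the Name of the InternalElement that owns it.
            Map<String, String> ownerById = new HashMap<>();
            NodeList ifaces = (NodeList) xp.evaluate("//ExternalInterface", doc, XPathConstants.NODESET);
            for (int i = 0; i < ifaces.getLength(); i++) {
                Element iface = (Element) ifaces.item(i);
                Element owner = (Element) iface.getParentNode();
                ownerById.put(iface.getAttribute("ID"), owner.getAttribute("Name"));
            }

            // Walk the same nodes the RML iterator selects and look up both sides.
            List<String> links = new ArrayList<>();
            NodeList linkNodes = (NodeList) xp.evaluate("/InstanceHierarchy/InternalLink", doc, XPathConstants.NODESET);
            for (int i = 0; i < linkNodes.getLength(); i++) {
                Element link = (Element) linkNodes.item(i);
                links.add(ownerById.get(link.getAttribute("RefPartnerSideA"))
                        + " --> (" + link.getAttribute("Name") + ") --> "
                        + ownerById.get(link.getAttribute("RefPartnerSideB")));
            }
            return links;
        } catch (Exception e) {
            throw new IllegalStateException("Could not resolve links", e);
        }
    }

    public static void main(String[] args) {
        resolveLinks(XML).forEach(System.out::println);
    }
}
```

Running main should print the two connections listed above, one per line. The ID index is only one way to do the n*m join procedurally; the mapping below tries to express the same lookup inside a single XPath expression per map.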

I want to map this link information into an ontology, so I have created the following mapping. I want to iterate over all InternalLink elements and resolve all the existing links; that's why I chose /InstanceHierarchy/InternalLink as the iterator. I then want to create an ex:RefPartner individual for every element referenced by RefPartnerSideA. With the predicateObjectMap, I want to look up the RefPartnerSideB of the current iterator element and connect it via ex:linked_to.

```turtle
#Prefixes...
<#InternalElement> a rr:TriplesMap;
    rml:logicalSource [
        rml:source "file.xml";
        rml:referenceFormulation ql:XPath;
        rml:iterator "/InstanceHierarchy/InternalLink"
    ];

    rr:subjectMap [
        rr:template "http://example.org/{//ExternalInterface[@ID=current()/@RefPartnerSideA]/../@Name}";
        rr:class ex:RefPartner
    ];

    rr:predicateObjectMap [
        rr:predicate ex:linked_to;
        rr:objectMap [
            rr:template "http://example.org/{//ExternalInterface[@ID=current()/@RefPartnerSideB]/../@Name}";
            rr:class ex:RefPartnerB
        ];
    ].
```

When I run your CLI application with v5, this works perfectly. The output is:

```turtle
<http://example.org/B101> a ex:RefPartner;
  ex:linked_to <http://example.org/V101> .

<http://example.org/V101> a ex:RefPartner;
  ex:linked_to <http://example.org/B201> .
```

However, when using v6, I get an exception:

Full exception:

```shell
net.sf.saxon.s9api.SaxonApiException: Cannot find a 0-argument function named Q{http://www.w3.org/2005/xpath-functions}current()
	at net.sf.saxon.s9api.XPathCompiler.internalCompile(XPathCompiler.java:578)
	at net.sf.saxon.s9api.XPathCompiler.compile(XPathCompiler.java:546)
	at net.sf.saxon.s9api.XPathCompiler.evaluate(XPathCompiler.java:603)
	at be.ugent.rml.records.XMLRecord.get(XMLRecord.java:37)
	at be.ugent.rml.extractor.ReferenceExtractor.extract(ReferenceExtractor.java:31)
	at be.ugent.rml.functions.ConcatFunction.concat(ConcatFunction.java:43)
	at be.ugent.rml.functions.ConcatFunction.execute(ConcatFunction.java:27)
	at be.ugent.rml.functions.ConcatFunction.execute(ConcatFunction.java:14)
	at be.ugent.rml.termgenerator.NamedNodeGenerator.generate(NamedNodeGenerator.java:20)
	at be.ugent.rml.Executor.getSubject(Executor.java:372)
	at be.ugent.rml.Executor.executeWithFunction(Executor.java:143)
	at be.ugent.rml.Executor.execute(Executor.java:126)
	at be.ugent.rml.cli.Main.main(Main.java:400)
	at be.ugent.rml.cli.Main.main(Main.java:45)
Caused by: net.sf.saxon.trans.XPathException: Cannot find a 0-argument function named Q{http://www.w3.org/2005/xpath-functions}current()
	at net.sf.saxon.expr.parser.XPathParser.grumble(XPathParser.java:326)
	at net.sf.saxon.expr.parser.XPathParser.grumble(XPathParser.java:293)
	at net.sf.saxon.expr.parser.XPathParser.reportMissingFunction(XPathParser.java:3776)
	at net.sf.saxon.expr.parser.XPathParser.parseFunctionCall(XPathParser.java:3687)
	at net.sf.saxon.expr.parser.XPathParser.parseBasicStep(XPathParser.java:2559)
	at net.sf.saxon.expr.parser.XPathParser.parseStepExpression(XPathParser.java:2434)
	at net.sf.saxon.expr.parser.XPathParser.parseRelativePath(XPathParser.java:2353)
	at net.sf.saxon.expr.parser.XPathParser.parsePathExpression(XPathParser.java:2315)
	at net.sf.saxon.expr.parser.XPathParser.parseSimpleMappingExpression(XPathParser.java:2329)
	at net.sf.saxon.expr.parser.XPathParser.parseUnaryExpression(XPathParser.java:2181)
	at net.sf.saxon.expr.parser.XPathParser.parseBinaryExpression(XPathParser.java:868)
	at net.sf.saxon.expr.parser.XPathParser.parseExprSingle(XPathParser.java:777)
	at net.sf.saxon.expr.parser.XPathParser.parseExpression(XPathParser.java:679)
	at net.sf.saxon.expr.parser.XPathParser.parsePredicate(XPathParser.java:2518)
	at net.sf.saxon.expr.parser.XPathParser.parsePredicate(XPathParser.java:2473)
	at net.sf.saxon.expr.parser.XPathParser.parseStepExpression(XPathParser.java:2443)
	at net.sf.saxon.expr.parser.XPathParser.parseRemainingPath(XPathParser.java:2393)
	at net.sf.saxon.expr.parser.XPathParser.parsePathExpression(XPathParser.java:2298)
	at net.sf.saxon.expr.parser.XPathParser.parseSimpleMappingExpression(XPathParser.java:2329)
	at net.sf.saxon.expr.parser.XPathParser.parseUnaryExpression(XPathParser.java:2181)
	at net.sf.saxon.expr.parser.XPathParser.parseExprSingle(XPathParser.java:777)
	at net.sf.saxon.expr.parser.XPathParser.parseExpression(XPathParser.java:679)
	at net.sf.saxon.expr.parser.XPathParser.parse(XPathParser.java:535)
	at net.sf.saxon.expr.parser.ExpressionTool.make(ExpressionTool.java:89)
	at net.sf.saxon.sxpath.XPathEvaluator.createExpression(XPathEvaluator.java:121)
	at net.sf.saxon.s9api.XPathCompiler.internalCompile(XPathCompiler.java:575)
	... 13 more
```

I don't think there is any other way to replace the current() function in my XPath expression, or is there? Simply using the . syntax doesn't work, because the context seems to have changed: inside the predicate, the current node is no longer the iterator node but the node currently being tested by the XPath expression in that map (here, each ExternalInterface). Consequently, I don't get a result if I replace current() with . in my mapping. I'm curious: was this change in behavior intended? This issue got quite long; I hope my example and the problem are understandable. Thank you for any hints!
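The context change described here is standard XPath behavior in every version: inside a predicate, `.` is the node being filtered, not the node the expression was evaluated from. A small Java sketch using only the JDK's javax.xml.xpath (XPath 1.0) API shows this on a trimmed copy of the XML, with the first InternalLink as the context node, roughly as the iterator would supply it. The class ContextDemo and its helper methods are hypothetical names for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative sketch (hypothetical class): what "." refers to inside vs. outside a predicate.
class ContextDemo {
    // Trimmed copy of the issue's XML (names and IDs only).
    static final String XML =
        "<InstanceHierarchy>"
        + "<InternalElement Name='B101'><ExternalInterface ID='7b0f'/><ExternalInterface ID='520b'/></InternalElement>"
        + "<InternalElement Name='V101'><ExternalInterface ID='f0e1'/><ExternalInterface ID='8608'/></InternalElement>"
        + "<InternalElement Name='B201'><ExternalInterface ID='3adf'/><ExternalInterface ID='f0d4'/></InternalElement>"
        + "<InternalLink RefPartnerSideA='520b' RefPartnerSideB='f0e1' Name='Link1'/>"
        + "<InternalLink RefPartnerSideA='8608' RefPartnerSideB='3adf' Name='Link2'/>"
        + "</InstanceHierarchy>";

    // Evaluates expr with the first InternalLink as the context node.
    static Object evalOnFirstLink(String expr, QName returnType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            Node link = (Node) xp.evaluate("/InstanceHierarchy/InternalLink[1]", doc, XPathConstants.NODE);
            return xp.evaluate(expr, link, returnType);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    static String stringAtLink(String expr) {
        return (String) evalOnFirstLink(expr, XPathConstants.STRING);
    }

    static double countAtLink(String expr) {
        return (Double) evalOnFirstLink(expr, XPathConstants.NUMBER);
    }

    public static void main(String[] args) {
        // Outside a predicate, "." is the iterator node itself, so its attribute is found.
        System.out.println(stringAtLink("string(./@RefPartnerSideA)"));
        // Inside the predicate, "." is each ExternalInterface being tested; none has RefPartnerSideA.
        System.out.println(countAtLink("count(//ExternalInterface[@ID = ./@RefPartnerSideA])"));
        // Every candidate matches its own @ID, confirming that "." is the candidate node there.
        System.out.println(countAtLink("count(//ExternalInterface[@ID = ./@ID])"));
    }
}
```

Running main should print 520b, 0.0 and 6.0, which matches the "no result" observed when current() is replaced with `.`.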

PS: I don't think you have any such "n*m cases" in your tests, right? I looked through most of the XML tests, and their XML structure was rather simple. It might be worth adding a more complex mapping example with longer XPath expressions.

julianrojas87 commented 1 year ago

Hi @aljoshakoecher,

Not long ago, I was in a similar situation, trying to use current() to hold a reference to a value in order to filter on it later. I think the problem is that current() is an XSLT function rather than a core XPath function: the XPath 1.0 processor used in v5 of the mapper accepted it, while the XPath 2.x+ processor (Saxon) used since v6 does not provide it for standalone XPath expressions, which is what the exception reports.

You can get the same result with let expressions, which were introduced in XPath 3.0. I managed to produce your desired output with the following mapping:

```turtle
<#InternalElement> a rr:TriplesMap;
    rml:logicalSource [
        rml:source "file.xml";
        rml:referenceFormulation ql:XPath;
        rml:iterator "/InstanceHierarchy/InternalLink"
    ];

    rr:subjectMap [
        rr:template "http://example.org/{ let $sideA := ./@RefPartnerSideA return(//ExternalInterface[@ID = $sideA]/../@Name)}";
        rr:class ex:RefPartner
    ];

    rr:predicateObjectMap [
        rr:predicate ex:linked_to;
        rr:objectMap [
            rr:template "http://example.org/{let $sideB := ./@RefPartnerSideB return(//ExternalInterface[@ID = $sideB]/../@Name)}";
            rr:class ex:RefPartnerB
        ];
    ].
```
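The let expressions in this mapping capture the iterator node's attribute in a variable before the predicate changes the context node. That syntax needs an XPath 3.0 processor such as Saxon, but the same idea can be sketched with the JDK's XPath 1.0 API, which has no let: the attributes are read with the link as context node and handed to the lookup expressions as $sideA and $sideB through an XPathVariableResolver. This is a hypothetical illustration (class LetAnalogue, trimmed inline XML), not how the RMLMapper evaluates references.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative XPath 1.0 analogue of "let $sideA := ./@RefPartnerSideA return (...)":
// attribute values are captured per iterator node and passed in as $sideA / $sideB.
class LetAnalogue {
    // Trimmed copy of the issue's XML (names and IDs only).
    static final String XML =
        "<InstanceHierarchy>"
        + "<InternalElement Name='B101'><ExternalInterface ID='7b0f'/><ExternalInterface ID='520b'/></InternalElement>"
        + "<InternalElement Name='V101'><ExternalInterface ID='f0e1'/><ExternalInterface ID='8608'/></InternalElement>"
        + "<InternalElement Name='B201'><ExternalInterface ID='3adf'/><ExternalInterface ID='f0d4'/></InternalElement>"
        + "<InternalLink RefPartnerSideA='520b' RefPartnerSideB='f0e1' Name='Link1'/>"
        + "<InternalLink RefPartnerSideA='8608' RefPartnerSideB='3adf' Name='Link2'/>"
        + "</InstanceHierarchy>";

    static List<String> mapLinks(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // Values stored here are what $sideA / $sideB resolve to during evaluation.
            Map<String, String> bindings = new HashMap<>();
            xp.setXPathVariableResolver(name -> bindings.get(name.getLocalPart()));

            List<String> triples = new ArrayList<>();
            NodeList links = (NodeList) xp.evaluate("/InstanceHierarchy/InternalLink", doc, XPathConstants.NODESET);
            for (int i = 0; i < links.getLength(); i++) {
                Node link = links.item(i);
                // Like the let clause: capture the values while "." is still the iterator node.
                bindings.put("sideA", xp.evaluate("./@RefPartnerSideA", link));
                bindings.put("sideB", xp.evaluate("./@RefPartnerSideB", link));
                // Inside the predicates "." changes, but the variables keep the captured values.
                String subject = xp.evaluate("//ExternalInterface[@ID = $sideA]/../@Name", link);
                String object = xp.evaluate("//ExternalInterface[@ID = $sideB]/../@Name", link);
                triples.add("<http://example.org/" + subject + "> ex:linked_to <http://example.org/" + object + "> .");
            }
            return triples;
        } catch (Exception e) {
            throw new IllegalStateException("Could not map links", e);
        }
    }

    public static void main(String[] args) {
        mapLinks(XML).forEach(System.out::println);
    }
}
```

The design mirrors the mapping: one variable per side, captured once per iterator node, so the n*m lookup never depends on what `.` means inside the predicate.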

PS: Here you can find some test cases using more complex XPath expressions for XPath 2.0 and XPath 3.0.

aljoshakoecher commented 1 year ago

Hey @julianrojas87, thank you for pointing me in a very interesting direction. I assumed that my problems with current() had something to do with the move away from XPath 1.0, but I hadn't looked into the newer features of more recent XPath versions.

Thank you very much for this hint and also for the links to the examples. This is something I will definitely use!

julianrojas87 commented 1 year ago

Happy to help!

For future reference, the mapper uses Saxon-HE v11 as its XML and XPath processor, so any expression supported by Saxon-HE should also work in the mapper.