RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

Conditional instantiation w/ joinCondition not working anymore since v6.3 #236

Open schivmeister opened 2 months ago

schivmeister commented 2 months ago

Environment

rmlmapper v6.3.0, v6.5.1; Linux/WSL2; Java 11, 17

Namespaces

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix ex: <http://data.example.org/resource/> .
@prefix org: <http://www.w3.org/ns/org#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix fnml:   <http://semweb.mmlab.be/ns/fnml#> .
@prefix fno: <https://w3id.org/function/ontology#> .
@prefix idlab-fn: <http://example.com/idlab/function/> .

Problem

Given the following kind of input XML with relationships between Organization and Address, where the second of three Organizations does not have a StreetName in its Address:

<Directory>
    <Organization>
        <ID>123</ID>
        <Name>ABC FastCo</Name>
        <Address>
            <StreetName>ABC FastCo Lane</StreetName>
        </Address>
    </Organization>
    <Organization>
        <ID>456</ID>
        <Name>XYZ Inc.</Name>
        <Address>
            <Area>XYZ Metro</Area>
        </Address>
    </Organization>
    <Organization>
        <ID>789</ID>
        <Name>MNO Ltd</Name>
        <Address>
            <StreetName>99 Maine St</StreetName>
        </Address>
    </Organization>
</Directory>

and an RML mapping like the following, in which the subjectMap of the joined parentTriplesMap uses a conditional reference that falls back to null, so that no Address instance is created when certain elements (e.g. StreetName) are absent:

ex:Organizations a rr:TriplesMap;
    rml:logicalSource [
        rml:source "test.xml";
        rml:iterator "/Directory/Organization";
        rml:referenceFormulation ql:XPath
    ];
    rr:subjectMap [
        rr:template "http://data.example.org/resource/Organization_{ID}";
        rr:class org:Organization
    ];
    rr:predicateObjectMap [
        rr:predicate org:name;
        rr:objectMap
            [
                rml:reference "Name"
            ];
    ] ;
    rr:predicateObjectMap [
        rr:predicate org:address;
        rr:objectMap
            [
                rr:parentTriplesMap ex:Addresses ;
                rr:joinCondition [
                    rr:child "path(.)";
                    rr:parent "path(..)";
                ];
            ];
    ]
.

ex:Addresses a rr:TriplesMap;
    rml:logicalSource [
        rml:source "test.xml";
        rml:iterator "/Directory/Organization/Address";
        rml:referenceFormulation ql:XPath
    ];
    rr:subjectMap [
        # rr:template "http://data.example.org/resource/Address_{generate-id(.)}";
        rml:reference "if(exists(StreetName)) then 'http://data.example.org/resource/Address_' || generate-id(.) else null";
        rr:class org:Address
    ];
    rr:predicateObjectMap [
        rr:predicate org:streetName;
        rr:objectMap
            [
                rml:reference "StreetName"
            ];
    ] ;
.

Actual

The mapper fails with a NullPointerException:

14:49:50.108 [main] ERROR be.ugent.rml.cli.Main               .run(420) - Cannot invoke "java.util.Collection.toArray()" because "c" is null
14:49:50.110 [main] ERROR be.ugent.rml.cli.Main               .run(457) - Cannot invoke "java.util.Collection.toArray()" because "c" is null
java.lang.NullPointerException: Cannot invoke "java.util.Collection.toArray()" because "c" is null
        at java.base/java.util.ArrayList.addAll(ArrayList.java:670)
        at be.ugent.rml.Executor.getIRIsWithTrueCondition(Executor.java:464)
        at be.ugent.rml.Executor.getIRIsWithConditions(Executor.java:424)
        at be.ugent.rml.Executor.generatePredicateObjectGraphs(Executor.java:363)
        at be.ugent.rml.Executor.executeWithFunction(Executor.java:178)
        at be.ugent.rml.Executor.execute(Executor.java:132)
        at be.ugent.rml.cli.Main.run(Main.java:416)
        at be.ugent.rml.cli.Main.main(Main.java:49)
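
The top frames of the trace show ArrayList.addAll being handed a null collection. The following is a minimal Java sketch of that failure mode and of a null guard around it. All class and method names are hypothetical; this is not the code of Executor.getIRIsWithTrueCondition, and the actual fix may look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in rmlmapper-java.
final class NullGuardSketch {

    // Merges per-record IRI lists without checking for null entries.
    // ArrayList.addAll(null) throws NullPointerException, as in the trace.
    static List<String> collectUnguarded(List<List<String>> perRecord) {
        List<String> all = new ArrayList<>();
        for (List<String> iris : perRecord) {
            all.addAll(iris);
        }
        return all;
    }

    // Same merge, but a null entry (a record whose subject evaluated to null)
    // is treated as "no IRIs" and skipped.
    static List<String> collectGuarded(List<List<String>> perRecord) {
        List<String> all = new ArrayList<>();
        for (List<String> iris : perRecord) {
            if (iris != null) {
                all.addAll(iris);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // The middle entry stands in for the Address without a StreetName.
        List<List<String>> perRecord = Arrays.asList(
                List.of("ex:Address_d0e9"), null, List.of("ex:Address_d0e33"));

        System.out.println(collectGuarded(perRecord));
        try {
            collectUnguarded(perRecord);
        } catch (NullPointerException e) {
            System.out.println("unguarded merge failed: " + e.getMessage());
        }
    }
}
```

The guarded variant mirrors the behaviour asked for under Expected: a null subject contributes nothing instead of aborting the run.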

Expected

Should result in an output with just two Address instances, each related to the right Organization instance, and no third Address instance for the Organization whose Address fails the condition (its subjectMap evaluates to null, so nothing should be instantiated):

ex:Address_d0e33 a org:Address;
  org:streetName "99 Maine St" .

ex:Address_d0e9 a org:Address;
  org:streetName "ABC FastCo Lane" .

ex:Organization_123 a org:Organization;
  org:address ex:Address_d0e9;
  org:name "ABC FastCo" .

ex:Organization_456 a org:Organization;
  org:name "XYZ Inc." . # no address relation, correct

ex:Organization_789 a org:Organization;
  org:address ex:Address_d0e33;
  org:name "MNO Ltd" .

Wrong

If one removes the joinCondition:

        rr:objectMap
            [
                rr:parentTriplesMap ex:Addresses ;
                # rr:joinCondition [
                #     rr:child "path(.)";
                #     rr:parent "path(..)";
                # ];
            ];

the transformation runs, but the output is of course wrong, because without a join condition every Organization is related to every Address:

ex:Address_d0e33 a org:Address;
  org:streetName "99 Maine St" .

ex:Address_d0e9 a org:Address;
  org:streetName "ABC FastCo Lane" .

ex:Organization_123 a org:Organization;
  org:address ex:Address_d0e33, ex:Address_d0e9; # wrong
  org:name "ABC FastCo" .

ex:Organization_456 a org:Organization;
  org:address ex:Address_d0e33, ex:Address_d0e9; # wrong, should have nothing
  org:name "XYZ Inc." .

ex:Organization_789 a org:Organization;
  org:address ex:Address_d0e33, ex:Address_d0e9; # wrong
  org:name "MNO Ltd" .
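
The difference between the Expected and Wrong outputs can be sketched in plain Java with the JDK's DOM API. This is only an illustration of the intended semantics under stated assumptions: the class and method names are hypothetical, parent-node identity stands in for the path(.) / path(..) join condition, and a missing StreetName stands in for the null subject. It does not reflect how the RMLMapper evaluates joins internally.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not part of rmlmapper-java.
final class JoinSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid XML", e);
        }
    }

    // Text of the first descendant element with the given tag, or null if absent.
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    // Pairs Organizations with Addresses. With joinOnParent=false every pair is kept.
    static List<String> link(Document doc, boolean joinOnParent) {
        NodeList orgs = doc.getElementsByTagName("Organization");
        NodeList addresses = doc.getElementsByTagName("Address");
        List<String> links = new ArrayList<>();
        for (int i = 0; i < orgs.getLength(); i++) {
            Element org = (Element) orgs.item(i);
            for (int j = 0; j < addresses.getLength(); j++) {
                Element address = (Element) addresses.item(j);
                String street = text(address, "StreetName");
                if (street == null) {
                    continue; // null subject: no Address instance, hence no link
                }
                if (joinOnParent && !address.getParentNode().isSameNode(org)) {
                    continue; // join condition: the Address must belong to this Organization
                }
                links.add("Organization_" + text(org, "ID") + " -> " + street);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        Document doc = parse("<Directory>"
                + "<Organization><ID>123</ID><Address><StreetName>ABC FastCo Lane</StreetName></Address></Organization>"
                + "<Organization><ID>456</ID><Address><Area>XYZ Metro</Area></Address></Organization>"
                + "<Organization><ID>789</ID><Address><StreetName>99 Maine St</StreetName></Address></Organization>"
                + "</Directory>");
        System.out.println("with join:    " + link(doc, true));
        System.out.println("without join: " + link(doc, false));
    }
}
```

With the join, only Organizations 123 and 789 get a link. Without it, all three Organizations are paired with both Addresses, which is the cross product shown above.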

Workaround

None on the affected versions. Downgrading to v6.2.2 or below gives the right result.

MWE

rml-mwe-conditional-error.zip

DylanVanAssche commented 1 month ago

Fixed in https://github.com/RMLio/rmlmapper-java/commit/144f9b4cb1ca3c7174f9453f28ec626996c19020