RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high-quality Linked Data from multiple originally (semi-)structured data sources.
http://rml.io
MIT License

Repeated XML `NUTS` elements not creating mapping from JSON vocabulary #228

Open schivmeister opened 1 year ago

schivmeister commented 1 year ago

Environment

rmlmapper v6.2.2 (also reproducible as far back as v6.1.3, and in even older versions)
Linux/WSL2
Java 11

Problem

Given the following kind of input XML:

<OBJECT>
  <NUTS CODE="DE"/>
  <NUTS CODE="DE8"/>
  <NUTS CODE="DE80"/>
  <NUTS CODE="DE803"/>
</OBJECT>

and the following kind of subject mapping:

tedm:nuts a rr:TriplesMap ;
    rml:logicalSource
        [
            rml:source "resources/nuts.json" ;
            rml:iterator "$.results.bindings[*]" ;
            rml:referenceFormulation ql:JSONPath
        ] ;
    rr:subjectMap
        [
            rml:reference
                "conceptURI.value" ;
        ] .

and the following kind of predicate-object mapping:

    rr:predicateObjectMap
        [
            rr:predicate epo:hasNutsCode ;
            rr:objectMap
                [
                    rr:parentTriplesMap tedm:nuts;
                    rr:joinCondition [
                        rr:child "*:NUTS/@CODE";
                        rr:parent "code.value";
                    ];
                ] ;
        ] ;

with the following kind of value mapping vocabulary in JSON format:

      {
        "code": {
          "type": "literal",
          "value": "DE803"
        },
        "conceptURI": {
          "type": "uri",
          "value": "http://data.europa.eu/nuts/code/DE803"
        }
      },
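To see why the join has more than one value to work with, the child reference can be evaluated against the sample input. The Java sketch below uses the JDK's built-in XPath 1.0 engine (javax.xml.xpath). This is an assumption for illustration only, since the RMLMapper may use a different XPath implementation. Because the `*:NUTS` wildcard-prefix syntax requires XPath 2.0 or later, the sketch uses the plain path `/OBJECT/NUTS/@CODE` instead. That path is also an assumption, because the report does not show the iterator of the child logical source. The class name `ChildReferenceDemo` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildReferenceDemo {
    // Input from the report (Java 11 has no text blocks, hence the concatenation)
    static final String INPUT =
            "<OBJECT>"
          + "<NUTS CODE=\"DE\"/>"
          + "<NUTS CODE=\"DE8\"/>"
          + "<NUTS CODE=\"DE80\"/>"
          + "<NUTS CODE=\"DE803\"/>"
          + "</OBJECT>";

    /** Evaluates an XPath 1.0 expression and returns the value of every matching node. */
    static List<String> evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // For attribute nodes, getNodeValue() returns the attribute value
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Assumed path; the mapping itself uses "*:NUTS/@CODE" relative to its iterator
        System.out.println(evaluate(INPUT, "/OBJECT/NUTS/@CODE"));
    }
}
```

On the report's input it prints `[DE, DE8, DE80, DE803]`. A single child record therefore yields four join values, not one.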

Actual

No epo:hasNutsCode triples are generated.

Expected

One epo:hasNutsCode value should be generated for every NUTS element whose CODE value matches a term in the given vocabulary, nuts.json:

  epo:hasNutsCode <http://data.europa.eu/nuts/code/DE> ,
                  <http://data.europa.eu/nuts/code/DE8> ,
                  <http://data.europa.eu/nuts/code/DE80> ,
                  <http://data.europa.eu/nuts/code/DE803> .
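The expected result amounts to an element-wise join. Each value of the child reference is compared with the `code.value` of every parent record, and each match contributes that record's `conceptURI.value` as an object. The following is a minimal Java sketch of the semantics the reporter expects. It is not the RMLMapper's join implementation. The `ExpectedJoinDemo` and `Binding` names are hypothetical. Only the DE803 entry comes from the report's nuts.json excerpt; the other vocabulary entries are assumed from the expected output above.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpectedJoinDemo {
    /** One parent record from the JSON vocabulary: code.value and conceptURI.value. */
    static final class Binding {
        final String code;
        final String conceptUri;

        Binding(String code, String conceptUri) {
            this.code = code;
            this.conceptUri = conceptUri;
        }
    }

    /**
     * Element-wise join: every child value (rr:child) is compared with every
     * parent's code (rr:parent); each match yields the parent's subject IRI.
     */
    static List<String> joinObjects(List<String> childValues, List<Binding> parents) {
        List<String> objects = new ArrayList<>();
        for (String childValue : childValues) {
            for (Binding parent : parents) {
                if (parent.code.equals(childValue)) {
                    objects.add(parent.conceptUri);
                }
            }
        }
        return objects;
    }

    public static void main(String[] args) {
        // Assumed vocabulary entries, mirroring the expected output
        List<Binding> vocabulary = List.of(
                new Binding("DE", "http://data.europa.eu/nuts/code/DE"),
                new Binding("DE8", "http://data.europa.eu/nuts/code/DE8"),
                new Binding("DE80", "http://data.europa.eu/nuts/code/DE80"),
                new Binding("DE803", "http://data.europa.eu/nuts/code/DE803"));
        // Values of *:NUTS/@CODE for the single <OBJECT> record
        List<String> nutsCodes = List.of("DE", "DE8", "DE80", "DE803");
        for (String iri : joinObjects(nutsCodes, vocabulary)) {
            System.out.println("epo:hasNutsCode <" + iri + ">");
        }
    }
}
```

The nested loop keeps the sketch close to the join's definition. A real implementation would more likely index the parent records by code.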

Workaround

Removing all but one NUTS element results in a successful mapping of epo:hasNutsCode.

Input:

<OBJECT>
  <NUTS CODE="DE803"/>
</OBJECT>

Output:

  epo:hasNutsCode <http://data.europa.eu/nuts/code/DE803> .

MWE

rml-mwe-nuts-json.zip

Context

This was discovered while troubleshooting #226, an issue about reproducing earlier results when mapping OP TED XML notices to RDF.

P.S.: The reproducibility problem described in #226 actually concerns a partial, incomplete original result: multiple values produced only one mapping. That original result can no longer be reproduced, as this ticket in turn shows, since multiple values now produce no mapping at all.

csnyulas commented 1 year ago

Did anyone manage to reproduce this? What could be the cause for this behavior? Can someone confirm that this is a bug (perhaps a regression)?

csnyulas commented 10 months ago

Dear developers, is there any chance that you could look at this problem in the near future? We would like to know whether this is indeed a bug and, if so, whether it will be fixed. Thank you!

bjdmeest commented 10 months ago

Hi all,

I can confirm this is a bug. A minimal (YARRRML) example is attached for reproducibility: join-on-parent-list.zip. I assume the problem is that the join operator doesn't take parent lists into account, and perhaps just compares the toString representation of the parent (i.e., the whole list) with the child.
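The suspected failure mode can be illustrated with a small Java sketch. It is not RMLMapper code. The names `JoinComparisonDemo`, `flattenedKey`, `naiveMatch` and `elementWiseMatch` are hypothetical. The sketch also makes an assumption: a single value is compared directly, while several values are turned into the list's `toString()`. That rule was chosen so the sketch also reproduces the single-element workaround from the report. It is applied to whichever side of the join is multi-valued.

```java
import java.util.Collections;
import java.util.List;

public class JoinComparisonDemo {
    /**
     * Hypothetical flattening of a reference's values into one comparison key:
     * a single value is used as-is, several values become List.toString().
     */
    static String flattenedKey(List<String> values) {
        return values.size() == 1 ? values.get(0) : values.toString();
    }

    /** Hypothetical buggy join: compares the flattened keys as plain strings. */
    static boolean naiveMatch(List<String> child, List<String> parent) {
        return flattenedKey(child).equals(flattenedKey(parent));
    }

    /** Element-wise join: succeeds if any child value equals any parent value. */
    static boolean elementWiseMatch(List<String> child, List<String> parent) {
        return !Collections.disjoint(child, parent);
    }

    public static void main(String[] args) {
        List<String> oneCode = List.of("DE803");
        List<String> fourCodes = List.of("DE", "DE8", "DE80", "DE803");
        List<String> parent = List.of("DE803");

        System.out.println(naiveMatch(oneCode, parent));         // true
        System.out.println(naiveMatch(fourCodes, parent));       // false
        System.out.println(elementWiseMatch(fourCodes, parent)); // true
    }
}
```

With one NUTS code the naive comparison still succeeds, which would be consistent with the workaround. With four codes it compares `"[DE, DE8, DE80, DE803]"` against `"DE803"` and fails, whereas the element-wise check finds the match.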

I'll put it on our to-do list, but our development efforts are guided by projects, so I can't promise swift handling. If you need more timely assistance, we can always discuss further via info@rml.io