RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

Unreproducible results for repeated XML `NUTS` elements with JSON vocabulary mapping #226

Open schivmeister opened 8 months ago

schivmeister commented 8 months ago

Background

RMLMapper is used by Meaningfy as part of another library (essentially a wrapper around RMLMapper and other tools) to map OP TED notices from XML to RDF. However, the results of a transformation run by the Meaningfy mapping team in July 2023 could no longer be reproduced in November 2023, despite using the same RMLMapper version, v6.1.3.

One of these potential regressions relates to the disappearance, from some of the data, of multiple occurrences of a property called epo:hasNutsCode, which is itself related to a corresponding object/value data vocabulary. Help is now sought to determine what the root cause for this behaviour could be.

Problem

Expected

Occurrence of epo:hasNutsCode in the resulting RDF data, wherever there is at least one <NUTS> element with a CODE attribute value in the XML input data.

Actual

Missing epo:hasNutsCode in the resulting RDF data, wherever there are repeated (more than one) <NUTS> elements with a CODE attribute value in the XML input data.

Observations

It was later found that the issue occurs when the input XML file has repeated <NUTS> elements with different CODE attribute values and the object mapping is sourced from the JSON vocabulary file. Removing repeated values (keeping only one) appears to fix this. However, this is unexpected: the previous transformation in July 2023 did not exhibit this behaviour, and there was at least one occurrence of the property with a value.
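To help triage which input files are affected, the condition described above (more than one <NUTS> element with different CODE values) can be checked independently of RMLMapper. The following is only a minimal sketch using the Python standard library: the helper names and the sample document are hypothetical and not taken from the attached test suite, and elements are matched on their local name on the assumption that the real notices are namespaced.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a namespace written as '{uri}name' and return 'name'."""
    return tag.rsplit("}", 1)[-1]


def nuts_codes(xml_text: str) -> list[str]:
    """Return the non-empty CODE values of all NUTS elements, in document order."""
    root = ET.fromstring(xml_text)
    return [
        elem.attrib["CODE"]
        for elem in root.iter()
        if isinstance(elem.tag, str)
        and local_name(elem.tag) == "NUTS"
        and elem.attrib.get("CODE")
    ]


def has_repeated_nuts(xml_text: str) -> bool:
    """True if the document has NUTS elements with more than one distinct CODE."""
    return len(set(nuts_codes(xml_text))) > 1


# Hypothetical sample input; the namespace and codes are placeholders.
SAMPLE = """
<NOTICE xmlns="urn:example:ted">
  <NUTS CODE="FR101"/>
  <NUTS CODE="FR102"/>
</NOTICE>
"""

if __name__ == "__main__":
    print(nuts_codes(SAMPLE))
    print(has_repeated_nuts(SAMPLE))
```

Files for which `has_repeated_nuts` returns True are the ones where, per the observations above, epo:hasNutsCode would be expected in the output but is currently missing.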

MWE

As the transformation involves multiple RML files/modules, and a very minimal example would not be useful without all the contextual data, a reproduction test suite (a mostly minimal working example) is attached to this ticket. It also contains the MWE for another potential regression, #227, identified alongside this one.

mfy-rml-mwe.zip