RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

Creating language tags #65

Closed: mcm104 closed this issue 4 years ago

mcm104 commented 4 years ago

Hello!

We've noticed an issue with our output when using the property rml:languageMap. When we have multiple values with different language tags, the output will tag all those values with the language tag of the first value. Here is an example:

Data:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
   xmlns:rdaw="http://rdaregistry.info/Elements/w/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:about="https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618">
    <rdaw:P10086 xml:lang="pt">Lórax (Beber)</rdaw:P10086>
    <rdaw:P10086 xml:lang="af">Loraks</rdaw:P10086>
    <rdaw:P10086 xml:lang="ru">Driad</rdaw:P10086>
    <rdaw:P10086 xml:lang="es">Lórax</rdaw:P10086>
  </rdf:Description>
</rdf:RDF>

Map:

@prefix bf: <http://id.loc.gov/ontologies/bibframe/>.
@prefix ex: <http://example.org/rules/>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.

ex:WorkMap a rr:TriplesMap;
  rml:logicalSource [
    rml:source "data.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/RDF/Description"
  ].

  ex:WorkMap rr:subjectMap [
    rml:reference "@about";
    rr:class bf:Work
  ].

  ex:WorkMap rr:predicateObjectMap [
      rr:predicate bf:title;
      rr:objectMap [
        rr:parentTriplesMap ex:VariantTitleMap
      ]
    ].

ex:VariantTitleMap a rr:TriplesMap;
  rml:logicalSource [
    rml:source "data.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/RDF/Description"
  ].

  ex:VariantTitleMap rr:subjectMap [
    rr:termType rr:BlankNode;
    rr:class bf:VariantTitle
  ].

  ex:VariantTitleMap rr:predicateObjectMap [
    rr:predicate bf:mainTitle;
    rr:objectMap [
      rml:reference "P10086[@lang]";
      rr:termType rr:Literal;
      rml:languageMap [
        rml:reference "P10086/@lang"
      ]
    ]
  ].

Output:

@prefix bf: <http://id.loc.gov/ontologies/bibframe/>.

<https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618>
  bf:title _:0 .

_:0 a bf:VariantTitle;
  bf:mainTitle "Driad"@pt, "Loraks"@pt, "Lórax"@pt, "Lórax (Beber)"@pt .

As you can see, these values should be labeled as Russian, Afrikaans, Spanish, and Portuguese, but they've all been labeled as Portuguese. Is there something I can do to the XPath expression in rml:languageMap to prevent this?

thomas-delva commented 4 years ago

Hi,

Unfortunately, object maps that generate more than one term do not work well together with language maps in the RMLMapper (as of now): as your output shows, every generated literal receives the first language tag. I can propose two workarounds for this issue, both based on changing the iterator so that the object map generates only one term per iteration.

Workaround 1: one blank node per VariantTitle

If you change the iterator of ex:VariantTitleMap to go over each P10086 node separately (/RDF/Description/P10086[@lang]), on each iteration there is only one literal and one language tag, leading to correct behaviour.

This way four separate VariantTitle blank nodes are created:

@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .

<https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618>
  a bf:Work;
  bf:title _:0 .

_:0 a bf:VariantTitle;
  bf:mainTitle "Lórax (Beber)"@pt .

<https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618>
  bf:title _:1 .

_:1 a bf:VariantTitle;
  bf:mainTitle "Loraks"@af .

<https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618>
  bf:title _:2 .

_:2 a bf:VariantTitle;
  bf:mainTitle "Driad"@ru .

<https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618>
  bf:title _:3 .

_:3 a bf:VariantTitle;
  bf:mainTitle "Lórax"@es .

(The full mapping file is attached as mapping1.ttl.)
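
For orientation, here is a minimal sketch of what the changed ex:VariantTitleMap could look like. It is not a copy of the attached file. The iterator is the one named above; the relative references "." (text of the current P10086 element) and "@lang" (its language attribute) are assumptions about how the object map and language map would be rewritten once the iterator points at P10086 itself. ex:WorkMap is assumed to stay as in the original mapping.

```turtle
@prefix bf: <http://id.loc.gov/ontologies/bibframe/>.
@prefix ex: <http://example.org/rules/>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.

# Sketch only: iterator taken from the workaround text,
# relative references below are assumptions.
ex:VariantTitleMap a rr:TriplesMap;
  rml:logicalSource [
    rml:source "data.xml";
    rml:referenceFormulation ql:XPath;
    # One iteration per language-tagged P10086 element.
    rml:iterator "/RDF/Description/P10086[@lang]"
  ];
  rr:subjectMap [
    rr:termType rr:BlankNode;
    rr:class bf:VariantTitle
  ];
  rr:predicateObjectMap [
    rr:predicate bf:mainTitle;
    rr:objectMap [
      # Assumed: "." yields the text of the current P10086 element.
      rml:reference ".";
      rr:termType rr:Literal;
      rml:languageMap [
        # Assumed: "@lang" yields the language of that same element.
        rml:reference "@lang"
      ]
    ]
  ].
```

Because each iteration now sees exactly one title and one language attribute, the literal and its tag cannot be mismatched.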

Workaround 2: one shared IRI for all VariantTitles

If you need the four mainTitle properties to have the same subject, you can achieve this by generating the same IRI for the four subjects.

This can be achieved by using the same iterator as in the previous workaround, and with the following subjectmap for ex:VariantTitleMap:

  rr:subjectMap [
    rr:template "ex:VariantTitle/{../@about}";
    rr:class bf:VariantTitle
  ];

The reference ../@about walks up (..) from each P10086 node to its parent Description node and builds the IRI from that node's about attribute, so all four iterations produce the same subject.

(The full mappings are attached as mapping2.ttl.)

This workaround leads to the following output:

@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .

<ex:VariantTitle/https%3A%2F%2Ftrellis.sinopia.io%2Frepository%2Fwashington%2Fd0d9f78e-05f1-4594-bdcb-b396ce68f618>
  a bf:VariantTitle;
  bf:mainTitle "Driad"@ru, "Loraks"@af, "Lórax"@es, "Lórax (Beber)"@pt .

<https://trellis.sinopia.io/repository/washington/d0d9f78e-05f1-4594-bdcb-b396ce68f618>
  a bf:Work;
  bf:title <ex:VariantTitle/https%3A%2F%2Ftrellis.sinopia.io%2Frepository%2Fwashington%2Fd0d9f78e-05f1-4594-bdcb-b396ce68f618> .

This output looks more like the output you gave, but obviously the IRI <ex:VariantTitle/https...> is meaningless. So you should either replace it again with a blank node in a post-processing step, or use a function in RML to generate a more meaningful IRI.

mcm104 commented 4 years ago

Thank you for your response! I'm not seeing the attachments you mentioned -- could you please post those again?

thomas-delva commented 4 years ago

Ah yes of course, the attachments are these:

data.txt mapping1.txt mapping2.txt output1.txt output2.txt

(GitHub does not support sharing .ttl files, so the extensions had to be changed to .txt.)

mcm104 commented 4 years ago

Thank you! This is very helpful!