RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

Generating classes for empty blank nodes #56

Closed: mcm104 closed this issue 3 years ago

mcm104 commented 4 years ago

Our mapping uses a lot of blank nodes, and each of those blank nodes is assigned a class. For example:

ex:WorkMonographMap a rr:TriplesMap;
  rml:logicalSource [
    rml:source "/exampleData.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/RDF/Description"
  ].

ex:WorkMonographMap rr:subjectMap [
  rml:reference "@about";
  rr:class bf:Work
].

ex:WorkMonographMap rr:predicateObjectMap [
  rr:predicate bf:title;
  rr:objectMap [
    rr:parentTriplesMap ex:VariantTitleMap
  ]
].

ex:VariantTitleMap a rr:TriplesMap;
  rml:logicalSource [
    rml:source "/exampleData.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/RDF/Description"
  ].

ex:VariantTitleMap rr:subjectMap [
  rr:termType rr:BlankNode;
  rr:class bf:VariantTitle
].

ex:VariantTitleMap rr:predicateObjectMap [
  rr:predicate bf:mainTitle;
  rr:objectMap [
    rml:reference "P10086"
  ]
].

This works out great most of the time, but if we use this mapping on a record that (in this example) doesn't have a variant title, then we end up with a blank node like this:

<record>
  bf:title _:5 .

_:5 a bf:VariantTitle .

This blank node carries only its class and is otherwise empty. Is there any way to prevent these unnecessary blank nodes from being generated?

thomas-delva commented 4 years ago

Hi @mcm104,

I am not certain whether a check for this can be added in the RML mapping itself. However, you can use a conditional XPath expression as the iterator of the second triples map: rml:iterator "/RDF/Description[P10086]". This iterator only visits Description nodes that have a child node P10086, so no blank node is generated for records without that property.
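To see what the XPath predicate does outside of the mapper, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is only an illustration of the filtering behaviour, not how the RMLMapper evaluates iterators internally. The helper name matching_descriptions and the two inline XML strings are assumptions made for this example (the XML is trimmed from the sample files in this thread). ElementTree is namespace-aware, so the sketch writes the predicate with the ns2 prefix; whether an unprefixed P10086 matches depends on the XPath processor in use.

```python
import xml.etree.ElementTree as ET

# Prefix map used to resolve "ns2:" inside the path expression.
NS = {"ns2": "http://rdaregistry.info/Elements/w/"}

# Record that has the P10086 child (trimmed from "Input 1" in this thread).
WITH_TITLE = """<RDF xmlns:ns2="http://rdaregistry.info/Elements/w/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Description rdf:about="https://trellis.sinopia.io/repository/washington/1a09069d-8165-42ae-bfc0-f6b269b4d34b">
    <ns2:P10086 xml:lang="en">Beyond Einstein</ns2:P10086>
  </Description>
</RDF>"""

# Record that lacks P10086 and has P10223 instead (trimmed from "Input 2").
WITHOUT_TITLE = """<RDF xmlns:ns2="http://rdaregistry.info/Elements/w/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Description rdf:about="https://trellis.sinopia.io/repository/washington/1a09069d-8165-42ae-bfc0-f6b269b4d34b">
    <ns2:P10223 xml:lang="en">Beyond Einstein</ns2:P10223>
  </Description>
</RDF>"""


def matching_descriptions(xml_text):
    """Hypothetical helper: return the Description elements that have a P10086 child."""
    root = ET.fromstring(xml_text)  # root is the <RDF> element
    # Relative to the root, "./Description[ns2:P10086]" plays the role of
    # the absolute iterator "/RDF/Description[P10086]".
    return root.findall("./Description[ns2:P10086]", NS)


if __name__ == "__main__":
    print(len(matching_descriptions(WITH_TITLE)))     # 1: the iterator yields the record
    print(len(matching_descriptions(WITHOUT_TITLE)))  # 0: nothing to iterate over
```

When the predicate matches nothing, the second triples map has no iterations to process, which is why neither the blank node nor its rr:class triple is produced for such records.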

Below are two example XML files and their outputs with this iterator. Please let us know if this solution works for your use case.

Input 1: property present

<?xml version="1.0" encoding="UTF-8"?>

<RDF
   xmlns:ns2="http://rdaregistry.info/Elements/w/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <Description rdf:about="https://trellis.sinopia.io/repository/washington/1a09069d-8165-42ae-bfc0-f6b269b4d34b">
    <ns2:P10086 xml:lang="en">Beyond Einstein</ns2:P10086>
  </Description>
</RDF>

Output 1

<https://trellis.sinopia.io/repository/washington/1a09069d-8165-42ae-bfc0-f6b269b4d34b> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://id.loc.gov/ontologies/bibframe/Work>.
<https://trellis.sinopia.io/repository/washington/1a09069d-8165-42ae-bfc0-f6b269b4d34b> <http://id.loc.gov/ontologies/bibframe/title> _:0.
_:0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://id.loc.gov/ontologies/bibframe/VariantTitle>.
_:0 <http://id.loc.gov/ontologies/bibframe/mainTitle> "Beyond Einstein".

Input 2: property not present

<?xml version="1.0" encoding="UTF-8"?>

<RDF
   xmlns:ns2="http://rdaregistry.info/Elements/w/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <Description rdf:about="https://trellis.sinopia.io/repository/washington/1a09069d-8165-42ae-bfc0-f6b269b4d34b">
    <ns2:P10223 xml:lang="en">Beyond Einstein</ns2:P10223>
  </Description>
</RDF>

Output 2

<https://trellis.sinopia.io/repository/washington/1a09069d-8165-42ae-bfc0-f6b269b4d34b> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://id.loc.gov/ontologies/bibframe/Work>.