RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

rr:template doesn't consistently percent-encode #200

Closed by sixdiamants 1 year ago

sixdiamants commented 1 year ago

Let my CSV be:

PAD1;PAD2;ELSYS;laenge;richtwinkel
"6624AO 1101";"6624AO 1102";83.42;267.18;

Note the spaces in the PAD1 and PAD2 values.
A template string rr:template "{PAD1}_{PAD2}" should percent-encode these values, giving 6624AO%201101_6624AO%201102

However, when a template string is used to produce a blank node label in the object map, the values are not percent-encoded: rr:objectMap [ rr:template "length{PAD1}_{PAD2}"; rr:termType rr:BlankNode; ] produces _:length6624AO 1101_6624AO 1102

Here's the mapping:

@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@base <http://example.com/base/> .

<TriplesMap1> a rr:TriplesMap;

  rml:logicalSource <#mijnbron>;

  rr:subjectMap [ 
    rr:template "http://example.com/{PAD1}_{PAD2}";
    rr:class <#ElementLage>
  ];

  rr:predicateObjectMap [
    rr:predicate foaf:id;
    rr:objectMap [rr:template "{PAD1}_{PAD2}"]
  ];

  rr:predicateObjectMap [ 
    rr:predicate ex:length ;
    rr:objectMap [ rr:template "length{PAD1}_{PAD2}"; rr:termType rr:BlankNode; ]
  ].

<TriplesMap2> a rr:TriplesMap;
    rml:logicalSource <#mijnbron>;

    rr:subjectMap [
        rr:template "length{PAD1}_{PAD2}"; 
        rr:termType rr:BlankNode;
        rr:class <#Length>
    ]
    .

<#mijnbron> a rml:LogicalSource;
    rml:source <#CSVW_source>;
    rml:referenceFormulation ql:CSV .

<#CSVW_source> a csvw:Table;
   csvw:url "testSpacedColumns.csv" ;
   csvw:dialect [ a csvw:Dialect;
       csvw:delimiter ";"
   ] .

DylanVanAssche commented 1 year ago

The specification says that template values are percent-encoded only when the term type is rr:IRI:

If the term type is rr:IRI, then replace the pair of curly braces with an IRI-safe version of value; otherwise, replace the pair of curly braces with value

If you think this should also apply to blank nodes, I suggest raising an issue at https://github.com/kg-construct/rml-core
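
For illustration, the rule quoted from the specification can be sketched in a few lines of Python. This is not the RMLMapper's actual code: the function name expand_template and the two term type constants are hypothetical, and urllib.parse.quote with safe="" only approximates the specification's "IRI-safe version", since it also encodes non-ASCII characters that the specification leaves untouched. Escaped curly braces in templates are ignored to keep the sketch short.

```python
import re
from urllib.parse import quote

# Hypothetical constants standing in for rr:IRI and rr:BlankNode.
IRI = "http://www.w3.org/ns/r2rml#IRI"
BLANK_NODE = "http://www.w3.org/ns/r2rml#BlankNode"


def expand_template(template, row, term_type=IRI):
    """Replace each {column} placeholder with the row's value.

    Values are percent-encoded only when the term type is IRI;
    for any other term type they are inserted unchanged.
    """
    def substitute(match):
        value = row[match.group(1)]
        if term_type == IRI:
            # Approximation of the "IRI-safe version": encode every
            # character except ASCII letters, digits, "-", ".", "_", "~".
            return quote(value, safe="")
        return value

    return re.sub(r"\{([^{}]+)\}", substitute, template)


row = {"PAD1": "6624AO 1101", "PAD2": "6624AO 1102"}

# Term type IRI: spaces become %20.
print(expand_template("http://example.com/{PAD1}_{PAD2}", row))

# Term type blank node: values are inserted as they are, spaces included.
print(expand_template("length{PAD1}_{PAD2}", row, BLANK_NODE))
```

Under these assumptions the first call yields http://example.com/6624AO%201101_6624AO%201102 and the second yields length6624AO 1101_6624AO 1102, which matches the blank node label reported in this issue.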