RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high-quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

Using RegExp in grel:string_replace function #190

Open tduval-unifylogic opened 2 years ago

tduval-unifylogic commented 2 years ago

I'm hoping you can give me some guidance on passing a regular expression as a parameter to a function that replaces (more precisely, substitutes) parts of a literal object's value.

In particular, the "Name" value in:

  rr:subjectMap [
    rr:template "http://example.com/{Name}"
  ];

I need to slugify the "Name" value so that the string is URI-safe according to our standards. For example, if the value of Name is "Hey Mister Tally M#n", it should be slugified to "Hey-Mister-Tally-M-n" (we already have a regexp that does this conversion correctly).
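For reference, here is a minimal Python sketch of the intended substitution outside of RML, assuming the regexp meant is the one used in the mapping (every run of characters other than letters, digits, `_`, `-`, `.` and `+` becomes a single `-`). `slugify` is a hypothetical helper name, not part of the RMLMapper.

```python
import re

# Character class taken from the mapping in this issue: a run of anything that is
# NOT a letter, digit, '_', '-', '.' or '+' counts as unsafe.
UNSAFE_RUN = re.compile(r"[^a-zA-Z0-9_\-\.\+]+")


def slugify(value: str) -> str:
    """Replace every run of unsafe characters with a single '-'."""
    return UNSAFE_RUN.sub("-", value)


if __name__ == "__main__":
    print(slugify("Hey Mister Tally M#n"))  # Hey-Mister-Tally-M-n
```

Because the pattern ends in `+`, consecutive unsafe characters collapse into one `-` instead of producing `--`.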

To start, I changed the Name value in the student.csv file to "Hey Mister Tally M#n" and tried to apply grel:string_replace to the object of foaf:name, without success. Any help is greatly appreciated. I would also appreciate guidance on how to apply the function to {Name} in rr:subjectMap [ rr:template ].

The result I get from the rule/function:

{"message":"Error while executing the rules.","log":"23:02:42.082 [main] ERROR be.ugent.rml.cli.Main               .main(404) - null\n","stack":"Error: Error while executing the rules.\n    at ChildProcess.<anonymous> (/rmlmapper-webapi-js/node_modules/@rmlio/rmlmapper-java-wrapper/lib/wrapper.js:170:23)\n    at ChildProcess.emit (events.js:400:28)\n    at maybeClose (internal/child_process.js:1088:16)\n    at Process.ChildProcess._handle.onexit (internal/child_process.js:296:5)"}

Here is my mapping.ttl:

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix fnml:   <http://semweb.mmlab.be/ns/fnml#> .
@prefix fno:   <https://w3id.org/function/ontology#> .
@prefix grel:   <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .

@base <http://example.com/base/> .

<TriplesMap1>
  a rr:TriplesMap;

  rml:logicalSource [
    rml:source "../student.csv";
    rml:referenceFormulation ql:CSV
  ];

  rr:subjectMap [
    rr:template "http://example.com/{Name}"
  ];

  rr:predicateObjectMap [
    rr:predicate foaf:name;
    rr:objectMap [
      fnml:functionValue [
        rr:predicateObjectMap [
            rr:predicate fno:executes ;
            rr:objectMap [ rr:constant grel:string_replace ] ] ;
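        rr:predicateObjectMap [
            rr:predicate grel:valueParameter ;
            # rml:reference reads the Name column; rr:template "Name" (no braces) is just the constant string "Name"
            rr:objectMap [ rml:reference "Name" ] ] ;
        rr:predicateObjectMap [
            rr:predicate grel:p_string_find  ;
            # backslashes doubled: Turtle strings only allow the escapes \t \b \n \r \f \" \' \\ and \u
            rr:objectMap [ rr:constant "[^a-zA-Z0-9_\\-\\.\\+]+" ] ] ;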
        rr:predicateObjectMap [
            rr:predicate grel:p_string_replace  ;
            rr:objectMap [ rr:constant "-"  ] ] ;                      
      ]
    ]
  ].
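One thing worth checking in a mapping like this: the Turtle grammar only permits the escapes `\t \b \n \r \f \" \' \\` (ECHAR) and `\uXXXX` / `\UXXXXXXXX` (UCHAR) inside string literals, so a regexp written with single backslashes such as `\-`, `\.` or `\+` is not valid Turtle and a strict parser may reject the file. The minimal Python sketch below flags such sequences; `invalid_turtle_escapes` is a hypothetical helper, and whether this is what triggers the `null` error above is not confirmed.

```python
import re

# Escapes allowed inside Turtle string literals (W3C RDF 1.1 Turtle grammar):
#   ECHAR ::= '\' [tbnrf"'\\]
#   UCHAR ::= '\u' HEX{4} | '\U' HEX{8}
VALID_ESCAPE = re.compile(r"""\\(?:[tbnrf"'\\]|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})""")


def invalid_turtle_escapes(body: str) -> list[str]:
    """Return each backslash sequence in a Turtle string body that is not a legal escape."""
    bad = []
    i = 0
    while i < len(body):
        if body[i] == "\\":
            match = VALID_ESCAPE.match(body, i)
            if match:
                i = match.end()  # skip the whole legal escape, e.g. a doubled backslash
                continue
            bad.append(body[i:i + 2])  # the backslash plus the character it tries to escape
        i += 1
    return bad


if __name__ == "__main__":
    print(invalid_turtle_escapes(r"[^a-zA-Z0-9_\-\.\+]+"))    # ['\\-', '\\.', '\\+']
    print(invalid_turtle_escapes(r"[^a-zA-Z0-9_\\-\\.\\+]+"))  # []
```

Doubling the backslashes leaves the regexp itself unchanged: after Turtle unescaping, `\\-` becomes `\-` again before it reaches the function.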

johndoe888 commented 1 year ago

If you can change your input to XML, you can use XPath (which I find very powerful) with RML version 6.

Here is the XML data:

<?xml version="1.0" encoding="UTF-8"?>

<A1 name="H# M# T.-_ M."/>

Here is the mapping:

@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix ex: <http://example.com/ns#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.

@base <http://example.com/data/>.

<#A1> a rr:TriplesMap;
  rml:logicalSource [
    rml:source "./test.xml" ;
    rml:iterator "/A1";
    rml:referenceFormulation ql:XPath;
  ];

  rr:subjectMap [
    rr:template "http://example.com/data/{replace(@name, '[^a-zA-Z0-9_.]+', '-')}";
    rr:class ex:A1
  ].
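To see what that template should yield for the sample data, here is a rough Python emulation of the XPath `replace()` call. It assumes ElementTree attribute access stands in for `@name` and that Python's `re` matches XPath regexes for this simple character class; it does not reproduce the RMLMapper's own template handling (such as percent-encoding). `subject_iri` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# The sample document from this comment.
XML_DATA = b'<?xml version="1.0" encoding="UTF-8"?>\n<A1 name="H# M# T.-_ M."/>'


def subject_iri(xml_bytes: bytes) -> str:
    """Emulate the template http://example.com/data/{replace(@name, '[^a-zA-Z0-9_.]+', '-')}."""
    root = ET.fromstring(xml_bytes)  # the root <A1> element, i.e. what the iterator "/A1" selects
    name = root.get("name")          # the value XPath's @name refers to
    # XPath replace() and Python's re.sub agree for this simple character class
    slug = re.sub(r"[^a-zA-Z0-9_.]+", "-", name)
    return "http://example.com/data/" + slug


if __name__ == "__main__":
    print(subject_iri(XML_DATA))  # http://example.com/data/H-M-T.-_-M.
```

Note that `-` is not in the kept class, so the `-` already present in the input is also matched (and replaced by `-`), while `_` and `.` pass through untouched.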

'+' is problematic because it gets percent-encoded in the generated IRI. You may have to post-process that; I have not found a way around it.
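Background on why this happens: R2RML defines the "IRI-safe" version of a template value, in which every character outside RFC 3987's `iunreserved` production (letters, digits, `-`, `.`, `_`, `~`, plus many non-ASCII characters) is percent-encoded as UTF-8 octets. `+` is a sub-delimiter rather than an unreserved character, so it becomes `%2B`. The Python sketch below approximates that rule for ASCII input and shows one possible post-processing step; it assumes the RMLMapper follows the R2RML rule, and `iri_safe` and `restore_plus` are hypothetical helpers.

```python
from urllib.parse import quote


def iri_safe(value: str) -> str:
    # Simplified IRI-safe encoding: quote() with safe="" keeps ASCII letters, digits
    # and "_.-~" and percent-encodes every other character as UTF-8 octets.
    # (RFC 3987 iunreserved would additionally keep many non-ASCII characters.)
    return quote(value, safe="")


def restore_plus(iri: str) -> str:
    # Hypothetical post-processing step: turn the encoded '+' back into a literal '+'.
    return iri.replace("%2B", "+")


if __name__ == "__main__":
    encoded = "http://example.com/data/" + iri_safe("Tally+M")
    print(encoded)                # http://example.com/data/Tally%2BM
    print(restore_plus(encoded))  # http://example.com/data/Tally+M
```

This only undoes the uppercase form `%2B`; a real post-processing step would need to match whatever casing the mapper actually emits.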