RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

relative URLs not consistently supported #178

Closed bblfish closed 1 year ago

bblfish commented 2 years ago

There are a number of inconsistencies relating to relative URLs. The spec has many examples starting with

<#TriplesMap1> a rr:TriplesMap;

But even setting the relative URL on the command line did not fix that.

Using this example CSV

Id,Name,DoB,Sex,mother
1,Linus,02-07-2016,male,4
2,Oliver,02-07-2016,male,4
3,Anaïs,10-09-2014,female,4
4,Gordana,30-05-1982,female,

I would like the following to work. The rml:source "pplEx.csv" works, but the rr:template "#r_{mother}" does not. Allowing predicates with relative URLs would

  1. be consistent
  2. allow one to better explain to people working with JSON and CSV what their format is actually saying: namely, that they are specifying relations, but ones that are completely tied to the document in which they are located (unless a MIME type that goes beyond plain JSON specifies a global interpretation for all the vocabulary items in the string). One can then show how adding a global namespace gives one a way to equate the relations across various documents
  3. be useful if one is mapping to locally defined terms

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# not sure why I can't use a relative url here with rmlmapper
[] a rr:TriplesMap;
    rml:logicalSource [
        rml:source "pplEx.csv";
        rml:referenceFormulation ql:CSV
    ];
    rr:subjectMap [
      rr:template "#r_{Id}";
      rr:datatype xsd:int;
    ];
    rr:predicateObjectMap [
       rr:predicate foaf:name;
       rr:objectMap [
        rml:reference "Name";
        rr:datatype xsd:string
    ] ];
    rr:predicateObjectMap [
        rr:predicate  <dateBirth>;
        rr:objectMap [
            rml:reference "DoB";
            rr:datatype xsd:string
    ] ];
    rr:predicateObjectMap [
      rr:predicate <http://sparql.cwrc.ca/ontologies/cwrc#hasMother>;
      rr:objectMap [
        rr:template "#r_{mother}"
      ] ];
.
bblfish commented 2 years ago

Note that relative URLs do work well with the CSVW implementation csv2rdf (https://github.com/Swirrl/csv2rdf):

{
  "@context": [ "http://www.w3.org/ns/csvw", { "@language": "en" } ],
  "dc:title": "example people data",
  "tables": [{
      "url": "pplex.csv",
      "tableSchema": {
          "@id": "http://example.com/",
          "aboutUrl": "#{Id}",
          "columns": [
            {
              "name": "Id"
            }, {
              "name": "Name",
              "datatype": "string"
            }, {
              "name": "DoB",
              "datatype": {
                "base": "date",
                "format": "dd-MM-yyyy"
              }
            }, {
              "name": "Sex",
              "datatype": "string"
            }, {
              "name": "mother",
              "valueUrl": "#{mother}"
            } ],
          "primaryKey": "Id",
          "foreignKeys": [{
              "columnReference": "mother",
              "reference": {
                  "resource": "pplex.csv",
                  "columnReference": "Id"
              }
          }]
      }
  }]
}
pheyvaer commented 2 years ago

Hi @bblfish

It's not really clear to me what the issue is. Can you explicitly write down the steps that you take with their input? Thanks!

bblfish commented 2 years ago
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.

# not sure why I can't use a relative url here with rmlmapper
<#tm> a rr:TriplesMap;
    rml:logicalSource [
        rml:source "pplEx.csv";
        rml:referenceFormulation ql:CSV
    ];
    rr:subjectMap [
      rr:template "#r_{Id}";
      rr:datatype xsd:int;
    ];
    rr:predicateObjectMap [
       rr:predicate foaf:name;
       rr:objectMap [
        rml:reference "Name";
        rr:datatype xsd:string
    ] ] .

Having <#tm> as the relative URL of the triples map does not work:

 java -jar /Users/hjs/.m2/repository/be/ugent/rml/rmlmapper/6.0.0/rmlmapper-6.0.0-r363-all.jar -b http://example.org/ -m pplEx.csv-rml.ttl
11:57:00.222 [main] ERROR be.ugent.rml.cli.Main               .main(254) - Unable to parse mapping rules as Turtle. Does the file exist and is it valid Turtle?
org.eclipse.rdf4j.rio.RDFParseException: Not a valid (absolute) IRI: #tm [line 11]
    at org.eclipse.rdf4j.rio.helpers.RDFParserHelper.reportFatalError(RDFParserHelper.java:366)
    at org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.reportFatalError(AbstractRDFParser.java:750)
    at org.eclipse.rdf4j.rio.turtle.TurtleParser.reportFatalError(TurtleParser.java:1313)
    at org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.createURI(AbstractRDFParser.java:407)
    at org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.resolveURI(AbstractRDFParser.java:385)
    at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseURI(TurtleParser.java:943)
    at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseValue(TurtleParser.java:575)
    at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseSubject(TurtleParser.java:406)
    at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseTriples(TurtleParser.java:347)
    at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseStatement(TurtleParser.java:216)
    at org.eclipse.rdf4j.rio.turtle.TurtleParser.parse(TurtleParser.java:178)
    at org.eclipse.rdf4j.rio.turtle.TurtleParser.parse(TurtleParser.java:130)
    at be.ugent.rml.store.RDF4JStore.read(RDF4JStore.java:120)
    at be.ugent.rml.cli.Main.main(Main.java:251)
    at be.ugent.rml.cli.Main.main(Main.java:45)
Caused by: java.lang.IllegalArgumentException: Not a valid (absolute) IRI: #tm
    at org.eclipse.rdf4j.model.impl.SimpleIRI.setIRIString(SimpleIRI.java:74)
    at org.eclipse.rdf4j.model.impl.SimpleIRI.<init>(SimpleIRI.java:63)
    at org.eclipse.rdf4j.model.impl.AbstractValueFactory.createIRI(AbstractValueFactory.java:86)
    at org.eclipse.rdf4j.rio.helpers.AbstractRDFParser.createURI(AbstractRDFParser.java:405)
    ... 11 common frames omitted

If I replace <#tm> with a blank node [] then it works.

That was the first issue I came across.

DylanVanAssche commented 2 years ago

You are probably missing the @base statement for the relative IRIs of TriplesMaps, since this should definitely work.

bblfish commented 2 years ago

> You probably miss the @base statement for relative IRIs of TriplesMaps since this should definitely work.

If @base is not specified, the parser should use the file:// URL of the mapping file as the base.

Note: this is how HTML editing works. You first write HTML to your filesystem, check how it looks in the browser, and then publish. Here we are trying to do the same: write mappings locally, test them out, and then decide what the global URL is going to be. If @base has to be added to the file, then the editor of the RDF has to add a base with the right name to each file.

Note that my solution in my Solid web server to allow local namespacing to work is to have symlinks to the default RDF representation. https://github.com/co-operating-systems/Reactive-SoLiD
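
The fallback described in this comment (resolve relative IRIs against the mapping file's own file:// URL when no @base is given) can be sketched as follows. This is only an illustration using Python's standard urllib.parse as a stand-in for an RDF parser's IRI resolution; it is not how rmlmapper or RDF4J is implemented, and the file path and the helper names is_absolute and resolve are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def is_absolute(iri: str) -> bool:
    """An IRI is absolute only if it carries a scheme such as http: or file:."""
    return bool(urlparse(iri).scheme)


def resolve(reference: str, base: str) -> str:
    """Resolve a possibly relative IRI reference against a base IRI."""
    return urljoin(base, reference)


# Hypothetical location of the mapping file on disk (an assumption for this sketch).
mapping_file_base = "file:///home/user/pplEx.csv-rml.ttl"

# A bare fragment such as "#tm" has no scheme, which is what the parser error
# "Not a valid (absolute) IRI: #tm" is complaining about.
print(is_absolute("#tm"))

# With the mapping file's own URL as the base, the fragment becomes absolute.
print(resolve("#tm", mapping_file_base))

# With an explicit base, as a @base statement would provide:
print(resolve("#tm", "http://example.org/"))  # http://example.org/#tm
```

Under this scheme a mapping can be written and tested locally without editing each file, and a global base only has to be chosen at publication time.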

bblfish commented 2 years ago

There is also a problem with how the relative URLs of templates are resolved. See https://github.com/kg-construct/rml-questions/issues/23

bblfish commented 2 years ago

Ok, I got it. The -b argument gives the base URL against which the relative URLs of the templates are resolved! https://github.com/kg-construct/rml-questions/issues/23#issuecomment-1188689989

As argued above, I think that would work as well if you generated relative URLs for the templates.

Perhaps this is actually worth emphasizing in the docs or README. It is unusual to have to deal with two different bases for relative URLs when working with RDF (one for the mapping file itself, one for the terms generated from templates).
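
To make the role of the -b base concrete, here is a minimal sketch of how the templates "#r_{Id}" and "#r_{mother}" from the mapping above could be expanded per CSV row and then resolved against a base such as http://example.org/. It is an illustration under stated assumptions, not the RMLMapper's code: the helpers expand_template and mother_links are hypothetical, values are percent-encoded with urllib.parse.quote as a simplification, and rows with an empty referenced value are assumed to produce no term.

```python
import csv
import io
import re
from urllib.parse import quote, urljoin

# The example CSV from the issue, inlined so the sketch is self-contained.
CSV_DATA = """Id,Name,DoB,Sex,mother
1,Linus,02-07-2016,male,4
2,Oliver,02-07-2016,male,4
3,Anaïs,10-09-2014,female,4
4,Gordana,30-05-1982,female,
"""

PLACEHOLDER = re.compile(r"\{([^}]+)\}")


def expand_template(template, row):
    """Fill {column} placeholders; return None if a referenced value is empty."""
    names = PLACEHOLDER.findall(template)
    if any(row.get(name, "") == "" for name in names):
        return None  # assumption: an empty value yields no term
    return PLACEHOLDER.sub(lambda m: quote(row[m.group(1)], safe=""), template)


def mother_links(base_iri):
    """Return (child IRI, mother IRI) pairs resolved against base_iri."""
    links = []
    for row in csv.DictReader(io.StringIO(CSV_DATA)):
        child = expand_template("#r_{Id}", row)
        mother = expand_template("#r_{mother}", row)
        if child is None or mother is None:
            continue
        links.append((urljoin(base_iri, child), urljoin(base_iri, mother)))
    return links


# The base plays the role of the value passed with -b on the command line.
for child, mother in mother_links("http://example.org/"):
    print(child, mother)
```

The base used here is independent of any @base in the Turtle mapping file, which only affects IRIs written in the mapping document itself, such as <#tm>.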

DylanVanAssche commented 1 year ago

So this issue is solved?

bblfish commented 1 year ago

Can't remember anymore. Did you update the README to make this clearer? Have been working on other things in the meantime.

DylanVanAssche commented 1 year ago

It is already in the README under the command line arguments.