RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high-quality Linked Data from multiple originally (semi-)structured data sources.
http://rml.io
MIT License

Empty cells from CSV input are no longer skipped #140

Closed: lpmeyer closed this issue 2 years ago

lpmeyer commented 2 years ago

Since version 4.12, the RMLMapper no longer seems to skip triples with empty values from CSV input, resulting in triples like subject predicate "". I am not sure what the RML spec says, but I would expect RML to skip RDF triples for empty input, as the RMLMapper did in version 4.11.

This seems to be caused by a change to src/main/java/be/ugent/rml/records/CSVRecord.java in commit https://github.com/RMLio/rmlmapper-java/commit/17c841ede06b775f29e5d165ba14dbfc38472ff0

The following patch fixed the problem for me:

diff --git a/src/main/java/be/ugent/rml/records/CSVRecord.java b/src/main/java/be/ugent/rml/records/CSVRecord.java
index 3c0aabb..065da14 100644
--- a/src/main/java/be/ugent/rml/records/CSVRecord.java
+++ b/src/main/java/be/ugent/rml/records/CSVRecord.java
@@ -64,8 +64,8 @@
         List<Object> result = new ArrayList<>();
         Object obj = this.record.get(toDatabaseCase);

-        // needed for finding NULL in CSV serialization
-        if (obj != null) {
+        // do not add NULL or empty CSV cell serialization
+        if (obj != null && !String.valueOf(obj).equals("")) {
             result.add(obj);
         }
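
To make the effect of the patch concrete, here is a minimal standalone sketch of the patched lookup. It assumes a CSV row can be represented as a plain Map from column name to cell value; the class EmptyCellFilterSketch is hypothetical and is not the RMLMapper's actual CSVRecord API. As the patch relies on, an empty result list means no object value is available for that reference, so no triple should be generated for it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the patched CSVRecord lookup; not RMLMapper code.
class EmptyCellFilterSketch {

    // Returns the values for a reference, skipping missing (null) and empty
    // CSV cells, mirroring the condition introduced by the patch.
    static List<Object> get(Map<String, ?> record, String reference) {
        List<Object> result = new ArrayList<>();
        Object obj = record.get(reference);
        if (obj != null && !String.valueOf(obj).isEmpty()) {
            result.add(obj);
        }
        return result;
    }

    public static void main(String[] args) {
        // Row "1,,3" from sparseInput.csv, with an empty cell in column B.
        Map<String, Object> row = new HashMap<>();
        row.put("A", "1");
        row.put("B", "");
        row.put("C", "3");
        System.out.println(get(row, "A")); // [1]
        System.out.println(get(row, "B")); // [] (empty cell skipped)
        System.out.println(get(row, "D")); // [] (unknown column)
    }
}
```

String.valueOf(obj).isEmpty() is equivalent to the patch's .equals("") check; note that a cell containing only whitespace would still be kept.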

Example sparseInput.csv with a missing value in column B:

A,B,C
1,,3
4,5,6

Example RML mapping:

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fnml: <http://semweb.mmlab.be/ns/fnml#> .
@prefix fno: <https://w3id.org/function/ontology#> .
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix : <http://mapping.example.com/> .
@prefix ex: <http://example.com/> .

:rules_000 rdf:type void:Dataset ;
    void:exampleResource :map_test_000 .

:map_test_000 rml:logicalSource :source_000 ;
    rdf:type rr:TriplesMap ;
    rdfs:label "test" ;
    rr:subjectMap :s_000 ;
    rr:predicateObjectMap :pom_000, :pom_001 .

:source_000 rdf:type rml:LogicalSource ;
    rml:source "sparseInput.csv" ;
    rml:referenceFormulation ql:CSV .

:s_000 rdf:type rr:SubjectMap ;
    rr:template "http://example.com/{A}" .

:pom_000 rdf:type rr:PredicateObjectMap ;
    rr:predicateMap :pm_000 ;
    rr:objectMap :om_000 .

:pm_000 rdf:type rr:PredicateMap ;
    rr:constant ex:A .

:om_000 rdf:type rr:ObjectMap ;
    rml:reference "A" ;
    rr:termType rr:Literal .

:pom_001 rdf:type rr:PredicateObjectMap ;
    rr:predicateMap :pm_001 ;
    rr:objectMap :om_001 .

:pm_001 rdf:type rr:PredicateMap ;
    rr:constant ex:B .

:om_001 rdf:type rr:ObjectMap ;
    rml:reference "B" ;
    rr:termType rr:Literal .

Output from RMLMapper 4.12 and 4.13, which contains the unwanted triple ex:1 ex:B "":

@prefix ex: <http://example.com/> .

ex:1 ex:A "1";
  ex:B "" .

ex:4 ex:A "4";
  ex:B "5" .

Expected output, as produced by RMLMapper v4.10 and v4.11:

@prefix ex: <http://example.com/> .

ex:1 ex:A "1" .

ex:4 ex:A "4";
  ex:B "5" .
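
The difference between the two outputs can be reproduced with a small, self-contained Java sketch. It is a hypothetical, heavily simplified re-implementation of just this example mapping (subject template http://example.com/{A}, literal objects for columns A and B), not RMLMapper code. It assumes no quoted CSV fields, splits on commas only, and prints N-Triples rather than Turtle. The skipEmpty flag toggles between the 4.12/4.13 output and the expected output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-implementation of the example mapping only; not RMLMapper code.
class SparseCsvMappingSketch {

    static final String EX = "http://example.com/";

    // Returns the position of a column in the header row.
    static int indexOf(String[] header, String name) {
        for (int i = 0; i < header.length; i++) {
            if (header[i].equals(name)) {
                return i;
            }
        }
        throw new IllegalArgumentException("No column named " + name);
    }

    // Maps each row to N-Triples: subject <http://example.com/{A}>,
    // predicates ex:A and ex:B with literal objects.
    // Assumption: no quoted fields, commas are only separators.
    static List<String> map(String csv, boolean skipEmpty) {
        List<String> triples = new ArrayList<>();
        String[] lines = csv.trim().split("\n");
        String[] header = lines[0].split(",", -1);
        for (int i = 1; i < lines.length; i++) {
            // limit -1 keeps empty cells, e.g. "1,,3" -> ["1", "", "3"]
            String[] cells = lines[i].split(",", -1);
            String subject = "<" + EX + cells[indexOf(header, "A")] + ">";
            for (String column : new String[] {"A", "B"}) {
                String value = cells[indexOf(header, column)];
                if (skipEmpty && value.isEmpty()) {
                    continue; // expected behaviour: no triple for an empty cell
                }
                triples.add(subject + " <" + EX + column + "> \"" + value + "\" .");
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        String csv = "A,B,C\n1,,3\n4,5,6\n";
        System.out.println("skipEmpty = false (like 4.12/4.13):");
        map(csv, false).forEach(System.out::println);
        System.out.println("skipEmpty = true (expected):");
        map(csv, true).forEach(System.out::println);
    }
}
```

With skipEmpty = false, the sketch prints four triples, including <http://example.com/1> <http://example.com/B> "" .; with skipEmpty = true, that triple is omitted and three remain.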

DylanVanAssche commented 2 years ago

@lpmeyer Thanks for your report! Indeed, this shouldn't happen; we will look into the issue.

DylanVanAssche commented 2 years ago

This was supposed to be fixed in 4.14.X, but the fix didn't make it into the release. I will make sure it is fixed in the next release!

DylanVanAssche commented 2 years ago

@lpmeyer I pushed some commits to the development branch that should fix this issue. Can you please verify that your issue is fixed by building the RMLMapper from the development branch?

git checkout development
mvn install -DskipTests=true
java -jar target/rmlmapper-4.14.3-r*-all.jar -m mapping.rml.ttl

lpmeyer commented 2 years ago

@DylanVanAssche Thanks for fixing! The output from the current development version looks as expected. I am looking forward to the next release!

DylanVanAssche commented 2 years ago

@lpmeyer Great that it works! Expect a release sometime this week ;)

DylanVanAssche commented 2 years ago

Fixed in 4.15.0, closing.