RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License
144 stars 61 forks

Postgres, table with Float datatype = NULL gives exception #221

Open MathiasVDA opened 8 months ago

MathiasVDA commented 8 months ago

Hello

I have a PostgreSQL database with a table that I'm trying to convert into triples using RML and RMLMapper, but I get the following exception:

C:\Users\VBP8501\Git\Knowledge graph\rinf>docker run --rm -v "C:\Users\VBP8501\Git\Knowledge graph\rinf":/data rmlio/rmlmapper-java -m src/tunnel2.rml.ttl -m database_amdb.ttl -o output/tunnel2.ttl -s turtle
14:20:03.848 [main] ERROR be.ugent.rml.cli.Main               .run(416) - Cannot invoke "String.endsWith(String)" because "data" is null
14:20:03.852 [main] ERROR be.ugent.rml.cli.Main               .run(453) - Cannot invoke "String.endsWith(String)" because "data" is null
java.lang.NullPointerException: Cannot invoke "String.endsWith(String)" because "data" is null
        at be.ugent.rml.access.RDBAccess.normalizeData(RDBAccess.java:351)
        at be.ugent.rml.access.RDBAccess.getCSVInputStream(RDBAccess.java:198)
        at be.ugent.rml.access.RDBAccess.getInputStream(RDBAccess.java:114)
        at be.ugent.rml.records.CSVRecordFactory.getRecordsForCSV(CSVRecordFactory.java:145)
        at be.ugent.rml.records.CSVRecordFactory.getRecords(CSVRecordFactory.java:77)
        at be.ugent.rml.records.RecordsFactory.getRecords(RecordsFactory.java:144)
        at be.ugent.rml.records.RecordsFactory.createRecords(RecordsFactory.java:75)
        at be.ugent.rml.Executor.getRecords(Executor.java:361)
        at be.ugent.rml.Executor.executeWithFunction(Executor.java:136)
        at be.ugent.rml.Executor.execute(Executor.java:123)
        at be.ugent.rml.cli.Main.run(Main.java:412)
        at be.ugent.rml.cli.Main.main(Main.java:46)
(base) PS C:\Users\VBP8501\Git\Knowledge graph\rinf> 

I have studied that table and created a series of views that reduced the number of columns in the table until RMLMapper worked properly. I found out that when a column has datatype float8 and at least one value is NULL, then the above exception is thrown.

This is the PostgreSQL version of the database: PostgreSQL 12.14 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit

DylanVanAssche commented 8 months ago

Hi!

Do you maybe have a simple example we can use to debug this?

MathiasVDA commented 8 months ago

Here is an example. Run this query on your database:

create table sandbox.rml_test (
    id serial4 NOT NULL,
    label varchar(255) NULL,
    this_is_the_problem float8 NULL
    );
INSERT INTO sandbox.rml_test
("label")
VALUES('label1'),('label2'),('label3');

Then use this RML file as input for the mapper:

@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr:     <http://www.w3.org/ns/r2rml#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .

<https://data.infrabel.be/data/Source/AMDB/JDBC> a d2rq:Database;
  d2rq:jdbcDSN "...";
  d2rq:jdbcDriver "drivers/org.postgresql.Driver";
  d2rq:username "...";
  d2rq:password "..." .

<https://data.infrabel.be/mapping/rml_test_map>
  a rr:TriplesMap;

  rml:logicalSource [
    rml:source <https://data.infrabel.be/data/Source/AMDB/JDBC>;
    rr:sqlVersion rr:SQL2008;
    rr:tableName "sandbox.rml_test"; ];

  rr:subjectMap [
    rr:template "https://data.infrabel.be/rml_test_{id}"; ];

  rr:predicateObjectMap [
    rr:predicate rdfs:label;
    rr:objectMap [ rml:reference "label" ] ].

And run this command: docker run --rm -v "%cd%":/data rmlio/rmlmapper-java -m path_to_mapping_file -o path_to_output_file -s turtle

DylanVanAssche commented 8 months ago

We will have a look, thanks for the example! Normally, the mapper should handle float8 as a datatype according to https://github.com/RMLio/rmlmapper-java/blob/master/src/main/java/be/ugent/rml/access/RDBAccess.java#L286C1-L286C27 but we will have to investigate further to find out what is going on.

MathiasVDA commented 8 months ago

It might be more related to the NULL value than to the data type. I haven't tested other data types with a NULL value.
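
To illustrate the suspicion: the stack trace shows `String.endsWith` being called on a `data` value that is null inside `RDBAccess.normalizeData`, which is consistent with a SQL NULL arriving as a Java `null`. The snippet below is only a minimal sketch of a null guard, not the actual RMLMapper code. The class name `NullSafeNormalizeSketch`, the method `normalize`, and the ".0" suffix rule are hypothetical and chosen purely for illustration.

```java
// Hypothetical stand-in for a normalization step; NOT the real RDBAccess code.
class NullSafeNormalizeSketch {

    // Returns the normalized value, or null when the column value was SQL NULL.
    static String normalize(String data) {
        if (data == null) {
            // A SQL NULL has nothing to normalize; calling data.endsWith(...)
            // here would throw the NullPointerException from the stack trace.
            return null;
        }
        // Assumed example rule: drop a trailing ".0" from a numeric string.
        return data.endsWith(".0") ? data.substring(0, data.length() - 2) : data;
    }

    public static void main(String[] args) {
        System.out.println(normalize("12.0")); // value with the assumed suffix
        System.out.println(normalize("3.5"));  // value left unchanged
        System.out.println(normalize(null));   // NULL column value, no exception
    }
}
```

Whether a NULL should be passed through as null, skipped, or written as an empty field is a design decision for the maintainers; the sketch only shows that the guard has to come before any method call on `data`.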