RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

CSV handling not working when empty column header encountered #124

Closed mvanbrab closed 2 years ago

mvanbrab commented 3 years ago

Processing this issue.csv:

ColumnA,,ColumnC
s,,o

with this YARRRML mapping, for example:

prefixes:
  ex: "http://example.com/"

mappings:
  test:
    sources:
      - ['issue.csv~csv']
    s: ex:$(ColumnA)
    po:
      - [ex:something, ex:$(ColumnC)~iri]

produces empty output.

pheyvaer commented 3 years ago

Do you get errors when you turn on debugging?

valentinoli commented 3 years ago

I'm not an expert, but could it be due to the ~iri? Isn't that a datatype declaration? I don't think the value o in ColumnC is a valid IRI.

mvanbrab commented 3 years ago

@pheyvaer I was using Matey. Output from the command line with the -v option added:

08:58:56.277 [main] DEBUG b.ugent.rml.records.CSVRecordFactory.getParserForNormalCSV(89) - Could not parse CSV inputstream
java.lang.IllegalArgumentException: A header name is missing in [ColumnA, , ColumnC]
    at org.apache.commons.csv.CSVParser.createHeaders(CSVParser.java:501)
    at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:412)
    at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:378)
    at org.apache.commons.csv.CSVParser.parse(CSVParser.java:279)
    at org.apache.commons.csv.CSVParser.parse(CSVParser.java:234)
    at be.ugent.rml.records.CSVRecordFactory.getParserForNormalCSV(CSVRecordFactory.java:87)
    at be.ugent.rml.records.CSVRecordFactory.getRecords(CSVRecordFactory.java:46)
    at be.ugent.rml.records.RecordsFactory.getRecords(RecordsFactory.java:136)
    at be.ugent.rml.records.RecordsFactory.createRecords(RecordsFactory.java:70)
    at be.ugent.rml.Executor.getRecords(Executor.java:409)
    at be.ugent.rml.Executor.executeWithFunctionV5(Executor.java:153)
    at be.ugent.rml.Executor.executeV5(Executor.java:140)
    at be.ugent.rml.cli.Main.main(Main.java:318)
    at be.ugent.rml.cli.Main.main(Main.java:40)
08:58:56.280 [main] DEBUG be.ugent.rml.cli.Main               .writeOutputTargets(353) - Writing to Targets: [<rmlmapper://default.store>]
08:58:56.282 [main] DEBUG be.ugent.rml.cli.Main               .writeOutputTargets(368) - Exporting to default Target
08:58:56.283 [main] INFO  be.ugent.rml.cli.Main               .writeOutputUncompressed(470) - 0 quad was generated for default Target
08:58:56.283 [main] INFO  be.ugent.rml.cli.Main               .writeOutputTargets(396) - No results!

mvanbrab commented 3 years ago

@valentinoli ex:o is a valid IRI.

pheyvaer commented 3 years ago

> A header name is missing in [ColumnA, , ColumnC]

This is the problem. The header of the CSV file is invalid.

mvanbrab commented 3 years ago

Consider it a feature request to make the mapper robust against a missing header value for a column: skip that column in such cases. This is useful when the CSV file comes from an external source that is not under your own control.
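
A minimal sketch of the requested behaviour, to make "skip that column" concrete. This is not the RMLMapper's actual implementation: it uses only the Java standard library, assumes simple CSV without quoted fields or embedded commas, and the class and method names (BlankHeaderCsv, parse) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class BlankHeaderCsv {

    /**
     * Parses simple CSV text into one map per data row (header name -> value),
     * leaving out every column whose header name is blank.
     * Assumption: no quoted fields and no embedded commas or line breaks.
     */
    static List<Map<String, String>> parse(String csv) {
        String[] lines = csv.split("\\R");
        // A limit of -1 keeps empty fields, including trailing ones.
        String[] headers = lines[0].split(",", -1);

        List<Map<String, String>> records = new ArrayList<>();
        for (int row = 1; row < lines.length; row++) {
            if (lines[row].isEmpty()) {
                continue; // ignore blank lines
            }
            String[] values = lines[row].split(",", -1);
            Map<String, String> record = new LinkedHashMap<>();
            for (int col = 0; col < headers.length && col < values.length; col++) {
                String name = headers[col].trim();
                if (name.isEmpty()) {
                    continue; // unnamed column: skip it instead of failing
                }
                record.put(name, values[col]);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // The issue.csv content from this report.
        System.out.println(parse("ColumnA,,ColumnC\ns,,o\n"));
        // prints [{ColumnA=s, ColumnC=o}]
    }
}
```

With the unnamed middle column dropped, ColumnA and ColumnC stay addressable by name, which is all the mapping above needs.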

pheyvaer commented 3 years ago

Is it allowed to have an empty header value according to the CSV spec?

bjdmeest commented 3 years ago

> Is it allowed to have an empty header value according to the CSV spec?

Totally allowed: https://datatracker.ietf.org/doc/html/rfc4180#section-2

pheyvaer commented 3 years ago

😭 Ok, then this is a bug in the RMLMapper.

DylanVanAssche commented 2 years ago

A fix that ignores columns without a name in the CSV header will be available in the next release.

DylanVanAssche commented 2 years ago

This was released in 4.13.0, closing.