RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

Issue with CSV: Unterminated quoted field at end of CSV line #176

Closed tobiasschweizer closed 2 months ago

tobiasschweizer commented 2 years ago

Hi there,

I am trying to map the following CSV file: https://data.snf.ch/Exportcsv/OutputdataScientificPublication.csv

I am using rmlmapper-6.0.0-r363-all.jar (CLI).

Mapping:

```turtle
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix schema: <http://schema.org/>.
@prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>.
@prefix gn: <http://www.geonames.org/ontology#>.
@prefix carml: <http://carml.taxonic.com/carml/> .
@prefix fnml: <http://semweb.mmlab.be/ns/fnml#> .
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .
@prefix fno: <https://w3id.org/function/ontology#> .
@prefix crml: <http://semweb.mmlab.be/ns/rml/condition#> .
@base <http://example.com/ns#>.

<#LogicalSourcePublication> a rml:BaseSource ;
  rml:source <#CSVW_sourcePublication> ;
  rml:referenceFormulation ql:CSV .

<#CSVW_sourcePublication> a csvw:Table;
  csvw:url "OutputdataScientificPublication.csv" ;
  csvw:dialect [ a csvw:Dialect;
    csvw:delimiter ";"
  ] .

### Publications

<#PublicationMapping> a rr:TriplesMap;
  rml:logicalSource <#LogicalSourcePublication> ;

  rr:subjectMap [
    rr:template "http://snf.ch/publication/{ScientificPublicationId}" ;
    rr:class schema:ScholarlyArticle
  ] .
```
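To make the mapping's intent concrete, here is a rough Python sketch of what `<#PublicationMapping>` produces for each CSV row: `rr:template` builds the subject IRI from the `ScientificPublicationId` column, and `rr:class` adds an `rdf:type schema:ScholarlyArticle` triple. This is not how the RMLMapper is implemented. The function name `publication_triples`, the N-Triples output, skipping rows with an empty ID, and using `urllib.parse.quote` as an approximation of R2RML's IRI-safe percent-encoding are all assumptions made for illustration.

```python
import csv
import io
from typing import Iterator, TextIO
from urllib.parse import quote

# IRIs taken from the mapping above (rdf:type is what rr:class generates).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHOLARLY_ARTICLE = "http://schema.org/ScholarlyArticle"
TEMPLATE = "http://snf.ch/publication/{ScientificPublicationId}"


def publication_triples(source: TextIO, delimiter: str = ";") -> Iterator[str]:
    """Yield one N-Triples line per CSV record, roughly like rr:template + rr:class.

    Records are read lazily, so only one row is held in memory at a time.
    """
    for row in csv.DictReader(source, delimiter=delimiter):
        pub_id = row.get("ScientificPublicationId") or ""
        if not pub_id:
            continue  # assumption: no subject (and no triple) for an empty ID
        # quote(..., safe="") approximates R2RML's IRI-safe percent-encoding.
        subject = TEMPLATE.format(ScientificPublicationId=quote(pub_id, safe=""))
        yield f"<{subject}> <{RDF_TYPE}> <{SCHOLARLY_ARTICLE}> ."


if __name__ == "__main__":
    # Made-up sample rows; the real file has many more columns.
    sample = io.StringIO("ScientificPublicationId;Title\n1001;First\n1002;Second\n")
    for triple in publication_triples(sample):
        print(triple)
```

Because the rows are consumed through a generator, the size of the input file does not by itself change how much memory this sketch needs.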

I get the following error message:

17:14:36.117 [main] ERROR be.ugent.rml.cli.Main .main(404) - Unterminated quoted field at end of CSV line. Beginning of lost text: [ Rutter, G. A. ;;;4182;PubMed;;;4194;;0;Peer-reviewed;;The Journal of clinical investigation;;Pub...]

This is the last line of the CSV.

The strange thing is that if I copy this exact line into a small test file (OutputdataScientificPublication_test.csv), it works.

Could this be an OpenCSV issue? See https://stackoverflow.com/questions/70976734/csvmalformedlineexception-unterminated-quoted-field-at-end-of-csv-line and https://stackoverflow.com/questions/70347745/unterminated-quoted-field-at-end-of-csv-line-beginning-of-lost-text. However, the quotes look fine to me (each opening quote has a closing one).
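Checking the quotes by eye only covers the line in the error message, while a quoted field can legally span several physical lines, so a quote opened much earlier in the file can surface as an error on the last line. Below is a minimal Python sketch, under stated assumptions, that scans a whole semicolon-delimited file and reports the line where a never-closed quoted field starts. The function name `find_unterminated_quote` and its `quotes_anywhere` switch are hypothetical. Parsers differ in how they treat a `"` in the middle of an unquoted field: some treat it as a literal character, others switch into quoted mode. The switch lets you check the file under both interpretations; it is not a model of OpenCSV's exact behaviour.

```python
import io
from typing import Optional, TextIO


def find_unterminated_quote(
    source: TextIO, delimiter: str = ";", quotes_anywhere: bool = False
) -> Optional[int]:
    """Return the 1-based line on which a never-closed quoted field starts, or None.

    Quoted fields may span several physical lines, and '""' inside a quoted
    field is an escaped quote. With quotes_anywhere=False, a '"' only opens a
    quoted field at the start of a field (RFC 4180 style). With True, any '"'
    outside a quoted field opens one, as a parser that switches into quoted
    mode on every quote character would see it.
    """
    in_quotes = False
    at_field_start = True
    quote_start_line: Optional[int] = None
    for line_no, line in enumerate(source, start=1):
        i = 0
        while i < len(line):
            ch = line[i]
            if in_quotes:
                if ch == '"':
                    if i + 1 < len(line) and line[i + 1] == '"':
                        i += 1  # skip the second half of an escaped quote
                    else:
                        in_quotes = False  # closing quote
            elif ch == '"' and (at_field_start or quotes_anywhere):
                in_quotes = True
                quote_start_line = line_no
            # A new field starts after an unquoted delimiter or line break.
            at_field_start = not in_quotes and ch in (delimiter, "\n", "\r")
            i += 1
    return quote_start_line if in_quotes else None
```

To run it on the real export, pass an open file, e.g. `find_unterminated_quote(open("OutputdataScientificPublication.csv", newline="", encoding="utf-8"))` (UTF-8 is an assumption about the file). If the default check returns `None` but `quotes_anywhere=True` returns a line number, a stray `"` inside an unquoted field near that line is a plausible suspect, since parsers that disagree on such quotes will also disagree on where fields end.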

Maybe that is also related to #173 (quoted values).

Thanks a lot for your help.

tobiasschweizer commented 2 years ago

@DylanVanAssche Do you think this could be an OpenCSV issue? Or could it be related to the size of the CSV file?

AronBuzogany commented 9 months ago

@tobiasschweizer I get an error opening the link you provided https://data.snf.ch/Exportcsv/OutputdataScientificPublication.csv

Do you still have the failing example?

tobiasschweizer commented 9 months ago

Hi @AronBuzogany,

Thanks for looking into this. Unfortunately, I do not have the original data anymore. In any case, I think the CSV was fine since it worked with CARML.

Here is the link (they changed the portal): https://data.snf.ch/exportcsv/OutputdataScientificPublication.csv

AronBuzogany commented 9 months ago

Thanks for your help, @tobiasschweizer. I have just run the current data through the development version of the RMLMapper, and everything seems to work fine. In particular, the error reported in this issue did not occur with this data.

tobiasschweizer commented 9 months ago

Back then, I had the impression that it could be memory-related. Have you updated the OpenCSV version since I filed the issue?

AronBuzogany commented 9 months ago

Yes, I tested your issue on the development branch. There, we no longer use OpenCSV but a different library that uses less memory and is much faster. So this issue will probably be fixed in the new release.

tobiasschweizer commented 9 months ago

That's good news, great!

DylanVanAssche commented 2 months ago

> So this issue will probably be fixed in the new release.

We have already released it, so I will close this issue. If you encounter more problems, feel free to create more issues. Thanks!