Closed dvdscripter closed 7 years ago
Hi @dvdscripter (sorry for the delay),
You are right: CSVReader is not RFC 4180 compliant, and ideally it should be.
I'm not sure if we will maintain CSVReader, considering some libraries can handle that easily.
Recently we added TabularFiles, which simply wraps Super CSV. I'm not sure to what extent the IPT can use it.
cc @kbraak
Thanks @cgendreau, I hope @kbraak can comment on this issue too. Anyway, thanks for taking the time to look at this matter.
Thanks David. I don't have anything more to add to what Christian explained. In case it helps, you can send users this FAQ explaining how the IPT supports multiline fields.
This is now considered fixed (in this project).
See the test testCsvMultiline in TabularDataFileReaderTest, around https://github.com/gbif/gbif-common/blob/master/src/test/java/org/gbif/utils/file/tabular/TabularDataFileReaderTest.java#L68
Reading a CSV file with Next() fails on any field that contains \n. While that is acceptable, most software follows the https://tools.ietf.org/html/rfc4180 recommendations.
LibreOffice and Excel seem to accept \n if the field is quoted.
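To illustrate the behavior being requested, here is a minimal sketch of RFC 4180-style parsing in which a quoted field may contain line breaks, commas, and doubled quotes. The class name Rfc4180Sketch and its parse method are hypothetical and are not part of gbif-common or CSVReader; the sketch assumes a comma delimiter and a double-quote quote character.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of gbif-common: a minimal RFC 4180-style record parser. */
final class Rfc4180Sketch {

  /** Parses CSV text into records; quoted fields may contain commas, line breaks and "" escapes. */
  static List<List<String>> parse(String text) {
    List<List<String>> records = new ArrayList<>();
    List<String> record = new ArrayList<>();
    StringBuilder field = new StringBuilder();
    boolean inQuotes = false;
    boolean pending = false; // true once the current record has any content

    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (inQuotes) {
        if (c == '"') {
          if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
            field.append('"'); // "" inside quotes is an escaped quote
            i++;
          } else {
            inQuotes = false; // closing quote
          }
        } else {
          field.append(c); // \r and \n inside quotes belong to the field
        }
      } else if (c == '"') {
        inQuotes = true;
        pending = true;
      } else if (c == ',') {
        record.add(field.toString());
        field.setLength(0);
        pending = true;
      } else if (c == '\n' || c == '\r') {
        if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
          i++; // treat CRLF as a single line break
        }
        // An unquoted line break ends the record (a bare empty line yields one empty field).
        record.add(field.toString());
        field.setLength(0);
        records.add(record);
        record = new ArrayList<>();
        pending = false;
      } else {
        field.append(c);
        pending = true;
      }
    }
    if (pending) { // last record without a trailing line break
      record.add(field.toString());
      records.add(record);
    }
    return records;
  }
}
```

With this sketch, the input `1,"first line\nsecond line"` is read as one record with two fields instead of being split at the embedded \n. In practice an established library such as Super CSV is probably a safer choice than a hand-written parser.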
Also, you should skip all empty CSV lines, not just those where row.length() == 0 is true:
and
are valid empty lines with two fields.
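One possible reading of this request, sketched below, is to treat a row as empty when it has no fields or when every field is empty. The class BlankRowSketch and the method isBlank are hypothetical names, not existing gbif-common code, and whether whitespace-only fields should also count as empty is left open here.

```java
import java.util.List;

/** Hypothetical helper, not part of gbif-common. */
final class BlankRowSketch {

  /** Returns true when the row has no fields, or every field is null or the empty string. */
  static boolean isBlank(List<String> row) {
    for (String field : row) {
      if (field != null && !field.isEmpty()) {
        return false; // at least one field carries data
      }
    }
    return true;
  }
}
```

A reader loop could then skip any row for which isBlank returns true, rather than checking only for a zero-length row.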
Can you add support for this? The IPT uses this class to read input source data, and some users are complaining.
I'm reporting it here because other GBIF tools may show the same behavior.