Open youribonnaffe opened 7 years ago
@youribonnaffe Thank you for reporting this problem. From code and example it seems to me this should just work as is.
Just one question: which version of Jackson are you using? Latest stable versions are 2.9.0 / 2.8.9.
I'm using 2.8.9
Thank you for confirming. That sounds odd as I am pretty sure this functionality has been around and tested for a long time.
Hmmh. Actually, I am not sure this is a bug after all.
The problem is that the first double-quote is taken to mean that the column value is quoted. The second quote is then taken as the closing quote because it is NOT doubled. For proper behavior here, there should be 3 double-quotes, which would be interpreted as expected. So it seems that the code that generated this CSV did not handle this aspect properly, based on my understanding of CSV.
Having said that, the CSV "specification" is quite loose, as there isn't really an official specification. So I would be interested in finding out whether anything has been said about this behavior. It is possible that I have not considered some corner case.
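To make the quoting behavior described above concrete, here is a minimal sketch of a strict, RFC 4180 style splitter for a single line. It is not Jackson's parser; the class name `StrictCsvLine` and the sample values are made up for illustration, and the reporter's actual CSV content is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Jackson.
final class StrictCsvLine {

    /** Splits one comma-separated line, treating a leading '"' as the start of a quoted field. */
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        final int n = line.length();
        int i = 0;
        while (true) {
            StringBuilder sb = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                // A quote in the first position opens a quoted field.
                i++;
                boolean closed = false;
                while (i < n) {
                    char c = line.charAt(i);
                    if (c == '"') {
                        if (i + 1 < n && line.charAt(i + 1) == '"') {
                            // Doubled quote: one literal quote character.
                            sb.append('"');
                            i += 2;
                        } else {
                            // Single quote: end of the quoted field.
                            i++;
                            closed = true;
                            break;
                        }
                    } else {
                        sb.append(c);
                        i++;
                    }
                }
                if (!closed) {
                    throw new IllegalArgumentException("Unterminated quoted field");
                }
                if (i < n && line.charAt(i) != ',') {
                    // Anything other than a separator after the closing quote is rejected.
                    throw new IllegalArgumentException(
                            "Unexpected character after closing quote at index " + i);
                }
            } else {
                // Unquoted field: read up to the next comma.
                while (i < n && line.charAt(i) != ',') {
                    sb.append(line.charAt(i));
                    i++;
                }
            }
            fields.add(sb.toString());
            if (i >= n) {
                break;
            }
            i++; // skip the comma
        }
        return fields;
    }
}
```

With this sketch, `StrictCsvLine.parse("\"\"\"aaa\"\" bbb\",ccc")` yields the two fields `"aaa" bbb` and `ccc`, while `StrictCsvLine.parse("\"aaa\" bbb,ccc")` throws, because the second quote closes the field and the text after it is not a separator.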
Ok, reading RFC 4180, I see:
5. Each field may or may not be enclosed in double quotes (however
   some programs, such as Microsoft Excel, do not use double quotes
   at all). If fields are not enclosed with double quotes, then
   double quotes may not appear inside the fields. For example:

       "aaa","bbb","ccc" CRLF
       zzz,yyy,xxx

6. Fields containing line breaks (CRLF), double quotes, and commas
   should be enclosed in double-quotes. For example:

       "aaa","b CRLF
       bb","ccc" CRLF
       zzz,yyy,xxx

7. If double-quotes are used to enclose fields, then a double-quote
   appearing inside a field must be escaped by preceding it with
   another double quote. For example:

       "aaa","b""bb","ccc"

which I think spells out why the test case is invalid: the field must be quoted (as it contains double-quotes itself), and each double-quote within it must be doubled.
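On the writing side, rules 6 and 7 could be applied roughly as follows. This is a sketch under the assumption of a comma separator; `CsvFieldQuoting` is a hypothetical helper, not a Jackson API, and the sample values are illustrative.

```java
// Hypothetical helper for illustration only; not part of Jackson.
final class CsvFieldQuoting {

    /** Returns the field as it should appear in the file, per RFC 4180 rules 6 and 7. */
    static String quote(String value) {
        boolean needsQuoting = value.indexOf('"') >= 0
                || value.indexOf(',') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuoting) {
            return value;
        }
        // Double every embedded quote, then wrap the whole field in quotes.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }
}
```

For example, `CsvFieldQuoting.quote("b\"bb")` gives `"b""bb"`, matching the RFC example, and a value that starts with a literal quote, such as `"aaa" bbb`, is written as `"""aaa"" bbb"`, which is where the three leading double-quotes come from.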
I agree, the value is probably malformed according to the RFC. Still, do you think there would be interest in supporting such usage, if that could be done without breaking the existing implementation?
@youribonnaffe if that could be supported (perhaps via optional CsvParser.Feature), that could be useful. I have no objections to such support.
I have a CSV file with the following content (just a limited extract here):
Parsing this CSV content with CsvMapper causes the following error:
Here is a unit test to reproduce the issue:
Is there a way to configure the parser to be more flexible about this usage of quotes? Unfortunately the CSV file is not under my control and I won't be able to change its format.
Parsing this file with OpenCSV was working, but I was hoping to switch to Jackson for better performance.