Open reaver2oo3 opened 4 years ago
Hi
I have the same problem. Cells containing double-quoted strings with an embedded \n are parsed incorrectly.
id, text
1, "abc
efg"
I have fixed this locally and it seems to work.
The problem is in the function LineReader::next_line, line 465.
Replace the original block, lines 465-468, with this:
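int line_end = data_begin;
bool is_in_string = false;  // inside a double-quoted cell
bool has_quote = false;     // previous char was a quote that may close the cell
while(line_end != data_end){
    if (buffer[line_end] == '\"') {
        if (is_in_string)
            has_quote = !has_quote;  // "" is an escaped quote, a lone " closes the cell
        else
            is_in_string = true;
    }
    else if (buffer[line_end] == '\n') {
        // A newline ends the record unless it sits inside a still-open cell.
        if (!is_in_string || has_quote)
            break;
    }
    else {
        if (is_in_string && has_quote) {
            is_in_string = false;
            has_quote = false;
        }
    }
    ++line_end;
}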
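For experimenting outside the library, the same scan can be sketched as a standalone function. This is only an illustration under the assumption that the doubled quote "" is the sole escape: in that case a single flag toggled on every quote is enough, because "" toggles twice and leaves the state unchanged. The name find_record_end is hypothetical and not part of the library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of fast-cpp-csv-parser): returns the index of
// the '\n' that ends the first record in `buf`, or buf.size() if there is none.
// Assumes the only escape is a doubled quote character.
std::size_t find_record_end(const std::string& buf, char quote = '"') {
    bool in_quotes = false;
    for (std::size_t i = 0; i < buf.size(); ++i) {
        const char c = buf[i];
        if (c == quote) {
            // A doubled quote toggles twice, so the state is unchanged after it.
            in_quotes = !in_quotes;
        } else if (c == '\n' && !in_quotes) {
            return i;  // newline outside any quoted cell ends the record
        }
    }
    return buf.size();  // no record end found (e.g. unterminated quote)
}
```

An unterminated quote makes the function return buf.size(), so a caller would have to decide whether to load more data or report an error.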
This code does not consider the other quote policies; that still needs to be done properly.
I'm using the CSV reader with the double_quote_escape policy like this:
CSVReader<2, trim_chars<>, double_quote_escape<',', '\"'>> csv(file);
and everything works for me.
The issue with \n in quoted strings has been raised many times in the past and is known. The summary of the problems is:
Up to now I have not yet seen a good solution.
We process many CSV files downloaded from various web services and/or generated by tools.
A \n inside a CSV string is absolutely common, with no escaping. I think the only escape used in CSV files is the doubled double-quote "" to escape a double-quote.
A runaway cell caused by a missing closing double-quote is not a problem of the parser but of the generator, and the parser should not have to correct it.
Solving this by "correcting" the CSV before reading (escaping all multi-line records with some escape, then unescaping all escaped newlines after reading) is fairly inefficient and complicated.
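For reference, the preprocessing workaround described above could look roughly like the sketch below. It is only an illustration: the function names and the placeholder character are hypothetical, and it assumes the placeholder never occurs in the data and that "" is the only quote escape. It also shows the extra cost, since the whole input is copied once before parsing and each cell is touched again afterwards.

```cpp
#include <string>

// Hypothetical placeholder character, assumed never to occur in the data.
const char kNewlinePlaceholder = '\x1f';

// Replace every '\n' that lies inside a double-quoted cell with the placeholder,
// so that a line-based parser sees one record per physical line.
std::string escape_quoted_newlines(const std::string& csv) {
    std::string out;
    out.reserve(csv.size());
    bool in_quotes = false;
    for (char c : csv) {
        if (c == '"') in_quotes = !in_quotes;  // "" toggles twice: state unchanged
        out.push_back((c == '\n' && in_quotes) ? kNewlinePlaceholder : c);
    }
    return out;
}

// Restore the original newlines in a cell after it has been parsed.
std::string unescape_newlines(std::string cell) {
    for (char& c : cell)
        if (c == kNewlinePlaceholder) c = '\n';
    return cell;
}
```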
Multi-line cells are absolutely required for my use case. You cannot tell users that they can't put multiple lines in text boxes.
Then you have to either escape newlines by replacing them somehow, not use CSV, or look for a different library.
On 6/30/22 17:59, Exceter007 wrote:
Multi-line cells are absolutely required for my use case. You cannot tell users that they can't put multiple lines in text boxes.
Hello,
I have integrated your library into an application that I am working on, and I have found that if a cell contains text that spans multiple lines, the application crashes. :( Is there any hope for a fix? One possible solution I have thought of: if a cell starts with a quote but the closing quote isn't found by the end of the line, more lines should be read until the matching quote is found. I haven't managed to attempt a fix yet, as I haven't looked into your code enough.
Kind regards, Daniel