ben-strasser / fast-cpp-csv-parser

fast-cpp-csv-parser
BSD 3-Clause "New" or "Revised" License

Lib does not work if cell contains \n characters #92

Open reaver2oo3 opened 4 years ago

reaver2oo3 commented 4 years ago

Hello,

I have integrated your library into an application I am working on, and I have found that if a cell contains text spread over multiple lines, the application crashes. :( Is there any hope for a fix? One possible solution I have thought of: if a cell starts with a quote but the closing quote isn't found by the end of the line, keep reading lines until the matching quote is found. I haven't written a fix attempt yet, as I haven't looked into your code enough.

Kind regards, Daniel

lubomirmatus commented 4 years ago

Hi

I have the same problem. Cells with double quoted strings that contain \n inside are parsed incorrectly.

id, text
1, "abc
efg"

I have fixed this locally and it seems to be working. The problem is in the function LineReader::next_line, line 465. Replace the original block, lines 465-468, with this:

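int line_end = data_begin;
bool is_in_string = false; // currently inside a double-quoted cell
bool has_quote = false;    // last character seen inside the cell was an unpaired '"'
while(line_end != data_end){
    if (buffer[line_end] == '\"') {
        if (is_in_string)
            has_quote = !has_quote; // closing quote, or one half of an escaped ""
        else
            is_in_string = true;    // opening quote
    }
    else if (buffer[line_end] == '\n') {
        // A newline ends the record unless it lies inside an open quoted cell.
        // A pending quote means the cell was closed right before this newline.
        if (!is_in_string || has_quote)
            break;
    }
    else {
        if (is_in_string && has_quote) {
            // the pending quote was a closing quote
            is_in_string = false;
            has_quote = false;
        }
    }
    ++line_end;
}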

This code does not take the other quote policies into account. A proper fix would need to handle them correctly.

I'm using the CSV reader with the double_quote_escape policy like this:

CSVReader<2, trim_chars<>, double_quote_escape<',', '\"'>> csv(file);

and everything works for me.

ben-strasser commented 4 years ago

The issue with \n in quoted strings has been raised a lot in the past. This is known. The summary of the problems is:

Up to now I have not yet seen a good solution.

lubomirmatus commented 4 years ago

We process many CSV files downloaded from various web services and/or generated by tools. A raw \n inside a quoted CSV string is absolutely common, with no escaping. I think the only escape used in CSV files is the doubled double-quote "" to represent a literal double-quote.
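To make the "" convention concrete, here is a small sketch of how a quoted cell is turned back into its value. `unquote_cell` is a hypothetical helper written for illustration only; it is not how the library implements its double_quote_escape policy.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strips the surrounding quotes from a cell and
// collapses each "" pair into a single ". Cells that are not wrapped in
// quotes are returned unchanged. Newlines inside the cell are kept as-is.
std::string unquote_cell(const std::string& cell) {
    if (cell.size() < 2 || cell.front() != '"' || cell.back() != '"')
        return cell;
    std::string out;
    for (std::size_t i = 1; i + 1 < cell.size(); ++i) {
        out += cell[i];
        if (cell[i] == '"' && i + 2 < cell.size() && cell[i + 1] == '"')
            ++i;  // skip the second quote of an escaped pair
    }
    return out;
}
```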

A runaway cell caused by a missing closing double-quote is not the parser's problem; it is the generator's problem, and the parser should not try to correct it.

Solving this by "correcting" the CSV before reading (escaping the newlines in all multi-line records with some escape sequence) and then unescaping them after reading is fairly inefficient and complicated.
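For reference, that pre-processing workaround could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function names are hypothetical, and the placeholder byte 0x1F is assumed never to occur in the real data. It also shows the cost: the whole input is scanned once before parsing, and every cell has to be post-processed afterwards.

```cpp
#include <algorithm>
#include <string>

// Hypothetical placeholder; assumed never to occur in the real data.
constexpr char kNewlinePlaceholder = '\x1F';

// Replaces every '\n' inside a double-quoted cell with the placeholder so
// that each record fits on one physical line. The quote state is tracked by
// toggling on every '"', which also covers the "" escape.
std::string hide_quoted_newlines(std::string csv) {
    bool in_quotes = false;
    for (char& c : csv) {
        if (c == '"')
            in_quotes = !in_quotes;
        else if (c == '\n' && in_quotes)
            c = kNewlinePlaceholder;
    }
    return csv;
}

// Undoes the replacement on a cell value after it has been parsed.
std::string restore_newlines(std::string cell) {
    std::replace(cell.begin(), cell.end(), kNewlinePlaceholder, '\n');
    return cell;
}
```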

johndunlap commented 2 years ago

Multi-line cells are absolutely required for my use case. You cannot tell users that they can't put multiple lines in text boxes.

ben-strasser commented 2 years ago

Then you have to either escape newlines by replacing them somehow, not use CSV, or look for a different library.
