ben-strasser / fast-cpp-csv-parser

fast-cpp-csv-parser
BSD 3-Clause "New" or "Revised" License
2.11k stars 440 forks source link

Fix the case where row[r] is null #79

Closed mcneish1 closed 2 years ago

mcneish1 commented 5 years ago

This patch makes read_row return false in case it reads a null byte in the middle of the row, rather than doing nothing and skipping a parameter (leaving it possibly uninitialized).

Possible alternatives:

ben-strasser commented 5 years ago

Hi,

thanks for the bug report.

From my understanding, the code currently truncates each line that contains a 0-byte at the 0 byte. This means that the following file:

a,b,c 1,2,3 4,5\0,6 7,8,9

is read as if the file content was:

a,b 1,2,3 4,5 7,8,9

This then triggers the io::error::too_few_columns exception.

Is my understanding of the current behavior correct?

I agree that this behavior is not useful. Luckily, I believe that it is sufficiently unlikely that somebody relies on this behavior to require keeping this bug around for backward compatibility.

However, I do not understand what behavior you are proposing. Would you mind elaborating a bit?

My gut feeling is that the "correct" fix would be to use std::string_view and to treat the 0-byte as yet-another-byte. Unfortunately, I do not know whether std::string_view is old enough to be available with every reasonably recent compiler. An alternative would be to introduce a bool next_line(char&begin, char&end) function. The later would also allow to discern between an empty line and the end of the file.

Best Regards Dr. Ben Strasser

mcneish1 commented 5 years ago

Sorry about the delay.

My patch doesn't do quite what I wanted, and has the same behavior as before: I missed the too_few_columns exception, and didn't quite understand rows. What I was trying to do was omit the entire line with a null byte.

My gut feeling is that the "correct" fix would be to use std::string_view and to treat the 0-byte as yet-another-byte.

string_view requires c++17, unfortunately. An alternative here is char begin/char end pairs: parse(char*begin, char*end, T& t), then using std::string's iterator constructor. I got most of the way through implementing it, but I haven't managed to come up with a satisfactory solution for col_order (a second sorted_col_end in template<...> void parse_line(...) is the best I've come up with)


This started with a clang-tidy warning about potentially uninitialized variables returned from parse_helper. Each of the available parse methods behaves the same with a leading null byte, so I think the easiest fix would just be to remove the row[r] check.