ben-strasser / fast-cpp-csv-parser

BSD 3-Clause "New" or "Revised" License

Partial lines #41

Closed pengxu7 closed 7 years ago

pengxu7 commented 7 years ago

Hello,

I have an append-only CSV file on NFS. Another process keeps appending lines to it, and my application polls its size: when the size increases, it reads the new data with pread() and uses fast-cpp-csv-parser to parse each line.

But sometimes I only got partial lines, and I believe it's because of this piece of code in next_line() method:

    if(buffer[line_end] == '\n'){
        buffer[line_end] = '\0';
    }else{
        // some files are missing the newline at the end of the
        // last line
        ++data_end;
        buffer[line_end] = '\0';
    }

Since the NFS file is a moving target, a read can end in the middle of a line whenever the other process has written only part of it. That does not mean the data is a finished file whose last line merely lacks its newline.

Thanks!

ben-strasser commented 7 years ago

Hi,

thanks for the bug report.

What behavior do you need when the file has no terminal newline? Should the line be discarded? Should the reader block until the other process has written the file?

Best Regards Ben Strasser

pengxu7 commented 7 years ago

I don't think blocking until the rest of the line is written is necessary, as I'm already polling the file size, and the line shouldn't be discarded either. What I did was reset line_end = line_begin and then return nullptr, so the full line can be picked up next time. But this approach doesn't work for static CSV files that lack a newline at the end of the file, as it drops the last line.

Do you have better ideas to handle both cases properly?

Thanks.
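
For readers following along, the "hold back the partial line" idea can be sketched outside the library. The code below is a minimal illustration, not code from fast-cpp-csv-parser: `LineAssembler`, `feed()` and `finish()` are hypothetical names. It buffers bytes between polls, releases only newline-terminated lines, and leaves it to the caller to say when the input is really finished, which is the one piece of information the parser cannot infer on its own.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of fast-cpp-csv-parser: buffers bytes
// between polls and only releases newline-terminated lines.
class LineAssembler {
public:
    // Append a freshly read chunk and return every complete line in it
    // (without the trailing '\n'). Bytes after the last '\n' stay buffered
    // until a later call completes them.
    std::vector<std::string> feed(const std::string& chunk) {
        pending_ += chunk;
        std::vector<std::string> lines;
        std::size_t begin = 0;
        for (;;) {
            const std::size_t end = pending_.find('\n', begin);
            if (end == std::string::npos) break;
            lines.push_back(pending_.substr(begin, end - begin));
            begin = end + 1;
        }
        pending_.erase(0, begin);  // keep only the unfinished tail
        return lines;
    }

    // Call once the input is known to be complete (a static file, or a
    // writer that has finished). Returns the unterminated last line, if
    // any, and clears the buffer.
    std::vector<std::string> finish() {
        std::vector<std::string> lines;
        if (!pending_.empty()) {
            lines.push_back(pending_);
            pending_.clear();
        }
        return lines;
    }

private:
    std::string pending_;
};
```

With this split, the polling case calls only `feed()` after each pread(), while the static-file case calls `feed()` followed by `finish()`, so a last line without a newline is not lost.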

ben-strasser commented 7 years ago

Hi,

I think implementing the blocking behavior is the easiest.

Derive a class from ByteSourceBase and implement the int read(char*buffer, int size) function. The function should return only once at least one byte has been read or the EOF has been reached. If no bytes are currently available but the EOF was not reached, the function should block. Returning 0 signals an EOF.

Best Regards Ben Strasser