ben-strasser / fast-cpp-csv-parser

fast-cpp-csv-parser
BSD 3-Clause "New" or "Revised" License

Handle different line endings #58

Closed by prasadsilva 6 years ago

prasadsilva commented 6 years ago

I ran into an issue attempting to parse the CSV file here: https://github.com/donohoe/nyt-crossword. The file has CR line endings, but this parser only supports LF and CR-LF. I had to run mac2unix on the file to get it to parse. It would be nice to have extra line ending support, or have it be configurable.

See https://en.wikipedia.org/wiki/Newline#Representations

ben-strasser commented 6 years ago

Hi,

thanks for the feedback.

Are there any major platforms other than classic Mac OS (as opposed to modern macOS) that use \r newlines?

I do not want to add further config flags, and changing the code to interpret a single \r as a newline may break existing code that relies on this not being the case. A preprocessor #ifdef does not seem right either, as the same application might see different newline types.

I feel that Mac Classic is old enough to be considered exotic at this point.

Either change a few lines in your local csv.h: the relevant ones are in LineReader::next_line, where you replace all \n by \r and delete lines 479 and 480. Or implement a ByteSource that translates all \r to \n. The following should work (not tested):

    class OwningStdIOByteSourceBaseWithMacNewlines : public OwningStdIOByteSourceBase{
    public:
        // Reuse the base class constructors.
        using OwningStdIOByteSourceBase::OwningStdIOByteSourceBase;

        int read(char*buffer, int size){
            int bytes_read = OwningStdIOByteSourceBase::read(buffer, size);
            // Rewrite every classic Mac newline in the bytes just read.
            for(int i=0; i<bytes_read; ++i)
                if(buffer[i] == '\r')
                    buffer[i] = '\n';
            return bytes_read;
        }
    };

Warning: This code snippet might be a problem if you try to combine UTF8 and classic Mac newlines. Edit: After thinking about it, I do not think that the combination of UTF8 and Mac newlines is a problem with the code above.
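
For readers who want to try the byte-translation idea without pulling in csv.h, here is a minimal standalone sketch. `StringByteSource`, `CrToLfByteSource` and `read_all` are hypothetical names made up for this illustration and are not part of the library; the only thing borrowed from the thread is the `int read(char*, int)` shape. The UTF-8 conclusion above holds because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so a 0x0D byte can only ever be a real carriage return.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical in-memory source (not part of csv.h). read() copies up to
// `size` bytes into `buffer` and returns the count, 0 at end of input.
class StringByteSource {
public:
    explicit StringByteSource(std::string data) : data_(std::move(data)) {}

    int read(char* buffer, int size) {
        assert(size >= 0);
        std::size_t n = std::min(static_cast<std::size_t>(size),
                                 data_.size() - pos_);
        std::memcpy(buffer, data_.data() + pos_, n);
        pos_ += n;
        return static_cast<int>(n);
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Hypothetical wrapper: forwards read() to the wrapped source, then rewrites
// each '\r' in the bytes just read to '\n'.
template <class Source>
class CrToLfByteSource {
public:
    explicit CrToLfByteSource(Source source) : source_(std::move(source)) {}

    int read(char* buffer, int size) {
        int bytes_read = source_.read(buffer, size);
        for (int i = 0; i < bytes_read; ++i)
            if (buffer[i] == '\r')
                buffer[i] = '\n';
        return bytes_read;
    }

private:
    Source source_;
};

// Drains a source through a deliberately tiny buffer, so the translation is
// exercised across several read() calls.
template <class Source>
std::string read_all(Source& source) {
    std::string out;
    char buffer[4];
    int n;
    while ((n = source.read(buffer, static_cast<int>(sizeof buffer))) > 0)
        out.append(buffer, static_cast<std::size_t>(n));
    return out;
}
```

Wrapping a source that holds `"a,b\r1,2\r"` and draining it with `read_all` yields `"a,b\n1,2\n"`. Note that this mapping is byte for byte, so a CR-LF pair turns into two `\n` bytes; whether the resulting empty line matters depends on how the consumer treats empty lines.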

Best Regards Ben Strasser

prasadsilva commented 6 years ago

Thanks for the reply. This is the first time I've ever encountered a file with classic Mac line endings. So I can't really say if this is prevalent.

BTW - You should add (PR) this project to the list here: https://github.com/nothings/single_file_libs.

ben-strasser commented 6 years ago

Thanks for the link. I opened an issue there.