db213 closed 7 years ago
In your proposal:
If a double quote is part of a field, then it must be escaped with a backslash (\).
Does this include unquoted fields? In other words, should it be:
testing, "1, \"quoted\"", help "me"
or:
testing, "1, \"quoted\"", help \"me\"
Yes, this includes unquoted fields, so the second example is correct.
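For what it's worth, here is a rough sketch of how the second example could be read with Python's standard `csv` module. The reader settings are my assumptions, not part of the proposal: `escapechar="\\"` with `doublequote=False` to get backslash escaping, and `skipinitialspace=True` because the example has a space after each comma.

```python
import csv

# The second example from above, exactly as it would appear in the file.
line = r'testing, "1, \"quoted\"", help \"me\"'

# Assumed reader settings (not taken from the proposal):
# - escapechar="\\" makes a backslash escape the character that follows it
# - doublequote=False turns off the RFC 4180 style "" escaping
# - skipinitialspace=True ignores the space after each comma
reader = csv.reader(
    [line],
    delimiter=",",
    quotechar='"',
    escapechar="\\",
    doublequote=False,
    skipinitialspace=True,
)

rows = list(reader)
for field in rows[0]:
    print(repr(field))
```

With these settings the backslash is consumed in both the quoted and the unquoted field, so the escaping rule behaves the same way in either case.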
CSV does not have a formal definition, but it's important that BE and ML agree on a specific, formal format.
My proposal is to use the CSV specification described in RFC 4180 (yes, I'm ripping this from Wikipedia) with a few amendments:
A CSV is a plain text file using the UTF-8 character set that:
I thought this was reasonable as it's easy to parse with any CSV processing package (e.g. Pandas). If this is accepted, this specification should probably be added to the ML and BE specifications: ML should expect to deal with CSVs of this format, and BE should only pass CSVs of this format to ML. If any other format is passed to ML, it is valid behaviour to throw an error.
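To illustrate the "throw an error" behaviour, here is a minimal sketch of what the ML side could do on input. It only checks the parts stated in this thread (UTF-8 encoding and backslash escaping) and is not the full specification. `CsvFormatError` and `read_agreed_csv` are hypothetical names, and the reader settings, including `strict=True`, are my assumptions.

```python
import csv
import io


class CsvFormatError(ValueError):
    """Hypothetical error raised when input does not match the agreed format."""


def read_agreed_csv(raw):
    """Decode raw bytes as UTF-8 and parse them, raising on malformed input."""
    try:
        text = raw.decode("utf-8")  # strict decoding: invalid bytes raise
    except UnicodeDecodeError as exc:
        raise CsvFormatError("input is not valid UTF-8") from exc

    # Assumed reader settings; strict=True makes the csv module raise
    # csv.Error on bad input such as an unterminated quoted field.
    reader = csv.reader(
        io.StringIO(text, newline=""),
        escapechar="\\",
        doublequote=False,
        skipinitialspace=True,
        strict=True,
    )
    try:
        return [row for row in reader]
    except csv.Error as exc:
        raise CsvFormatError(f"malformed CSV: {exc}") from exc
```

BE could run the same check before handing a file to ML, so both sides reject the same inputs.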