davidsantiago / clojure-csv

A library for reading and writing CSV files from Clojure

initial insignificant white space #16

Closed rplevy-draker closed 12 years ago

rplevy-draker commented 12 years ago

A line of data may have whitespace preceding a quoted field, e.g.

  "2009-05-15 17:45:00","foo","bar"

This misparses the field and throws an exception:

java.lang.IllegalArgumentException: Invalid format: " "2009-05-15 17:45:00""

(strict parsing is disabled).

davidsantiago commented 12 years ago

Your file is formatted incorrectly; this is working as intended. See RFC 4180 (http://tools.ietf.org/html/rfc4180), points 4 and 5. Specifically, "spaces are considered part of a field and should not be ignored" and "Each field may or may not be enclosed in double quotes ... If fields are not enclosed with double quotes, then double quotes may not appear inside the fields."

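That is all to say, initial whitespace is not ignored in CSV; it is part of the field. And if a field is quoted, the quotes must be the first and last characters of the field, whitespace included. A space before the opening quote therefore makes the field an unquoted one that happens to contain quote characters.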
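To make the rule concrete, here is a small illustration using Python's standard `csv` module rather than clojure-csv (an assumption made purely for demonstration; the two parsers are independent and may differ in other details). By default the leading space keeps the first field unquoted, so the space and both quote characters stay in the value. The module's `skipinitialspace` option is one example of a parser opting into the more permissive reading.

```python
import csv

# The problematic line from the report: one space before the opening quote.
line = ' "2009-05-15 17:45:00","foo","bar"'


def parse_line(text, **options):
    """Parse a single CSV line with Python's csv module and return its fields."""
    return next(csv.reader([text], **options))


# Default dialect: the space starts an unquoted field, so the quotes are literal.
default_fields = parse_line(line)

# Opt-in leniency: whitespace right after a delimiter (or line start) is skipped.
lenient_fields = parse_line(line, skipinitialspace=True)

print(default_fields)  # [' "2009-05-15 17:45:00"', 'foo', 'bar']
print(lenient_fields)  # ['2009-05-15 17:45:00', 'foo', 'bar']
```

The first result has the same shape as the value in the exception message above: a leading space followed by the still-quoted timestamp.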

rplevy-draker commented 12 years ago

Yeah I thought you might pull RFC on me. I guess I will need to preprocess the byte stream.
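A preprocessing pass of the kind mentioned above could look roughly like the following. This is a minimal sketch in Python, not something taken from clojure-csv; the function name `strip_space_before_quotes` and its parameters are hypothetical. It assumes the whole input fits in memory as a string, and it only drops spaces and tabs that sit between a field boundary and an opening quote, leaving whitespace inside quoted fields and in unquoted fields alone.

```python
def strip_space_before_quotes(text, delimiter=",", quote='"'):
    """Drop spaces/tabs between a field boundary and an opening quote.

    Hypothetical helper: a small state machine, so that delimiters and
    whitespace inside quoted fields are never touched.
    """
    out = []
    in_quotes = False
    at_field_start = True
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            out.append(ch)
            if ch == quote:
                if i + 1 < n and text[i + 1] == quote:
                    # Doubled quote is an escaped quote; copy both characters.
                    out.append(quote)
                    i += 1
                else:
                    in_quotes = False
        elif at_field_start and ch in " \t":
            # Look ahead past the run of whitespace.
            j = i
            while j < n and text[j] in " \t":
                j += 1
            if j < n and text[j] == quote:
                i = j  # discard the whitespace, then handle the quote
                continue
            out.append(text[i:j])  # ordinary unquoted field: keep whitespace
            at_field_start = False
            i = j
            continue
        elif at_field_start and ch == quote:
            out.append(ch)
            in_quotes = True
            at_field_start = False
        elif ch == delimiter:
            out.append(ch)
            at_field_start = True
        elif ch in "\r\n":
            out.append(ch)  # record boundary outside quotes
            at_field_start = True
        else:
            out.append(ch)
            at_field_start = False
        i += 1
    return "".join(out)


cleaned = strip_space_before_quotes(' "2009-05-15 17:45:00","foo","bar"')
print(cleaned)
```

A plain regex replacement would be shorter, but it could also rewrite text inside a quoted field that happens to contain a comma followed by a space and a quote, which is why the sketch tracks quoting state explicitly.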

davidsantiago commented 12 years ago

Yeah, sorry. I do try to err on the side of permissiveness, but that has to be weighed against correctness, speed, and the complexity it adds to the parsing algorithm.