jmacias closed this issue 10 years ago.
This is not an error. The behavior of BufferedReader.readLine() is not a useful guide to the CSV format. RFC 4180, which is about as close as things get to a standard, specifies CRLF line terminators in CSV files. Clojure-CSV accepts CRLF or just LF, which is also a common line terminator. If you need some other weird character as your end-of-line, then set the :end-of-line option to that character when you parse.
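To make the end-of-line behavior concrete, here is a minimal Java sketch of a quote-aware CSV parser. It is not clojure-csv's actual implementation. The method parseCsv and its eol parameter are hypothetical and are modeled on the :end-of-line option described above. When eol is null, either LF or CRLF ends a record. Otherwise, only the given string does. Line breaks inside double-quoted fields are kept as data.

```java
import java.util.ArrayList;
import java.util.List;

public class EolCsvSketch {
    // Hypothetical sketch, not clojure-csv's code. If eol is null, both "\n"
    // and "\r\n" end a record; otherwise only the exact eol string does.
    // Line breaks inside double-quoted fields are kept as field data.
    public static List<List<String>> parseCsv(String input, String eol) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        int recordEnd = 0; // index just past the last consumed terminator
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c); // any character, including line breaks
                } else if (i + 1 < input.length() && input.charAt(i + 1) == '"') {
                    field.append('"'); // "" is an escaped quote
                    i++;
                } else {
                    inQuotes = false; // closing quote
                }
                i++;
                continue;
            }
            int eolLen = matchEol(input, i, eol);
            if (eolLen > 0) {
                record.add(field.toString());
                records.add(record);
                record = new ArrayList<>();
                field.setLength(0);
                i += eolLen;
                recordEnd = i;
            } else {
                if (c == ',') {
                    record.add(field.toString());
                    field.setLength(0);
                } else if (c == '"' && field.length() == 0) {
                    inQuotes = true; // opening quote at the start of a field
                } else {
                    field.append(c);
                }
                i++;
            }
        }
        // Flush a final record that has no trailing terminator.
        if (recordEnd < input.length()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    // Length of the record terminator starting at index i, or 0 if none.
    private static int matchEol(String s, int i, String eol) {
        if (eol == null) {
            if (s.startsWith("\r\n", i)) return 2;
            return s.charAt(i) == '\n' ? 1 : 0;
        }
        return s.startsWith(eol, i) ? eol.length() : 0;
    }

    public static void main(String[] args) {
        String classicMac = "a,b\r1,2\r"; // bare CR terminators
        System.out.println(parseCsv(classicMac, null).size()); // 1
        System.out.println(parseCsv(classicMac, "\r"));        // [[a, b], [1, 2]]
    }
}
```

With the default (null), a file that uses bare '\r' is returned as a single record, and the carriage returns end up inside its fields. That is why such files need the terminator set explicitly.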
Thanks @davidsantiago for the clarification.
Hope you can help me with another question. I need to process CSV files from different sources, and I've found that files coming from a Mac running MS Excel (OS 9, or Mac OS X running MS Excel 2011) sometimes use '\r' as the line terminator.
I was thinking of reading them with line-seq over a clojure.java.io/reader and then parsing each line with clojure-csv.core/parse-csv. What would you suggest for handling these files?
Thanks in advance David!
You can't parse a CSV file line by line, because CSV fields can contain line separators. You have to parse the whole CSV to even know which line breaks end records and which are part of the data. For those files, use parse-csv with the :end-of-line option set to "\r".
David
(In reply to Juan Macias's email of Sunday, May 11, 2014.)
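The point that CSV cannot be parsed line by line can be illustrated with a small Java sketch. It assumes a file with '\r' terminators in which one quoted field itself contains a '\r'. When the file is read line by line, as line-seq does through BufferedReader, that field is split across two lines. A hypothetical splitRecords helper that tracks double-quote state only breaks records outside quotes, so it keeps the field intact.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class LineByLineSketch {
    // Hypothetical helper: splits input into raw records on eol, but only
    // outside double quotes. Toggling on every quote is enough for RFC 4180
    // escaping, since an escaped quote "" toggles twice and cancels out.
    public static List<String> splitRecords(String input, String eol) {
        List<String> records = new ArrayList<>();
        boolean inQuotes = false;
        int start = 0;
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) == '"') {
                inQuotes = !inQuotes;
                i++;
            } else if (!inQuotes && input.startsWith(eol, i)) {
                records.add(input.substring(start, i));
                i += eol.length();
                start = i;
            } else {
                i++;
            }
        }
        if (start < input.length()) {
            records.add(input.substring(start)); // last record, no terminator
        }
        return records;
    }

    public static void main(String[] args) {
        // Two records; the second has a quoted field containing a bare CR.
        String csv = "id,note\r1,\"first line\rsecond line\"\r";
        // Line-oriented reading, as line-seq does via BufferedReader.readLine().
        List<String> lines = new BufferedReader(new StringReader(csv))
                .lines().collect(Collectors.toList());
        System.out.println(lines.size());                   // 3: the quoted field is torn apart
        System.out.println(splitRecords(csv, "\r").size()); // 2
    }
}
```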
I found a possible issue with the following parse-csv function. The behavior of line-seq matches Java's; is there a reason why parse-csv does not behave the same way? It does not follow the same behavior as:
The reason is that line-seq uses BufferedReader.readLine, which considers a line to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a line feed. parse-csv, by contrast, only treats \n and \r\n as line terminators. See http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html#readLine()
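The readLine behavior described above, which line-seq inherits, can be checked directly. This small Java sketch feeds BufferedReader one of each terminator. The readLines helper is only illustrative:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLineSketch {
    // Illustrative helper: collects lines using BufferedReader.readLine(),
    // which ends a line at '\n', '\r', or "\r\n" and strips the terminator.
    public static List<String> readLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // One of each terminator: CR, LF, CRLF.
        System.out.println(readLines("a\rb\nc\r\nd")); // [a, b, c, d]
    }
}
```

A CSV parser cannot use this rule unconditionally. A bare '\r' may be ordinary data inside a quoted field, so the choice of terminator is left to the :end-of-line option.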