Open GoogleCodeExporter opened 9 years ago
Technically, 'carriage return' (CR) line endings are no longer a valid form of
line ending on any platform. Since the release of OS X, Mac line endings have
followed the Unix convention (i.e. newline).
Excel for OSX is the only application I know of that still (incorrectly)
explicitly uses CR line endings.
To keep the parser as simple and efficient as possible, it silently ignores all
CR characters, so CRLF and LF line endings read the same. A file that uses CR
alone therefore has no line breaks the parser can see.
There are two approaches to fix this issue:
1. Run the CSV data through a pre-processor function that converts all CR/CRLF
characters to LF characters.
2. Change the parser code to add special cases for both CRLF and CR line
endings.
I strongly suggest the former. Hacking on the parser code is no simple matter.
The lexer is written to be as slim and efficient as possible; adding more edge
cases will slow down the parser for everything. You may see a lot of CSV with
CR characters, but in the greater scope of things, CSV with CR line endings is
not a common occurrence.
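A minimal sketch of the pre-processor approach (option 1), assuming the whole CSV is already in memory as a string. The helper name `normalizeLineEndings` is hypothetical and is not part of jquery-csv.

```javascript
// Hypothetical helper (not part of jquery-csv): rewrite CRLF and lone CR as LF
// before the text reaches the parser.
function normalizeLineEndings(text) {
  // "\r\n?" matches a CR optionally followed by LF, so a CRLF pair is
  // replaced as one unit instead of producing two newlines.
  return text.replace(/\r\n?/g, "\n");
}

// Mixed input: one CR-only line, one CRLF line, one LF line.
const cleaned = normalizeLineEndings("a,b\r1,2\r\n3,4\n");
```

The cleaned string can then be passed to `$.csv.toArrays(cleaned)`. Note that this also rewrites any CR that sits inside a quoted field, which may or may not matter for your data.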
If you'd like to contribute a patch, or need help working out a pre-processor
I'll try to help. Due to personal/work circumstances I don't have the time to
focus on further development right now.
Original comment by evanpla...@gmail.com
on 31 May 2013 at 4:40
I know you don't want to add this, but man I would totally donate beer money to
get it working. Having to tell people to open their files in Excel just to
save-as a different csv isn't a great user experience.
Is there any javascript out there that normalizes line endings in a csv?
Original comment by jpsi...@gmail.com
on 4 Sep 2013 at 2:28
Can we just use regular expressions or something to convert all CR to LF?
Original comment by Chad.R.B...@gmail.com
on 9 Sep 2013 at 10:33
it would be awesome if this was just built in as a failsafe, i run in to this
issue constantly
Original comment by jpsi...@gmail.com
on 10 Sep 2013 at 4:26
I was able to get the parser to work by treating the "^\r$" case as end of line
in all 3 states of the parser. I don't know how this will affect files that
were not exported from Unix or Excel.
Original comment by dan.bo...@gmail.com
on 10 Sep 2013 at 8:18
I would agree with the comment that "it would be awesome if it was just built
in". Since part of the idea of jQuery is cross-platform functionality, it seems
that the library should handle it, rather than every caller having to
special-case their use of the library.
Original comment by dan.bo...@gmail.com
on 10 Sep 2013 at 8:21
[deleted comment]
Here is a copy of the parser as I mentioned in comment #5
Original comment by dan.bo...@gmail.com
on 10 Sep 2013 at 8:38
Attachments:
Dan, what exactly did you change in this? whatever it is fixes my issue
Original comment by jpsi...@gmail.com
on 10 Sep 2013 at 9:39
wait nevermind, i spoke too soon. this just switches it up, it works for the
mac one but breaks the normal working ones
Original comment by jpsi...@gmail.com
on 10 Sep 2013 at 9:41
I only have the Mac Excel files. If you can send me a working test file,
I'll take a look at it.
Original comment by dan.bo...@gmail.com
on 10 Sep 2013 at 9:47
Attached is one that is totally messed up. It works fine if I open it in excel,
save-as, and change the format to "windows comma separated (.csv)". If I do
that there are no issues with the parser.
Original comment by jpsi...@gmail.com
on 10 Sep 2013 at 11:28
Attachments:
That "openflash.csv" is one that doesn't work with your fix. it only works if i
resave it in the other format
Here is one that works with your fix, and not with the normal jquery-csv:
http://datazap.me/sites/default/files/datalogs/admin/bad-datalog.csv
Original comment by jpsi...@gmail.com
on 10 Sep 2013 at 11:31
OK, I took a guess at what was happening and I think I figured it out.
I changed the regular expression to include \r\n as one of the options and
put it first so that it matches before the single-character cases. Now all 3
cases are considered a newline, with no "phantoms". It appears to work in
limited testing with your file and the Mac cases. I will play with it some
more, but here is my latest version.
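To illustrate why the order of the alternatives matters, here is a small standalone sketch. The two patterns below are illustrative only; the parser's actual regular expression is not shown in this thread.

```javascript
// Illustrative patterns only, not the real jquery-csv regex.
const crlfFirst = /\r\n|\r|\n/; // CRLF is tried first and consumed as one newline
const crFirst = /\r|\n|\r\n/;   // CR wins first, so the LF of a CRLF matches separately

const sample = "a,b\r\nc,d\re,f\ng,h";

const good = sample.split(crlfFirst);
// good is ["a,b", "c,d", "e,f", "g,h"]

const withPhantom = sample.split(crFirst);
// withPhantom is ["a,b", "", "c,d", "e,f", "g,h"]: the empty string is the
// "phantom" row created between the CR and the LF of the CRLF pair.
```

Regex alternation tries the options left to right at each position, so the two-character case has to come before the single-character ones.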
Original comment by dan.bo...@gmail.com
on 10 Sep 2013 at 11:46
Attachments:
awesome!! the "bad-datalog.csv" file works with your latest fix. the crazy
"openflash" case is exceptionally messed up. That said, if excel can open and
resave it and have it work, there's gotta be something that can make it work
with this library
if you have paypal please let me send you a few $
Original comment by jpsi...@gmail.com
on 11 Sep 2013 at 12:08
You're welcome. I need this to work as much as you do, so don't worry about
any payment. Just glad I could help.
Original comment by dan.bo...@gmail.com
on 11 Sep 2013 at 12:38
Any ideas how to make that other file work? hopefully a quick fix like your
other fix?
Original comment by jpsi...@gmail.com
on 11 Sep 2013 at 1:12
Sorry I totally missed that file and then life got in the way.
I'll take a quick look in the morning
Original comment by dan.bo...@gmail.com
on 11 Sep 2013 at 3:56
OK, I need some information about the openflash.csv file. It appears to be
Unicode or some other character set, so I need the character set information to
set my file reader up correctly.
It appears to be ";" separated, but I want to confirm that.
Last, do you know what the delimiter is? The data appears to have quotes in
the middle of column values, which is what is causing the problem. Not sure
what the delimiter should be.
Also, there is a bunch of font-related and, I believe, column formatting stuff
at the beginning of that file. I am not sure whether it is causing a problem,
but it could be.
Note: My version of Excel won't even open this file.
Original comment by dan.bo...@gmail.com
on 11 Sep 2013 at 3:41
i wrote to the guy who makes the device that generates these logs. i'll keep
you posted
Original comment by jpsi...@gmail.com
on 11 Sep 2013 at 11:51
For folks looking for a code snippet that fixes the problem: Here's what I use
to make all the new line characters consistent. One line scrubs the whole input
before sending it to CSV parser....
// Normalize new lines: CRLF or a lone CR becomes a single LF
result = result.replace(/\r\n?/g, "\n");
// Parse the CSV to a 2D array
Selfservice.csvData = $.csv.toArrays(result);
Original comment by DJu...@gmail.com
on 1 Nov 2013 at 10:25
Issue 31 has been merged into this issue.
Original comment by evanpla...@gmail.com
on 9 Dec 2013 at 11:21
[deleted comment]
openflash.csv is Microsoft Compound File Binary Format (type=CFBF ext=.cfb)
The header 20 D0 CF 11 E0 A1 B1 1A really gives it away.
it is not CSV, or even a text file!
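A rough sketch of how such a file could be rejected before it reaches the CSV parser, by checking for the Compound File Binary signature bytes D0 CF 11 E0 A1 B1 1A E1. The function name `looksLikeCompoundFile` and its `offset` parameter are hypothetical; the signature is normally at the very start of the file, and the offset is only there in case of leading bytes.

```javascript
// Documented signature of the Compound File Binary Format.
const CFBF_SIGNATURE = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

// Hypothetical helper: `bytes` is a Uint8Array or plain array of byte values.
// `offset` (an assumption, default 0) is where to look for the signature.
function looksLikeCompoundFile(bytes, offset = 0) {
  if (bytes.length < offset + CFBF_SIGNATURE.length) {
    return false; // too short to contain the signature
  }
  return CFBF_SIGNATURE.every((b, i) => bytes[offset + i] === b);
}
```

In a browser the bytes could come from reading the first few bytes of the file as an ArrayBuffer and wrapping them in a `Uint8Array`. If the check returns true, the file is a binary container and no amount of line-ending cleanup will make it parse as CSV.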
Original comment by crazy_l...@netspace.net.au
on 12 Apr 2014 at 1:06
Suggestion from post 21 - DJu...
that was amazingly helpful. Thank you!
Original comment by Gitelman...@gmail.com
on 29 Apr 2014 at 6:07
The #21 solution had a problem with large files for me.
The update in #14 appears to have fixed the issue for me.
Is this going to roll into a release?
Original comment by jerryga...@yahoo.com
on 22 May 2014 at 9:02
Fix from #14 worked for me, too. Makes me worried when the first csv file I try
to parse using this library didn't work. How maintained is this library?
Original comment by barl...@gmail.com
on 6 Jun 2014 at 4:47
If you clone the source repository the fix should already be included.
@mirlord has been maintaining a fork on GitHub in my absence. I recently moved
the upstream repo over to GitHub too. As soon as I finish some work on the test
runner I plan to push out another release.
Most/all of the remaining issues have been addressed. The only major feature
missing is the ability to process very large data sets.
Original comment by evanpla...@gmail.com
on 6 Jun 2014 at 4:56
Fix #14 and #21 together worked a charm! Thanks guys
Original comment by djmatt...@gmail.com
on 8 Jul 2014 at 10:54
@evanpla...@gmail.com: Could you please share the URL of the upstream repo on
GitHub?
@mirlord's GH doesn't work for me, says 'csv' is undefined. Thanks.
Original comment by jonatan....@gmail.com
on 10 Jul 2014 at 5:34
Original issue reported on code.google.com by
Chad.R.B...@gmail.com
on 30 May 2013 at 1:06
Attachments: