Closed GoogleCodeExporter closed 8 years ago
Could you provide a data sample stub that causes the failure.
From a technical standpoint, even empty lines should contain delimiters to
represent the number of columns in the data.
To quote RFC 4180:
4. Within the header and each record, there may be one or more
fields, separated by commas. Each line should contain the same
number of fields throughout the file. Spaces are considered part
of a field and should not be ignored. The last field in the
record must not be followed by a comma. For example:
aaa,bbb,ccc
Only the last line is allowed to be completely empty in a valid CSV file and
the parser should handle it properly if one exists.
So an empty line with 3 columns would still show up as:
,,
Basically, a CSV file must have the same number of columns in every row to be
considered valid. Even if the trailing values on each row are consistently
empty.
There are two potential solutions to this problem...
First, you could pre-process the line data. A hook called onParseValue() exists
to allow users to insert an arbitrary function to do additional processing on
CSV values. To give users better fine-tuned control over the parser I was
planning on including another called onParseEntry() that does the same but on a
full line of data.
With that in place you will be able to insert a function that adds in the
correct number of delimiters (ex commas).
The second solution works the same as you have already done. Do a full pass on
the data first to remove empty lines. I'll be adding two hooks, onPreParse()
onPostParse() to do the same.
There are two schools of thought on this. Those who want to liberate the data
by making the parser more versatile and those who think the users should be
held responsible for providing 'clean' data.
To see a bad example of the too-liberal approach, just look at pre-HTML 4.01
browsers and 'quirks mode'. Too much variability in the data requires a lot of
'magic' (read 'hacks') to exist below the surface.
CSV already a very liberal format that allows a lot of funky edge cases that
need to be covered in the parser. My main overarching goal is to provide a
parser that specializes in CSV data and not necessarily anything beyond CSV.
There are a LOT of use cases where users want to have the ability to extend the
parser to do more interesting things. I don't want to disregard that request
altogether so my approach is to provide hooks. By doing so users have the
ability insert custom code inline during the different phases of the parsing
process and add their own domain-specific magic where it's needed.
Let me know what you think...
Original comment by evanpla...@gmail.com
on 10 Oct 2012 at 10:06
You make some good points. I agree that you should stick to the standard, and
using anything that isn't standard should be the responsibility of the person
using your library. Providing those hooks adds additional convenience for them.
I didn't know what the definition of the standard was.
However, here is the issue, Microsoft. The way I make the CSV files in the
first place is to use MS Excel, and use save as CSV (Windows/DOS format). I am
expecting that the majority of my users would do it this way too. At least with
MS Excel 2013 (beta), it automatically puts an empty line, no commas, at the
end of the file. Actually, I just checked, it does that in Excel 2010 too. So
Microsoft is doing something wrong, typical. If you open this file with a
decent text editor, like Notepad++, you will see the blank line.
I didn't realize that what Microsoft is doing was wrong until just now. I can't
expect my users to manually trim that line themselves, so I will have to put
code to do it myself. I didn't have this problem with version 0.62, so you did
something that made this more strict.
If you are feeling generous, what code would you use to take the last line off
if it was empty? I am primarily a MATLAB coder, and still a novice at
JavaScript.
Original comment by Chad.R.B...@gmail.com
on 10 Oct 2012 at 11:32
Attachments:
Original comment by evanpla...@gmail.com
on 11 Oct 2012 at 4:08
Original comment by evanpla...@gmail.com
on 11 Oct 2012 at 4:08
An empty line at the end of the file is legal, that's the one exception to the
rule.
2. The last record in the file may or may not have an ending line
break. For example:
aaa,bbb,ccc CRLF
zzz,yyy,xxx
Sorry about the confusion...
The RFC can be found here:
http://tools.ietf.org/html/rfc4180
If it's still failing, it may be a legitimate bug. I haven't created any tests
to validate the functionality of toObjects() so I probably missed a bug when I
wrote it.
Give me a little time to update the tests with full toObjects() coverage. As
soon as I push it, I'll let you know where to go to run the test runner.
If the last line is the only empty line, I guarantee that you won't need to do
any additional pre-processing.
If you want to get rid of empty lines in the middle of the file take a look at:
http://code.google.com/p/jquery-csv/wiki/Hooks?ts=1349929348&updated=Hooks
The hooks aren't done being implemented yet but once they are you should be
able to strip empty lines using the example found in the onPreParse() usage
demo.
Original comment by evanpla...@gmail.com
on 11 Oct 2012 at 4:25
I just tested this and the final blank line gets computed to undefined:
eData[0] is =>Del,Trotter,Central Sales Supervisor,Xways,Central
Sales,Crossways,,EX,,,,,01234 123 456,,2040,6040,,01234 123 321,,Del
Trotter,EX,/o=TITUK/ou=TIT/cn=RECIPIENTS/cn=DEL TROTTER,Del.Trotter,,,
eData[1] is =>Del,Trotter,Central Sales Supervisor,Xways,Central
Sales,Crossways,,EX,,,,,01234 123 456,,2040,6040,,01234 123 321,,Del
Trotter,EX,/o=TITUK/ou=TIT/cn=RECIPIENTS/cn=DEL TROTTER,Del.Trotter,,,
eData[2] is =>Del,Trotter,Central Sales Supervisor,Xways,Central
Sales,Crossways,,EX,,,,,01234 123 456,,2040,6040,,01234 123 321,,Del
Trotter,EX,/o=TITUK/ou=TIT/cn=RECIPIENTS/cn=DEL TROTTER,Del.Trotter,,,
eData[3] is =>Del,Trotter,Central Sales Supervisor,Xways,Central
Sales,Crossways,,EX,,,,,01234 123 456,,2040,6040,,01234 123 321,,Del
Trotter,EX,/o=TITUK/ou=TIT/cn=RECIPIENTS/cn=DEL TROTTER,Del.Trotter,,,
eData[4] is =>Del,Trotter,Central Sales Supervisor,Xways,Central
Sales,Crossways,,EX,,,,,01234 123 456,,2040,6040,,01234 123 321,,Del
Trotter,EX,/o=TITUK/ou=TIT/cn=RECIPIENTS/cn=DEL TROTTER,Del.Trotter,,,
eData[5] is =>undefined
Original comment by danielti...@gmail.com
on 12 Oct 2012 at 1:57
BTW the above was using eData = $.csv.toArrays(data); so its not just toObjects
that has the issue
Original comment by danielti...@gmail.com
on 13 Oct 2012 at 9:40
This issue was closed by revision 5f9f33616ce2.
Original comment by evanpla...@gmail.com
on 15 Oct 2012 at 10:32
The fix done and will be incorporated into the 0.65 release
Two new tests were added to the test runner under 'Edge Cases' to check for the
problem and the bug was fixed in jquery.csv.js.
If you want to check it, feel free to run the test runner again:
http://jquery-csv.googlecode.com/git/test/test.html
You can also try it out with your own dataset at the new 'Basic Usage Demo':
http://jquery-csv.googlecode.com/git/examples/basic-usage.html
Thank you for your help in getting this fixed. Hopefully, that will rid you of
the need for any obnoxious hacks/workarounds in the future.
Original comment by evanpla...@gmail.com
on 15 Oct 2012 at 10:39
Yea I like this, I don't need to worry about the blank line, or use a length -
1 hack. I also like where in your test you say reticulating splines. I guess
you are a The Sims fan.
Original comment by Chad.R.B...@gmail.com
on 16 Oct 2012 at 3:54
Yeah, the hack you described is basically how I handle it, except the check is
done inline.
Actually, Sim City 2000 (I'm old school). I'm glad you liked it. I created the
demo to help lower the barrier for new users but it doubles for testing
datasets too.
It's still very primitive but if take a look at the file-handling demo, you can
open csv files directly from the desktop (using the HTML 5 File API) too.
Here's the link if you'd like to try it out:
http://jquery-csv.googlecode.com/git/examples/file-handling.html
That should be all for this bug. Unless you have more to add, I'm going to
close it out.
Original comment by evanpla...@gmail.com
on 16 Oct 2012 at 6:19
I tested using my CSV in your file-handling.html and it was handled perfectly,
I also saved off the javascript file and used it in my above test and that also
was handled perfectly, so I agree problem fixed
Original comment by danielti...@gmail.com
on 16 Oct 2012 at 8:49
Original issue reported on code.google.com by
Chad.R.B...@gmail.com
on 10 Oct 2012 at 8:42