Freaks over over extra blank lines

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

1. Pass a CSV file with one or more blank lines into $.csv.toObjects(file)

What is the expected output? What do you see instead?

I expect it to just ignore the blank lines, or do something with them. At least 
a blank line at the end. I would expect that an extra blank line at the end of 
CSV files is a common occurrence. 

Specifically, it errors here, at line 365.

for(var j in headers) {
          object[headers[j]] = entry[j];
        }

Uncaught TypeError: Cannot read property '0' of undefined

This is because in the case a blank line, entry is undefined. 

What version of the product are you using? On what operating system?

0.64 on Windows. 

Please provide any additional information below.

It's up to you to decide what your program should do in this case. I figured it 
out and I can remove the blank lines from my input, but if one of my users puts 
a blank line in their input, it will just silently fail. I would prefer that to 
not happen. I guess alternatively, I could check the input before putting it 
through your function and removing the blank line myself.

Original issue reported on code.google.com by Chad.R.B...@gmail.com on 10 Oct 2012 at 8:42

GoogleCodeExporter commented 9 years ago

Could you provide a data sample stub that causes the failure.

From a technical standpoint, even empty lines should contain delimiters to 
represent the number of columns in the data.

To quote RFC 4180:

  4.  Within the header and each record, there may be one or more
      fields, separated by commas.  Each line should contain the same
      number of fields throughout the file.  Spaces are considered part
      of a field and should not be ignored.  The last field in the
      record must not be followed by a comma.  For example:

      aaa,bbb,ccc

Only the last line is allowed to be completely empty in a valid CSV file and 
the parser should handle it properly if one exists.

So an empty line with 3 columns would still show up as:

    ,,

Basically, a CSV file must have the same number of columns in every row to be 
considered valid. Even if the trailing values on each row are consistently 
empty.

There are two potential solutions to this problem...

First, you could pre-process the line data. A hook called onParseValue() exists 
to allow users to insert an arbitrary function to do additional processing on 
CSV values. To give users better fine-tuned control over the parser I was 
planning on including another called onParseEntry() that does the same but on a 
full line of data.

With that in place you will be able to insert a function that adds in the 
correct number of delimiters (ex commas).

The second solution works the same as you have already done. Do a full pass on 
the data first to remove empty lines. I'll be adding two hooks, onPreParse() 
onPostParse() to do the same.

There are two schools of thought on this. Those who want to liberate the data 
by making the parser more versatile and those who think the users should be 
held responsible for providing 'clean' data.

To see a bad example of the too-liberal approach, just look at pre-HTML 4.01 
browsers and 'quirks mode'. Too much variability in the data requires a lot of 
'magic' (read 'hacks') to exist below the surface.

CSV already a very liberal format that allows a lot of funky edge cases that 
need to be covered in the parser. My main overarching goal is to provide a 
parser that specializes in CSV data and not necessarily anything beyond CSV.

There are a LOT of use cases where users want to have the ability to extend the 
parser to do more interesting things. I don't want to disregard that request 
altogether so my approach is to provide hooks. By doing so users have the 
ability insert custom code inline during the different phases of the parsing 
process and add their own domain-specific magic where it's needed.

Let me know what you think...

Original comment by evanpla...@gmail.com on 10 Oct 2012 at 10:06

GoogleCodeExporter commented 9 years ago

You make some good points. I agree that you should stick to the standard, and 
using anything that isn't standard should be the responsibility of the person 
using your library. Providing those hooks adds additional convenience for them. 
I didn't know what the definition of the standard was. 

However, here is the issue, Microsoft. The way I make the CSV files in the 
first place is to use MS Excel, and use save as CSV (Windows/DOS format). I am 
expecting that the majority of my users would do it this way too. At least with 
MS Excel 2013 (beta), it automatically puts an empty line, no commas, at the 
end of the file. Actually, I just checked, it does that in Excel 2010 too. So 
Microsoft is doing something wrong, typical. If you open this file with a 
decent text editor, like Notepad++, you will see the blank line. 

I didn't realize that what Microsoft is doing was wrong until just now. I can't 
expect my users to manually trim that line themselves, so I will have to put 
code to do it myself. I didn't have this problem with version 0.62, so you did 
something that made this more strict. 

If you are feeling generous, what code would you use to take the last line off 
if it was empty? I am primarily a MATLAB coder, and still a novice at 
JavaScript.

Original comment by Chad.R.B...@gmail.com on 10 Oct 2012 at 11:32

Attachments:

test.csv

GoogleCodeExporter commented 9 years ago

Original comment by evanpla...@gmail.com on 11 Oct 2012 at 4:08

GoogleCodeExporter commented 9 years ago

Original comment by evanpla...@gmail.com on 11 Oct 2012 at 4:08

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

An empty line at the end of the file is legal, that's the one exception to the 
rule.

2.  The last record in the file may or may not have an ending line
    break.  For example:

    aaa,bbb,ccc CRLF
    zzz,yyy,xxx

Sorry about the confusion...

The RFC can be found here:
http://tools.ietf.org/html/rfc4180

If it's still failing, it may be a legitimate bug. I haven't created any tests 
to validate the functionality of toObjects() so I probably missed a bug when I 
wrote it.

Give me a little time to update the tests with full toObjects() coverage. As 
soon as I push it, I'll let you know where to go to run the test runner.

If the last line is the only empty line, I guarantee that you won't need to do 
any additional pre-processing.

If you want to get rid of empty lines in the middle of the file take a look at:
http://code.google.com/p/jquery-csv/wiki/Hooks?ts=1349929348&updated=Hooks

The hooks aren't done being implemented yet but once they are you should be 
able to strip empty lines using the example found in the onPreParse() usage 
demo.

Original comment by evanpla...@gmail.com on 11 Oct 2012 at 4:25

GoogleCodeExporter commented 9 years ago

I just tested this and the final blank line gets computed to undefined:

eData[0] is =>Del,Trotter,Central Sales Supervisor,Xways,Central 
Sales,Crossways,,EX,,,,,01234 123 456,,2040,6040,,01234 123 321,,Del 
Trotter,EX,/o=TITUK/ou=TIT/cn=RECIPIENTS/cn=DEL TROTTER,Del.Trotter,,,
eData[1] is =>Del,Trotter,Central Sales Supervisor,Xways,Central 
Sales,Crossways,,EX,,,,,01234 123 456,,2040,6040,,01234 123 321,,Del 
Trotter,EX,/o=TITUK/ou=TIT/cn=RECIPIENTS/cn=DEL TROTTER,Del.Trotter,,,
eData[2] is =>Del,Trotter,Central Sales Supervisor,Xways,Central 
Sales,Crossways,,EX,,,,,01234 123 456,,2040,6040,,01234 123 321,,Del 
Trotter,EX,/o=TITUK/ou=TIT/cn=RECIPIENTS/cn=DEL TROTTER,Del.Trotter,,,
eData[3] is =>Del,Trotter,Central Sales Supervisor,Xways,Central 
Sales,Crossways,,EX,,,,,01234 123 456,,2040,6040,,01234 123 321,,Del 
Trotter,EX,/o=TITUK/ou=TIT/cn=RECIPIENTS/cn=DEL TROTTER,Del.Trotter,,,
eData[4] is =>Del,Trotter,Central Sales Supervisor,Xways,Central 
Sales,Crossways,,EX,,,,,01234 123 456,,2040,6040,,01234 123 321,,Del 
Trotter,EX,/o=TITUK/ou=TIT/cn=RECIPIENTS/cn=DEL TROTTER,Del.Trotter,,,
eData[5] is =>undefined

Original comment by danielti...@gmail.com on 12 Oct 2012 at 1:57

GoogleCodeExporter commented 9 years ago

BTW the above was using eData = $.csv.toArrays(data); so its not just toObjects 
that has the issue

Original comment by danielti...@gmail.com on 13 Oct 2012 at 9:40

GoogleCodeExporter commented 9 years ago

This issue was closed by revision 5f9f33616ce2.

Original comment by evanpla...@gmail.com on 15 Oct 2012 at 10:32

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

The fix done and will be incorporated into the 0.65 release

Two new tests were added to the test runner under 'Edge Cases' to check for the 
problem and the bug was fixed in jquery.csv.js.

If you want to check it, feel free to run the test runner again:
http://jquery-csv.googlecode.com/git/test/test.html

You can also try it out with your own dataset at the new 'Basic Usage Demo':
http://jquery-csv.googlecode.com/git/examples/basic-usage.html

Thank you for your help in getting this fixed. Hopefully, that will rid you of 
the need for any obnoxious hacks/workarounds in the future.

Original comment by evanpla...@gmail.com on 15 Oct 2012 at 10:39

GoogleCodeExporter commented 9 years ago

Yea I like this, I don't need to worry about the blank line, or use a length - 
1 hack. I also like where in your test you say reticulating splines. I guess 
you are a The Sims fan.

Original comment by Chad.R.B...@gmail.com on 16 Oct 2012 at 3:54

GoogleCodeExporter commented 9 years ago

Yeah, the hack you described is basically how I handle it, except the check is 
done inline.

Actually, Sim City 2000 (I'm old school). I'm glad you liked it. I created the 
demo to help lower the barrier for new users but it doubles for testing 
datasets too.

It's still very primitive but if take a look at the file-handling demo, you can 
open csv files directly from the desktop (using the HTML 5 File API) too.

Here's the link if you'd like to try it out:
http://jquery-csv.googlecode.com/git/examples/file-handling.html

That should be all for this bug. Unless you have more to add, I'm going to 
close it out.

Original comment by evanpla...@gmail.com on 16 Oct 2012 at 6:19

Changed state: Verified

GoogleCodeExporter commented 9 years ago

I tested using my CSV in your file-handling.html and it was handled perfectly, 
I also saved off the javascript file and used it in my above test and that also 
was handled perfectly, so I agree problem fixed

Original comment by danielti...@gmail.com on 16 Oct 2012 at 8:49

giigii91 / jquery-csv

Freaks over over extra blank lines #13