fossteam / jquery-csv

Automatically exported from code.google.com/p/jquery-csv
MIT License
0 stars 0 forks source link

CSV values with embedded newlines do not work. #7

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Per the RFC (paragraph 2.6), this should be legal:

"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx

I think the way to fix this might be to check to see if reValue consumed the 
entire line (check reValue.lastIndex). If not, see if it ended before a 
(?:^|,)" (or maybe you have to use lookbehind). If so, splice the next line 
onto this one, and pick up where we left off.

But you'll have to get it past the reValid check. Is that really needed? I 
think a CSV line is not well-formed IFF it can't be parsed as a sequence of 
values. I don't see that the validator regex is accomplishing anything useful?

Original issue reported on code.google.com by r...@acm.org on 4 Sep 2012 at 9:38

GoogleCodeExporter commented 9 years ago
This is actually a dupe of bug #2.

The CSV is processed in two passes. First, to split by entry on unescaped 
newline chars. Second, to parse the actual entry (and all the messiness 
involved in escaping).

I wanted to make it work in a single pass and capable of pulling a single or a 
range of entries at once for larger datasets. Unfortunately, modifying the 
regex to make it work that way is beyond my ability.

I started work on a Finite State Machine that counts double-quotes and newline 
chars to handle the splitting but haven't finished it yet. You can see the it 
commented out in the source. I haven't really been focused to take another look 
at that since I last worked on it because these other, more critical bugs 
started popping up.

reValid shouldn't be an issue once it's working because the regex in 
csvEntry2Array should only be applied to singular entries anyway. I could 
probably do a re-write to exclude it. As long as it doesn't dramatically affect 
performance I'd prefer to leave it in and add some better error logging when an 
entry fails.

Original comment by evanpla...@gmail.com on 5 Sep 2012 at 6:56

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 189e3ec492d7.

Original comment by evanpla...@gmail.com on 9 Sep 2012 at 9:54

GoogleCodeExporter commented 9 years ago
OK, it's done. I don't have any tests to back it up yet but the new line 
splitting capability should be fully functional.

I have it disabled by default until it has been fully tested but it's not 
difficult to enable.

Just add the parameter 'experimental:true' to either $.csv2Array() or 
$.csv2Dictionary().

If you want to help, I need tests demonstrating all of the edge cases. 
Basically,I need proof that the of variations of the different newline chars 
won't break the parser when they're used in CSV values.

Original comment by evanpla...@gmail.com on 9 Sep 2012 at 10:00

GoogleCodeExporter commented 9 years ago
A test runner has been added. It doesn't pass even though the results look 
flawless.

I could use a second set of eyes on the code. I have a feel I'm missing 
something obvious. Also, the reValid regex has been disabled because it breaks 
the new newlines-as-values feature.

Original comment by evanpla...@gmail.com on 9 Sep 2012 at 10:50

GoogleCodeExporter commented 9 years ago
I'll have a look when I get a chance.

BTW, comment #1 says this is a dup of bug #2. Oddly -- I don't SEE any bug #2. 
I wonder why?

Original comment by r...@acm.org on 9 Sep 2012 at 10:54

GoogleCodeExporter commented 9 years ago
Sounds good...

Bug #2 was deleted. This one had a better description so I discarded the other.

Original comment by evanpla...@gmail.com on 9 Sep 2012 at 11:07

GoogleCodeExporter commented 9 years ago

Original comment by evanpla...@gmail.com on 11 Oct 2012 at 4:07

GoogleCodeExporter commented 9 years ago
This is done but still needs some adjustments to work with custom 
newline/separator/escape chars.

It will remain 'experimental' until it can pass the complete test runner. 
Expect it to become the default in 0.70.

Original comment by evanpla...@gmail.com on 15 Oct 2012 at 10:42

GoogleCodeExporter commented 9 years ago
The splitLines function is now fully functional and will be made the default in 
release 0.65.

Thank you for your feedback.

Original comment by evanpla...@gmail.com on 25 Oct 2012 at 2:40