Closed GoogleCodeExporter closed 9 years ago
This is actually a dupe of bug #2.
The CSV is processed in two passes. First, to split by entry on unescaped
newline chars. Second, to parse the actual entry (and all the messiness
involved in escaping).
I wanted to make it work in a single pass and capable of pulling a single or a
range of entries at once for larger datasets. Unfortunately, modifying the
regex to make it work that way is beyond my ability.
I started work on a Finite State Machine that counts double-quotes and newline
chars to handle the splitting but haven't finished it yet. You can see the it
commented out in the source. I haven't really been focused to take another look
at that since I last worked on it because these other, more critical bugs
started popping up.
reValid shouldn't be an issue once it's working because the regex in
csvEntry2Array should only be applied to singular entries anyway. I could
probably do a re-write to exclude it. As long as it doesn't dramatically affect
performance I'd prefer to leave it in and add some better error logging when an
entry fails.
Original comment by evanpla...@gmail.com
on 5 Sep 2012 at 6:56
This issue was closed by revision 189e3ec492d7.
Original comment by evanpla...@gmail.com
on 9 Sep 2012 at 9:54
OK, it's done. I don't have any tests to back it up yet but the new line
splitting capability should be fully functional.
I have it disabled by default until it has been fully tested but it's not
difficult to enable.
Just add the parameter 'experimental:true' to either $.csv2Array() or
$.csv2Dictionary().
If you want to help, I need tests demonstrating all of the edge cases.
Basically,I need proof that the of variations of the different newline chars
won't break the parser when they're used in CSV values.
Original comment by evanpla...@gmail.com
on 9 Sep 2012 at 10:00
A test runner has been added. It doesn't pass even though the results look
flawless.
I could use a second set of eyes on the code. I have a feel I'm missing
something obvious. Also, the reValid regex has been disabled because it breaks
the new newlines-as-values feature.
Original comment by evanpla...@gmail.com
on 9 Sep 2012 at 10:50
I'll have a look when I get a chance.
BTW, comment #1 says this is a dup of bug #2. Oddly -- I don't SEE any bug #2.
I wonder why?
Original comment by r...@acm.org
on 9 Sep 2012 at 10:54
Sounds good...
Bug #2 was deleted. This one had a better description so I discarded the other.
Original comment by evanpla...@gmail.com
on 9 Sep 2012 at 11:07
Original comment by evanpla...@gmail.com
on 11 Oct 2012 at 4:07
This is done but still needs some adjustments to work with custom
newline/separator/escape chars.
It will remain 'experimental' until it can pass the complete test runner.
Expect it to become the default in 0.70.
Original comment by evanpla...@gmail.com
on 15 Oct 2012 at 10:42
The splitLines function is now fully functional and will be made the default in
release 0.65.
Thank you for your feedback.
Original comment by evanpla...@gmail.com
on 25 Oct 2012 at 2:40
Original issue reported on code.google.com by
r...@acm.org
on 4 Sep 2012 at 9:38