beenex / jquery-csv

Automatically exported from code.google.com/p/jquery-csv
MIT License

Parsing in IE7 Extremely Slow #6

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Parsing a CSV file in IE7 is roughly 10x slower than in any other browser 
(including IE8).

Original issue reported on code.google.com by ee...@eecue.com on 9 Aug 2012 at 6:28

GoogleCodeExporter commented 9 years ago
@eecue1 There's nothing I can do to improve Microsoft's RegEx parser. The 
mainline regex is slated to change in the next release so that may provide some 
improvement. Also, I'm going to experiment with using a state machine for 
parsing in future releases.

If you want to help, the project could use some performance tests. I'm not sure 
what you're currently using, but if you have any good code that could be adapted 
into tests, I'm always open to contributions.
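
As a starting point, a performance test could look something like the sketch below. The names buildCsv, timeParse and naiveParse are hypothetical and not part of jquery-csv; naiveParse is only a stand-in so the snippet runs on its own, and a real test would pass the library's parser instead.

```javascript
// Hypothetical harness; none of these names exist in jquery-csv.

// Build a synthetic CSV string with quoted fields.
function buildCsv(rows, cols) {
  var lines = [];
  for (var r = 0; r < rows; r++) {
    var fields = [];
    for (var c = 0; c < cols; c++) {
      fields.push('"r' + r + 'c' + c + '"');
    }
    lines.push(fields.join(','));
  }
  return lines.join('\n');
}

// Run the parser several times and keep the fastest run, in milliseconds.
function timeParse(parse, csv, runs) {
  var best = Infinity;
  var result;
  for (var i = 0; i < runs; i++) {
    var start = new Date().getTime(); // Date.now() is not available in IE7
    result = parse(csv);
    var elapsed = new Date().getTime() - start;
    if (elapsed < best) best = elapsed;
  }
  return { bestMs: best, result: result };
}

// Stand-in parser for demonstration only (splits on commas, keeps quotes).
function naiveParse(csv) {
  var lines = csv.split('\n');
  var out = [];
  for (var i = 0; i < lines.length; i++) out.push(lines[i].split(','));
  return out;
}

var report = timeParse(naiveParse, buildCsv(1000, 5), 3);
```

Running the same harness in IE7 and in another browser with the real parser swapped in would give comparable numbers for this issue.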

Original comment by evanpla...@gmail.com on 16 Aug 2012 at 3:50

GoogleCodeExporter commented 9 years ago
Getting rid of the extraneous construction of intermediate RegExp objects (as I 
suggest in my fix for Issue #5) and getting rid of reValid entirely (as I 
suggest in Issue #7) might make a difference.

Of course, #7 involves another regex test, but it's on the end of the string, 
and a much simpler regex.

The regex in reValid requires 101 steps to match the test data line, which 
isn't horrible; there doesn't seem to be any combinatorial backtracking going 
on. (I use RegexBuddy to analyze regexes -- I expect you'll find it very useful: 
http://www.regexbuddy.com -- well worth the $40 if you deal with complex 
regexes a lot.)

Also, ANTLR 3 supports JavaScript as a target. I've not seen what sort of 
JavaScript it produces...

Original comment by r...@acm.org on 4 Sep 2012 at 10:09

GoogleCodeExporter commented 9 years ago
@rwk@acm.org Yeah, well, what is it, 6 or 8 regex constructions per entry that 
gets parsed? We're definitely talking about sloppy O(n) performance on the 
regex construction alone. I'm well aware of the issue.

I think, if the line-splitter function (e.g. csv2Array) were to pass a closure 
into the entry-parser function (e.g. csvEntry2Array), then all you'd need to do 
is check the state of the closure and use the enclosed regex objects if they're 
available.

Since the regexes can be compiled on the first pass alone, that should change 
the regex construction to O(1) complexity.

Of course, that's all theoretical. It should work but I rarely play with 
closures so it'll probably take some fiddling before I can get it to work.
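
A rough sketch of the idea, not the library's actual code: a plain shared state object stands in for the closure here, and parseLines, parseEntry and both regex patterns are hypothetical placeholders rather than the ones jquery-csv uses.

```javascript
// Hypothetical sketch; names and patterns are illustrative only.

// Entry parser: compiles its regexes only if the caller's state lacks them.
function parseEntry(entry, state) {
  if (!state.reField) {
    // Quoted field or unquoted field, followed by a comma or end of string.
    state.reField = new RegExp('(?:"((?:[^"]|"")*)"|([^,]*))(,|$)', 'g');
    state.reQuote = new RegExp('""', 'g');
  }
  var fields = [];
  var m;
  state.reField.lastIndex = 0; // the cached global regex must be rewound
  do {
    m = state.reField.exec(entry);
    // Treat the field as quoted only when the quoted alternative matched.
    var quoted = !m[2] && m[0].charAt(0) === '"';
    fields.push(quoted ? m[1].replace(state.reQuote, '"') : (m[2] || ''));
  } while (m[3] === ',');
  return fields;
}

// Line splitter: owns the state and hands it to every parseEntry call.
// Assumes no newlines inside quoted values, to keep the sketch short.
function parseLines(csv) {
  var state = { reField: null, reQuote: null };
  var lines = csv.split('\n');
  var rows = [];
  for (var i = 0; i < lines.length; i++) {
    rows.push(parseEntry(lines[i], state));
  }
  return rows;
}

var rows = parseLines('a,"b,c"\n"say ""hi""",,z');
```

Because the regexes live on the state object, only the first parseEntry call pays for construction; every later call reuses the same objects.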

Original comment by evanpla...@gmail.com on 5 Sep 2012 at 7:06

GoogleCodeExporter commented 9 years ago
The reValid regex has been disabled. It breaks on the newlines-as-value edge 
case and isn't really necessary now that the project has some good test 
coverage.

Maybe that will give a slight boost in performance. Next up, I'm going to work 
on minimizing the number of regex object constructions.

Original comment by evanpla...@gmail.com on 9 Sep 2012 at 10:53

GoogleCodeExporter commented 9 years ago
OK, the last performance fix is in.

The regex object constructions have been reduced from O(n) to O(1) complexity. 
Basically, instead of constructing all new regex objects every time the parser 
is called, they are only constructed on the first pass and passed back up the 
chain for re-use via a closure.

For example, on a call to $.csv.toArrays() the new arrangement will only 
require 3 object constructions no matter how large the input dataset is, 
whereas the old method added 3 new constructions for every entry in the CSV 
dataset.

In the tests alone (that use minimal datasets) the number of constructions is 
reduced from 91 to 21.
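
To make the difference concrete, here is a toy illustration that only counts constructions. It is not the library's code: the three patterns and the function names are placeholders chosen for the example.

```javascript
// Hypothetical illustration; patterns and names are placeholders.
var PATTERNS = ['^\\s+|\\s+$', '""', ','];

// Construct one RegExp per pattern and record each construction.
function buildRegexes(counter) {
  var built = [];
  for (var i = 0; i < PATTERNS.length; i++) {
    built.push(new RegExp(PATTERNS[i], 'g'));
    counter.count++;
  }
  return built;
}

// Old arrangement: every entry rebuilds all of its regexes.
function countPerEntry(entryCount) {
  var counter = { count: 0 };
  for (var i = 0; i < entryCount; i++) buildRegexes(counter);
  return counter.count;
}

// New arrangement: build on the first entry, reuse afterwards.
function countCached(entryCount) {
  var counter = { count: 0 };
  var cache = null;
  for (var i = 0; i < entryCount; i++) {
    if (!cache) cache = buildRegexes(counter);
  }
  return counter.count;
}
```

With 100 entries, countPerEntry(100) returns 300 while countCached(100) returns 3.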

Chrome does a lot to optimize away the difference, but IE's JavaScript engine 
isn't nearly as optimized, so the new update will probably have a greater 
impact there.

Try it out and let me know if the performance has improved drastically. 
Otherwise, I'm going to assume that this is fixed and close it.

Original comment by evanpla...@gmail.com on 7 Oct 2012 at 2:54

GoogleCodeExporter commented 9 years ago

Original comment by evanpla...@gmail.com on 7 Oct 2012 at 2:54

GoogleCodeExporter commented 9 years ago

Original comment by evanpla...@gmail.com on 11 Oct 2012 at 4:07

GoogleCodeExporter commented 9 years ago

Original comment by evanpla...@gmail.com on 15 Oct 2012 at 10:39