Closed GoogleCodeExporter closed 9 years ago
@thomascarlsen86
Thank you for filing the bug. From what I know, the bug is caused by Firefox
not treating the regex global flag the same as the rest of the browsers do.
More info can be found on the issue in the following link:
http://stackoverflow.com/q/4178970/290340
As soon as I come up with a fix I'll incorporate it into the next release.
Original comment by evanpla...@gmail.com
on 2 Jul 2012 at 12:49
[deleted comment]
thanks for the quick reply, and sad to hear that this is't a quick fix :(
Original comment by thomasca...@gmail.com
on 2 Jul 2012 at 12:55
Man, I totally made this code work last night perfectly and beautifully in
Chrome, and then this morning opened it up in Firefox and ran into exactly this
issue. Noooooo!
http://blog.thatscaptaintoyou.com/strange-behavior-of-the-global-regex-flag/
might have help you. It does say "The solution to this dilemma is painfully
simple, if the regex is global just set the “lastIndex” property to zero
before you test.". I would do it myself, but I haven't dug into your code deep
enough to know where to fiddle.
Original comment by kou...@gmail.com
on 4 Jul 2012 at 8:30
The problem is showing up for me with this CSV file. It seems that adding
quotes (as in the first two entries there) makes firefox behave. Strange that
the other browsers work fine even on the non-quoted ones, though meta.delimiter
is defaulting to """.
Original comment by kou...@gmail.com
on 4 Jul 2012 at 8:42
Attachments:
@kousue The code is actually very simple, csv2Array and csv2Dictionary handle
the split by line then each line is passed to csvEntry2Array which handle the
more complex CSV parsing. csvEntry2Array is where you'll find the regexes that
need to be modified.
If you want to submit a patch, I'll get it incorporated ASAP. Otherwise, life
has me busy and it'll be a couple of weeks before I can get a fix working,
tested, and released.
If you decide to clone the source you'll also find a working test runner with
directions. I'll drop a minor release (0.61) now that has the latest.
As for the delimiter defaulting to a double quote, that's should be correct
according to the IETF RFC for CSV files. If you use something else (ex
backslash) it can be changed in the function call parameters.
I'm not sure why quoting all values works in Firefox, maybe that'll work as a
temporary solution until the bug is resolved.
Original comment by evanpla...@gmail.com
on 4 Jul 2012 at 10:06
BTW, the parameter to change the escape char is called 'escaper' and will be
available in release 0.61.
Original comment by evanpla...@gmail.com
on 4 Jul 2012 at 10:58
Original comment by evanpla...@gmail.com
on 4 Jul 2012 at 10:58
I spent some time messing with the code. Everywhere where a global flag could
be a problem has been resolved. I couldn't make it more global flag proof if I
tried.
After staring cross-eyed at the test results for a while, it appears that only
the fields that aren't delimited don't get matched (probably could have saved
some time by reading the comments more closely). The even stranger part is, the
data IS passing the validation step (else it would return null for non-matches)
it's just the str.replace matching that is broken.
I'm going to try implementing the matching using the more robust RegExp.exec
method next. Until this issue is fixed, you'll need to have your CSV data
delimited for the parser to work in Firefox.
If you want to help, clone the source and take a look at the code or run the
test runner in some other browsers (ie, Safari, IE, Opera, etc) and post your
results.
Original comment by evanpla...@gmail.com
on 5 Jul 2012 at 1:28
[deleted comment]
I have done this for a quick hack:
$.csvEntry2Array = function(csv, meta) {
// csv hack for FF
csv = '"'+csv.replace(/,/g,'","')+'"';
.......
Original comment by thomasca...@gmail.com
on 9 Jul 2012 at 12:15
Any chance this will work with mixed delimited/non-delimited values in FF?
Unfortunately I don't have the luxury of changing the source CSV I'm working
with (it gets automatically updated).
Original comment by ee...@eecue.com
on 20 Jul 2012 at 6:26
My quick hack will work with a mixed csv, but I still don't think its a pretty
solution, but it will work out for you ;)
Original comment by thomasca...@gmail.com
on 20 Jul 2012 at 6:29
Actually nope, it doesn't work because the delimited fields have commas inside
of them:
id, "delimited, field, here", other field, another field, "one more delimited
field", etc
Original comment by ee...@eecue.com
on 20 Jul 2012 at 6:34
okay, this ain't prettier but you can do this:
$.csvEntry2Array = function(csv, meta) {
// csv hack for FF
csv = "'"+csv.replace(/"/g,"'").replace(/,/g,"','")+"'";
and then set your "delimiter" to delimiter:"'", at the top og this file
Original comment by thomasca...@gmail.com
on 20 Jul 2012 at 6:41
That doesn't work, probably related to the random ' characters in the CSV.
Original comment by ee...@eecue.com
on 20 Jul 2012 at 6:46
I have to say that you have a realy f**ked up csv file m8. :)
Original comment by thomasca...@gmail.com
on 20 Jul 2012 at 6:49
Any progress on this? I tried the above suggestions, but I don't think I did it
right. They didn't help. I don't have, right now, a funky csv file.
Original comment by Chad.R.B...@gmail.com
on 13 Aug 2012 at 9:50
@Chad.R.Bernier None yet, my personal life has me occupied. I'll try to take
another shot at it over the next few days.
Original comment by evanpla...@gmail.com
on 16 Aug 2012 at 3:45
[deleted comment]
In the callback function for String.replace, it appears that whenever a match
is not found for a parenthesized submatch string, Firefox (and IE <= 8)
represents the result as an empty string, whereas Chrome (and IE 9+) uses
undefined. Run this quick test in your browser's console to confirm this
behavior (the regular expression rex uses the relevant part of reValue,
assuming a separator and delimiter of "):
var rex = /"([^""]*(?:""[^""]*)*)"|([^,""\s]*(?:\s+[^,""\s]+)*)/,
test1 = '"Hello!"',
test2 = 'Hello!';
function replaceCallback(m0, m1, m2) {
console.log('matched substring (m0): ' + m0 + '\n'
+ 'submatch 1 (m1): \n'
+ '\t type: ' + (typeof m1) + '\n'
+ '\t as string: ' + m1 + '\n'
+ 'submatch 2 (m2): \n'
+ '\t type: ' + (typeof m2) + '\n'
+ '\t as string: ' + m2 + '\n');
return '';
}
test1.replace(rex, replaceCallback); // m2 is empty string in FF, undefined in Chrome
test2.replace(rex, replaceCallback); // m1 is empty string in FF, undefined in Chrome
In jQuery-CSV, this difference in behavior causes undesirable results in
Firefox, due to the conditions on lines 88 and 93 of version 0.61 which expect
a value of "undefined" for submatches that are not found. Here is revision to
those two conditions that accounts for both undefined and empty string
possibilities:
csv.replace(reValue, function(m0, m1, m2) {
// Remove backslash from any delimiters in the value
if(typeof m1 === 'string' && m1.length) { // Fix: evaluates to false for both empty strings and undefined
var reDelimiterUnescape = /ED/g;
reDelimiterUnescape = RegExp(reDelimiterUnescape.source.replace(/E/, escaper), 'g');
reDelimiterUnescape = RegExp(reDelimiterUnescape.source.replace(/D/, delimiter), 'g');
output.push(m1.replace(reDelimiterUnescape, delimiter));
} else if(typeof m2 === 'string' && m2.length) { // Fix: evaluates to false for both empty strings and undefined
output.push(m2);
}
return '';
});
With these changes in place, the plugin is working perfectly for my needs on
all tested browsers :D. I hope this addresses the Firefox issues that some of
you encountered.
Original comment by skarrm...@gmail.com
on 27 Aug 2012 at 10:01
Comment #21 is probably a good lead to understanding the problem, but it breaks
Chrome, as shown by the test suite.
In particular, empty strings are perfectly legitimate values, and need to be
handled.
Original comment by r...@acm.org
on 4 Sep 2012 at 2:49
Ah. It passes the test if you accept #21's first change, and not his second.
The problem boils down to -- if the input really IS the null string value, you
don't want to reject it from both clauses.
But you also want to add a:
if (csv === "") {
return [""];
}
before the validity check that returns null, as an empty string is logically
one empty value (just as a comma is logically two empty values). This appears
to be consistent with the ABNF in the RFC.
You can also get rid of 5 constructions of RegExp's along the way -- you don't
need to construct each intermediate partially-substituted RegExp. Perhaps you
did it this way to make bugs show up better? I don't know if constructing a
RegExp is expensive enough to worry about or not, in the grand scheme of
things. It does work to remove them.
I constructed a bunch more test cases, with finer granularity, using jasmine,
when I imported this into my system. Those were helpful.
The git repository does not appear to be up-to-date!
But I'm attaching patches from what was distributed as jquery.csv-0.61.js
0001 fixes some lint
0002 fixes the bugs
0003 removes the extraneous RegExps.
The patches are independent; you can apply any subset in any order.
Also, I notice that the existing test cases don't cover embedded quotes in
strings; that's a case that ought to be covered. I'll add it to my suite, and
report it as a separate issue if it fails.
Original comment by r...@acm.org
on 4 Sep 2012 at 9:19
Attachments:
For your patching convenience -- I reconstructed in my clone of your public
repository, what should be the missing push, so I could construct patches you
should be able to apply directly.
We'll see what kind of chaos I git when I eventually pull the missing commit. :)
I also added an additional patch. This one removes the Unicode Byte-Order-Mark
(BOM) from the start of the file. This confuses some tools when jsmin goes and
sticks it in the middle of the file, which is deprecated in Unicode 3.2 (and
completely defeats any purpose it might have had!).
I made it separate in case you really have a need for a BOM -- or if your tools
will just stick it back anyway.
You might need to supply --ignore-space-change to 'git am' to apply these, but
they should apply OK on top of your 0.61 release point.
I'm removing the patches from my previous comment to avoid confusion. Note that
the numbering has changed:
0001 removes BOM
0002 fixes some lint
0003 fixes the bugs
0004 removes the extraneous RegExps.
Original comment by r...@acm.org
on 4 Sep 2012 at 8:59
Attachments:
@rwk@acm.org Thank you for your contribution. You're right that I didn't push
the last changes. I had some issues on my dev VM when Debian pulled in Gnome 3.
The release was stuck on 0.60 even though the src said 0.51. Everything is
fixed now.
Nice catch on the BOM issue. I currently don't run tests on the minified src,
I'll add that to the TODO list. Looks like I need a need to get Notepad++
running under WINE too. I really wish somebody would fork and port NPP to *nix.
Not too clear about 0004. You use 3 different types of replace on the regex
source. Is there a performance reason for doing it that way.
I think comment #14 might still need to be addressed. We'll see.
I'll post an update as soon as I have the changes pushed to the remote origin.
Original comment by evanpla...@gmail.com
on 5 Sep 2012 at 6:34
@rwk@acm.org I'm going to hold off on 0004 for now. First, because I want to
get this bugfix texted and verified first. Second, because I had something else
in mind to address the performance issues. See bug 6 for more details.
Original comment by evanpla...@gmail.com
on 5 Sep 2012 at 5:23
Thank you to everybody who participated in this bugfix.
If you have the time, please run your data through the parser to verify that
there aren't any other issues.
If nobody posts any other problems, I'd like to mark this bug as 'verified' and
move on.
Original comment by evanpla...@gmail.com
on 5 Sep 2012 at 5:52
[I didn't hit Submit on this, so this reply is appearing out-of-sequence. Since
you're heading in another direction anyway, it's just FYI...]
Re: 0004 -- no, just one kind of replace. There are just three contexts.
In the first case, you're pulling it from a supplied RegExp:
reValue = reValue.source.replace(/X/g, separator);
In the second case, reValue is now a string, so there's no .source. We just
replace.
reValue = reValue.replace(/Y/g, delimiter);
In the third case, we again have a string, but we want a RegExp, so we re-wrap
our final string in a RegExp again:
reValue = RegExp(reValue.replace(/Z/g, escaper), 'g');
Perhaps it would be more clear as:
var reValueStr = reValue.source;
reValueStr = reValueStr.replace(/X/g, separator);
reValueStr = reValueStr.replace(/Y/g, delimiter);
reValueStr = reValueStr.replace(/Z/g, escaper);
reValue = RegExp(reValueStr, 'g');
BTW, I tested with fields with commas; they work fine with these changes.
Original comment by r...@acm.org
on 5 Sep 2012 at 11:35
Gotcha, I prefer the second form.
I'll add that last change, run the tests, and if everything's good, drop 0.62
tonight.
I want to get this bug fix released asap before anybody else downloads the 0.61
source.
Nice work.
Original comment by evanpla...@gmail.com
on 6 Sep 2012 at 12:31
Original comment by evanpla...@gmail.com
on 11 Oct 2012 at 4:07
Original comment by evanpla...@gmail.com
on 15 Oct 2012 at 10:39
Original issue reported on code.google.com by
thomasca...@gmail.com
on 2 Jul 2012 at 12:20