Open cyberphone opened 8 years ago
Node.js uses V8 by default so I'm not surprised the behavior is the same. I wonder if the behavior still matches V8 when using Node with ChakraCore?
@cyberphone Is it possible to add a JavaScript snippet that fails with a console binary such as ch.exe for this?
I believe this is a bug in precision in parseFloat on edge cases.
@abchatra I think the sequence below shows that the problem is the output formatting of the very last (imprecise) digit:
var v1 = 33333333.3333333
console.log(v1);
var v2 = 0.000000036;
console.log(v2);
console.log(v1 + v2);
Chrome, Firefox, Safari: `33333333.333333336`
IE11: `33333333.333333335`
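Worth noting: both serializations parse back to the same IEEE-754 double, so the engines disagree only on which shortest-digit string to emit, not on the value itself. A quick check:

```javascript
// Both decimal strings lie inside the rounding interval of the same
// IEEE-754 double, so they compare equal as numbers; the engines only
// disagree on which digits to emit when converting back to a string.
const v8Style = 33333333.333333336;     // Chrome/Firefox/Safari output
const chakraStyle = 33333333.333333335; // IE11/Chakra output
console.log(v8Style === chakraStyle);   // true
// Prints whichever serialization this engine's algorithm chooses:
console.log((33333333.3333333 + 0.000000036).toString());
```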
@dilijev I would assume that node.js with Chakra would exhibit the same behavior.
I solved this issue in my Java implementation by simply taking the V8 number serializer "as is": https://github.com/cyberphone/openkeystore/tree/master/library/src/org/webpki/json
Related issues in .NET: https://github.com/cyberphone/jsondotnet
Removing "Bug" and adding "Suggestion." This is a suggestion to use a number formatting algorithm consistent with other engines.
Confirmed compat issue still exists:
>eshost --tags jsvu -isx "var v1 = 33333333.3333333; print(v1); var v2 = 0.000000036; print(v2); print(v1 + v2);"
## Source
var v1 = 33333333.3333333; print(v1); var v2 = 0.000000036; print(v2); print(v1 + v2);
#### jsvu-ch
33333333.3333333
3.6e-8
33333333.333333335
#### jsvu-d8, jsvu-jsc, jsvu-sm
33333333.3333333
3.6e-8
33333333.333333336
It might be interesting to know that a new JSON signature standards proposal based on ES6 normalization is in progress, with Microsoft as a co-editor. Unofficial document: https://xml2rfc.tools.ietf.org/cgi-bin/xml2rfc.cgi?url=https%3A%2F%2Fraw.githubusercontent.com%2Ferdtman%2FCleartext-JOSE%2Fmaster%2Fdraft-erdtman-jose-cleartext-jws.xml
Since this behavior breaks the now published https://tools.ietf.org/id/draft-erdtman-jose-cleartext-jws-00.html I would consider re-branding this as a bug.
I'll see if I can contribute a fix for this. Noted:
That would be great!
There is now a 100 million test value file: https://github.com/cyberphone/json-canonicalization/tree/master/dotnet/es6numberserialization
So I've taken a look and identified that in the following case within FDblToRgbFast: https://github.com/Microsoft/ChakraCore/blob/2fb482839c651f4f599c837cdbd01585aec9479e/lib/Common/Common/NumberUtilities_strtod.cpp#L1805-L1814 the result frequently differs from what the other engines produce. The only reliable fix appears to be dropping down to the slow path whenever this path would be taken; the method the other engines use does not simply split the difference for these cases, nor does it split and round up, or take the highest or the lowest, etc. And the fast path has no additional relevant information that I can see at this point.
With this change - i.e. dropping to the slow path from this case - ChakraCore passes the provided 100,000,000 test cases, though there is a small performance impact, as the slow path is now used significantly more than it was before.
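For reference, the correctness property every engine preserves here is the shortest round trip: emit the fewest significant digits that still parse back to exactly the same double. A hedged sketch of that criterion in JavaScript (using `toPrecision` as a stand-in for the engine's digit generator; this illustrates the property, it is not engine code):

```javascript
// Return the shortest decimal string (by significant-digit count) that
// round-trips to exactly the same double. Any digit the fast path picks
// between the high/low boundaries must preserve this property.
function shortestRoundTrip(x) {
  for (let digits = 1; digits <= 17; digits++) {
    const s = x.toPrecision(digits);
    if (Number(s) === x) return s;
  }
  return String(x); // 17 significant digits always suffice
}
```

Both `33333333.333333336` and `33333333.333333335` satisfy the round-trip requirement for the value in question; the engines differ only in which string their digit generator lands on.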
Questions before submitting a PR:
1. How can we test this? If the intent is to guarantee that CC will return the same answer as V8, JSC, and SM, then testing 100 cases, or even 100,000,000 cases, may not do it - there could be a case we miss. Additionally, have the other engines agreed not to change their algorithms? Currently JSC actually uses a copy of V8's code for this, whilst SM seems to have done something different that has the same effect - but has there been any agreement anywhere to stick to the current mechanism?
2. Is agreeing with the other engines worth the slowdown? It's hard to measure the impact, but this forces the slow path in maybe 20% more of these cases than before.
I believe that @irinayat-MS looked into some of these issues recently.
Yes, please see https://github.com/nodejs/node-chakracore/issues/498 and #2512. The comments on 498 have the data on perf benchmarks if the fast path is replaced with the precise one.
The fix only requires dropping to the slow path in the case `bHL - bLH > 1`, as per my comment above - I assume this won't impact benchmarks as badly?
It will affect only cases that involve the maximum number of digits, and not quite all of those cases.
Though I guess this would need testing.
Edit: using the 100 million cases in the provided test file, the execution time with/without the fix is measurably different - the .toString calls are on average taking twice as long.
Edit 2: comparing to other engines, V8 and JSC actually appear to be faster than CC's fast path at the moment, whilst producing the same results as CC's precise/slow path - so this should be "fixable" without a performance cost, it may just be a bit more involved.
Correction to my comment above: to match the other engines, the slow path is only required when:
`(bHL - bLH > 1 && (bHL + bLH) % 2 != 0)`
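A worked illustration of this condition (the digit values below are hypothetical, chosen only to show the two parity cases):

```javascript
// Mirror of the fast-path decision: when the high/low boundary digits
// differ by more than one, an exact mid-point digit exists only when
// bHL + bLH is even; an odd sum means falling back to the slow path.
function fastPathDigit(bHL, bLH) {
  if (bHL - bLH <= 1) return null;        // handled by a different branch
  if ((bHL + bLH) % 2 !== 0) return null; // odd sum: no exact mid-point
  return (bHL + bLH) / 2;                 // exact mid-point digit
}
console.log(fastPathDigit(7, 3)); // 5    (even sum: emit the mid-point)
console.log(fastPathDigit(6, 3)); // null (odd sum: use the slow path)
```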
So the change would be to replace this: https://github.com/Microsoft/ChakraCore/blob/2fb482839c651f4f599c837cdbd01585aec9479e/lib/Common/Common/NumberUtilities_strtod.cpp#L1805-L1814 with:
else if (bHL - bLH > 1)
{
    Assert(ib < kcbMaxRgb);
    if (!(ib < kcbMaxRgb))
        goto LFail;
    // HL and LH differ by at least two in this digit, so split
    // the difference if there is an exact mid-point;
    // if there is not, fall back on the slow path.
    if ((bHL + bLH) % 2 == 0)
    {
        prgb[ib++] = (bHL + bLH) / 2;
    }
    else
    {
        goto LFail;
    }
}
I can't make a narrower condition than this. I've tried examining the values of bHL and bLH at this point and comparing them against the output number that V8 gets, and there is no 1-to-1 mapping that will work.
Testing over a large number of samples, introducing the slow path for these items only seems to cause around a 5-10% slowdown on toString(someDouble).
I note that V8 and JSC are around 5 times faster than CC for toString(someDouble); they seemingly achieve this speed through various fast maths helpers - including a "do it yourself floating point" type comprised of an int64 significand and an int32 exponent - whilst largely doing the same basic calculations.
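To illustrate the kind of helper involved, here is a hedged JavaScript sketch of that "do it yourself floating point" decomposition: an integer significand plus a base-2 exponent extracted from the IEEE-754 bits (function and field names are mine, not the library's; the sign bit is ignored for brevity):

```javascript
// Split a finite, non-negative double into { significand, exponent }
// such that value === Number(significand) * 2 ** exponent.
function toDiyFp(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  const raw = view.getBigUint64(0);
  const biasedExp = Number((raw >> 52n) & 0x7ffn);
  const fraction = raw & 0xfffffffffffffn;
  // Normal numbers carry an implicit leading 1 bit; subnormals do not.
  const significand = biasedExp === 0 ? fraction : fraction | (1n << 52n);
  const exponent = (biasedExp === 0 ? 1 : biasedExp) - 1075; // bias 1023 + 52 fraction bits
  return { significand, exponent };
}
```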
If this path needs to be fast, could you just use V8's code for it? (JSC does.) It's available here as a separate BSD-3-licensed library: https://github.com/google/double-conversion
@rhuanjl re: large test case -- we could archive this test case as a Slow test we run in our Nightly test suite, rather than every check in, but it's probably not necessary. A sampling of some of the cases that were previously incorrect is probably sufficient to track regressions. Changes to this area could manually verify with the large test, and any other issues will be opened as new bugs.
The code in my previous comment is the fix for this. But I'm going to hold off on submitting anything for now as it seems that some of the code here needs optimising before this fix (which will incur a performance cost) is added.
I will see if I can do something on optimising it but that's going to take more thought and be significantly more involved than the logical fix.
As a possible solution to this: recent versions of ICU incorporate double-to-string code taken from V8 (https://github.com/unicode-org/icu/tree/master/vendor/double-conversion). If we update to use a recent version of ICU, then builds of CC with ICU could pipe their double-to-string conversions through it, which would fix this issue.
The latest ARM Macs seem to output a different answer with CC's algorithm than other architectures do.
Would be good to follow up and fix this, perhaps by bringing in the ICU idea mentioned above.
In order to create signed JavaScript objects, you need to normalize data.
The JCS specification https://cyberphone.github.io/openkeystore/resources/docs/jcs.html#ECMAScript_Compatibility_Mode accomplishes this "automatically" by relying on ES6 property order (which Chakra implements), but it also depends on the extended number formatting algorithm used in the Google V8 engine, which, after testing with 100 million random and selected values, appears to be used by Chrome, Firefox, and Safari as well as Node.js.
Browser test program: http://webpki.org/ietf/es6numberbrowsertest.html
10 million value test file: http://webpki.org/ietf/es6testfile.txt
Although JCS does not (at this stage) represent a standard, the core concept will most likely become a de-facto standard due to its simplicity.
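To make the signing requirement concrete: a parse-then-stringify round trip must reproduce the signed bytes exactly, and that includes the engine's number serialization. A minimal illustration (the `true` result is what the V8-compatible engines produce; an engine emitting `...335` would break the signature):

```javascript
// A signature computed over these exact bytes only verifies if the
// receiving engine reproduces them after JSON.parse / JSON.stringify.
const signedPayload = '{"value":33333333.333333336}';
const reserialized = JSON.stringify(JSON.parse(signedPayload));
console.log(reserialized === signedPayload); // true on V8/SM/JSC
```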