Closed reunanen closed 2 months ago
Looks good to me.
Do note that the C# and Pascal implementations very likely suffer from the same issues.
Do note that the C# and Pascal implementations very likely suffer from the same issues.
Yes. Hopefully easier to port this than the more complicated implementation.
@AngusJohnson, what do you think of this alternative?
@AngusJohnson, what do you think of this alternative?
Juha, I definitely think we're getting much closer, and I really appreciate the effort you've made.
I was a little concerned that these changes would affect performance, but in my testing so far, it doesn't seem to have done so ๐. And I was wondering if it would be better to avoid calling ProductIsEqual when none of the upper 32bits in a,b,c,d are set (which I believe would be likely for most users). But given that calling ProductIsEqual has negligible impact on performance, that additional check probably wouldn't be necessary.
So really the only (temporary) reservation I have is getting my head around the logic in CalculateCarry(). Thanks for the link there which is helping a little ๐.
And I was wondering if it would be better to avoid calling ProductIsEqual when none of the upper 32bits in a,b,c,d are set (which I believe would be likely for most users). But given that calling ProductIsEqual has negligible impact on performance, that additional check probably wouldn't be necessary.
Yeah, and as pointed out by @here-abarany, that kind of branching may very well be more detrimental to performance than the additional instructions.
So really the only (temporary) reservation I have is getting my head around the logic in CalculateCarry(). Thanks for the link there which is helping a little ๐.
There are also alternative implementations out there, some of which are perhaps a bit clearer. I could try and find you some?
I could try and find you some?
I hope that won't be necessary ๐ค. I just need a bit of time (and a clear head) to work it out ๐. (I'm very reluctant to use code that I don't fully understand.)
Edit: Here's my working understanding of CalculateCarry() in pseudocode ...
to test for a * b overflow carry
given that a & b are type Int64
let:
aHi = a & 0xFFFFFFFF00000000;
aLo = a & 0xFFFFFFFF;
bHi = b & 0xFFFFFFFF00000000;
bLo = b & 0xFFFFFFFF;
then:
a * b == (aHi + aLo) * (bHi + bLo)
== (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
also:
(aHi * bHi) must be > 64bits, so carry = ((aHi >> 32) * (bHi >> 32))
but:
(aHi * bLo) + (aLo * bHi) + (aLo * bLo) may also (just) overflow
so if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_INT64) ++carry
or safely ... if ((aLo * bHi) + (aLo * bLo) > (MAX_INT64 - (aHi * bLo))) ++carry
Oops, accidentally closed this... ๐
I'll probably rewrite CalculateCarry
so that I can understand what's happening there๐.
I just hope this doesn't break things again๐ค.
inline uint64_t CalculateCarry(uint64_t a, uint64_t b)
{
const uint64_t aHi = a & 0xFFFFFFFF00000000;
const uint64_t aLo = a & 0xFFFFFFFF;
const uint64_t bHi = b & 0xFFFFFFFF00000000;
const uint64_t bLo = b & 0xFFFFFFFF;
// a * b == (aHi + aLo) * (bHi + bLo)
// a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
uint64_t carry = (aHi >> 32) * (bHi >> 32);
// but since (aHi * bLo) + (aLo * bHi) + (aLo * bLo) may also (just) overflow
// do this (but safely) ... if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_UINT64) ++carry
if ((aLo * bHi) + (aLo * bLo) > (std::numeric_limits<uint64_t>::max() - (aHi * bLo))) ++carry;
return carry;
}
I just hope this doesn't break things again๐ค.
For what it's worth: I noticed that there's already some serious test automation coverage โ in the sense that whenever I made a mistake doing this, it was flagged by at least some (and often multiple) test cases. Which is great! ๐ช
Of course doesn't mean one should blindly rely on the tests only, but at least simple mistakes will probably be detected right away.
// do safely ... if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_INT64) ++carry
Just a nitpick: that should be MAX_UINT64
, right?
And yes, your version is very readable, now I think I understand this too ๐
You use redundant parentheses more than I would, but that's a matter of taste I guess.
You use redundant parentheses more than I would, but that's a matter of taste I guess.
I was deliberately heavy handed with parentheses so they would parallel the commented code logic above.
I'll probably rewrite
CalculateCarry
so that I can understand what's happening there๐. I just hope this doesn't break things again๐ค.inline uint64_t CalculateCarry(uint64_t a, uint64_t b) { const uint64_t aHi = a & 0xFFFFFFFF00000000; const uint64_t aLo = a & 0xFFFFFFFF; const uint64_t bHi = b & 0xFFFFFFFF00000000; const uint64_t bLo = b & 0xFFFFFFFF; // a * b == (aHi + aLo) * (bHi + bLo) // a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo) uint64_t carry = (aHi >> 32) * (bHi >> 32); // but since (aHi * bLo) + (aLo * bHi) + (aLo * bLo) may also (just) overflow // do this (but safely) ... if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_UINT64) ++carry if ((aLo * bHi) + (aLo * bLo) > (std::numeric_limits<uint64_t>::max() - (aHi * bLo))) ++carry; return carry; }
Actually, looking again more closely, I don't think the code above is correct. And if it is wrong then I believe I was somewhat mislead by the StackOverflow link provided.
I think the following code is correct (or almost correct), but it's bedtime again so I may not be thinking straight ...
inline uint64_t CalcOverflowCarry(uint64_t a, uint64_t b)
{
const uint64_t aHi = a & 0xFFFFFFFF00000000;
const uint64_t aLo = a & 0xFFFFFFFF;
const uint64_t bHi = b & 0xFFFFFFFF00000000;
const uint64_t bLo = b & 0xFFFFFFFF;
// a * b == (aHi + aLo) * (bHi + bLo)
// a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
return (aHi >> 32) * (bHi >> 32) + (((aHi >> 32) * bLo) >> 32) + (((bHi >> 32) * aLo) >> 32);
}
return (aHi >> 32) * (bHi >> 32) + ((aHi * bLo) >> 32) + ((bHi * aLo) >> 32);
Hmm but won't aHi * bLo
and bHi * aLo
possibly wrap here?
return (aHi >> 32) * (bHi >> 32) + ((aHi * bLo) >> 32) + ((bHi * aLo) >> 32);
Hmm but won't
aHi * bLo
andbHi * aLo
possibly wrap here?
You just missed my edit. (See above.) Anyhow, this is why I'm so keen to understand the underlying code. I doesn't stop me making mistakes, but hopefully it helps me avoid making a lot more.
inline uint64_t CalcOverflowCarry(uint64_t a, uint64_t b)
{
//const uint64_t aHi = a & 0xFFFFFFFF00000000;
const uint64_t aHiShr32 = a >> 32;
const uint64_t aLo = a & 0xFFFFFFFF;
//const uint64_t bHi = b & 0xFFFFFFFF00000000;
const uint64_t bHiShr32 = b >> 32;
const uint64_t bLo = b & 0xFFFFFFFF;
// a * b == (aHi + aLo) * (bHi + bLo)
// a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
//carry = (aHi >> 32) * (bHi >> 32) + (((aHi >> 32) * bLo) >> 32) + (((bHi >> 32) * aLo) >> 32);
return aHiShr32 * bHiShr32 + ((aHiShr32 * bLo) >> 32) + ((bHiShr32 * aLo) >> 32);
}
Just in case, this one (Emulate64x64to128) looks quite simple too.
Thanks Tom. You've just reminded me of my own Int128Mul
function and Int128 class in Clipper1
. I'm very confident that Int128Mul
there (line 354) performs correctly.
Hi all, following the discussion and throwing in my two cents... This one https://stackoverflow.com/a/1815371/1158913 seems to behave correctly for me, returning the expected carry.
I've checked against a naive implementation (carry128()
) using uint_128 (BASELINE to ensure correct results - also tested in Python) and an adaptation of the solution in the link (multiply_carry()
):
typedef unsigned __int128 uint128_t;
uint64_t carry128(uint64_t a, uint64_t b) {
uint128_t a_128 = a;
uint128_t res_128 = a_128 * b;
uint64_t carry = (uint64_t) (res_128 >> 64);
return carry;
}
uint64_t hi(uint64_t x) {
return x >> 32;
}
uint64_t lo(uint64_t x) {
return ((1ULL << 32) - 1) & x;
}
// https://stackoverflow.com/a/1815371/1158913
uint64_t multiply_carry(uint64_t a, uint64_t b) {
// actually uint32_t would do, but the casting is annoying
uint64_t s0, s1, s2, s3;
uint64_t x = lo(a) * lo(b);
s0 = lo(x);
x = hi(a) * lo(b) + hi(x);
s1 = lo(x);
s2 = hi(x);
x = s1 + lo(a) * hi(b);
s1 = lo(x);
x = s2 + hi(a) * hi(b) + hi(x);
s2 = lo(x);
s3 = hi(x);
uint64_t result = s1 << 32 | s0;
uint64_t carry = s3 << 32 | s2;
return carry;
}
Examples of testing carry returned from multiplication with random numbers:
TEST: 0x51eaed81157de061 * 0x3a271fb2745b6fe9 BASELINE carry128: 0x129bbebdfae0464e CalculateCarry : 0x129bbebdd0c2c2b2 WRONG CalcOverflowCarry: 0x129bbebdfae0464e OK multiply_carry : 0x129bbebdfae0464e OK TEST: 0xc2055706a62883fa * 0x26c78bc79c2322cc BASELINE carry128: 0x1d640701d192519b CalculateCarry : 0x1d6407014210e7ab WRONG CalcOverflowCarry: 0x1d640701d1925199 WRONG multiply_carry : 0x1d640701d192519b OK
TEST: 0x874ddae32094b0de * 0x9b1559a06fdf83e0 BASELINE carry128: 0x51f76c49563e5bfe CalculateCarry : 0x51f76c490760b8e0 WRONG CalcOverflowCarry: 0x51f76c49563e5bfd WRONG multiply_carry : 0x51f76c49563e5bfe OK
TEST: 0x81fb3ad3636ca900 * 0x239c000a982a8da4 BASELINE carry128: 0x12148e28207b83a3 CalculateCarry : 0x12148e27c5644c3e WRONG CalcOverflowCarry: 0x12148e28207b83a2 WRONG multiply_carry : 0x12148e28207b83a3 OK
TEST: 0x4be0b4c5d2725c44 * 0x990cd6db34a04c30 BASELINE carry128: 0x2d5d1a4183fd6165 CalculateCarry : 0x2d5d1a40f6935288 WRONG CalcOverflowCarry: 0x2d5d1a4183fd6164 WRONG multiply_carry : 0x2d5d1a4183fd6165 OK
TEST: 0x978ec0c0433c01f6 * 0x2df03d097966b536 BASELINE carry128: 0x1b3251d91fe272a5 CalculateCarry : 0x1b3251d8cbf286c1 WRONG CalcOverflowCarry: 0x1b3251d91fe272a4 WRONG multiply_carry : 0x1b3251d91fe272a5 OK
TEST: 0x49c5cbbcfd716344 * 0xc489e3b34b007ad3 BASELINE carry128: 0x38a32c74c8c191a4 CalculateCarry : 0x38a32c73f0912874 WRONG CalcOverflowCarry: 0x38a32c74c8c191a3 WRONG multiply_carry : 0x38a32c74c8c191a4 OK
TEST: 0xd3361cdbeed655d5 * 0x1240da41e324953a BASELINE carry128: 0x0f0f4fa11e7e8f2a CalculateCarry : 0x0f0f4fa0520fd19c WRONG CalcOverflowCarry: 0x0f0f4fa11e7e8f28 WRONG multiply_carry : 0x0f0f4fa11e7e8f2a OK
TEST: 0x51b854f8e71b0ae0 * 0x6f8d438aae530af5 BASELINE carry128: 0x239c04ee3c8cc248 CalculateCarry : 0x239c04eda032b5b0 WRONG CalcOverflowCarry: 0x239c04ee3c8cc247 WRONG multiply_carry : 0x239c04ee3c8cc248 OK
TEST: 0xbbecf7dbc6147480 * 0xbb0f73d0f82e2236 BASELINE carry128: 0x895170f4e9a216a7 CalculateCarry : 0x895170f3a2b5c2f1 WRONG CalcOverflowCarry: 0x895170f4e9a216a5 WRONG multiply_carry : 0x895170f4e9a216a7 OK
WRONG (of 10 tries): CalculateCarry : 10 CalcOverflowCarry: 9 multiply_carry : 0
Hope it's helpful.
Thanks Giuseppe.
StackOverflow is a fantastic resource, but the code there isn't always reliable (even from contrubutors with very high reputations).
Anyhow, I've just updated clipper.core.h again with a revised CalcOverflowCarry
function.
This is similar to #832, but tries to avoid branching and division.
@here-abarany, what do you think?