`IsCollinear`: detect and handle integer multiplication wrapping (version 2)

reunanen commented 2 months ago

This is similar to #832, but tries to avoid branching and division.

@here-abarany, what do you think?

here-abarany commented 2 months ago

Looks good to me.

Do note that the C# and Pascal implementations very likely suffer from the same issues.

reunanen commented 2 months ago

Do note that the C# and Pascal implementations very likely suffer from the same issues.

Yes. Hopefully easier to port this than the more complicated implementation.

reunanen commented 2 months ago

@AngusJohnson, what do you think of this alternative?

AngusJohnson commented 2 months ago

@AngusJohnson, what do you think of this alternative?

Juha, I definitely think we're getting much closer, and I really appreciate the effort you've made.

I was a little concerned that these changes would affect performance, but in my testing so far, it doesn't seem to have done so 👍. And I was wondering if it would be better to avoid calling ProductIsEqual when none of the upper 32bits in a,b,c,d are set (which I believe would be likely for most users). But given that calling ProductIsEqual has negligible impact on performance, that additional check probably wouldn't be necessary.

So really the only (temporary) reservation I have is getting my head around the logic in CalculateCarry(). Thanks for the link there which is helping a little 😜.

reunanen commented 2 months ago

And I was wondering if it would be better to avoid calling ProductIsEqual when none of the upper 32bits in a,b,c,d are set (which I believe would be likely for most users). But given that calling ProductIsEqual has negligible impact on performance, that additional check probably wouldn't be necessary.

Yeah, and as pointed out by @here-abarany, that kind of branching may very well be more detrimental to performance than the additional instructions.

So really the only (temporary) reservation I have is getting my head around the logic in CalculateCarry(). Thanks for the link there which is helping a little 😜.

There are also alternative implementations out there, some of which are perhaps a bit clearer. I could try and find you some?

AngusJohnson commented 2 months ago

I could try and find you some?

I hope that won't be necessary 🤞. I just need a bit of time (and a clear head) to work it out 😁. (I'm very reluctant to use code that I don't fully understand.)

Edit: Here's my working understanding of CalculateCarry() in pseudocode ...

to test for a * b overflow carry
given that a & b are type Int64

let:
  aHi = a & 0xFFFFFFFF00000000;
  aLo = a & 0xFFFFFFFF;
  bHi = b & 0xFFFFFFFF00000000;
  bLo = b & 0xFFFFFFFF;
then:  
  a * b == (aHi + aLo) * (bHi + bLo)
        == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
also:
  (aHi * bHi) must be > 64bits, so carry = ((aHi >> 32) * (bHi >> 32))
but:
  (aHi * bLo) + (aLo * bHi) + (aLo * bLo) may also (just) overflow 
  so if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_INT64) ++carry
  or safely ... if ((aLo * bHi) + (aLo * bLo) > (MAX_INT64 - (aHi * bLo))) ++carry

reunanen commented 2 months ago

Oops, accidentally closed this... 🙈

AngusJohnson commented 2 months ago

I'll probably rewrite CalculateCarry so that I can understand what's happening there😁. I just hope this doesn't break things again🤞.

inline uint64_t CalculateCarry(uint64_t a, uint64_t b)
{
  const uint64_t aHi = a & 0xFFFFFFFF00000000;
  const uint64_t aLo = a & 0xFFFFFFFF;
  const uint64_t bHi = b & 0xFFFFFFFF00000000;
  const uint64_t bLo = b & 0xFFFFFFFF;
  // a * b == (aHi + aLo) * (bHi + bLo)
  // a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
  uint64_t carry = (aHi >> 32) * (bHi >> 32);
  // but since (aHi * bLo) + (aLo * bHi) + (aLo * bLo) may also (just) overflow 
  // do this (but safely) ... if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_UINT64) ++carry
  if ((aLo * bHi) + (aLo * bLo) > (std::numeric_limits<uint64_t>::max() - (aHi * bLo))) ++carry;
  return carry;
}

reunanen commented 2 months ago

I just hope this doesn't break things again🤞.

For what it's worth: I noticed that there's already some serious test automation coverage – in the sense that whenever I made a mistake doing this, it was flagged by at least some (and often multiple) test cases. Which is great! 💪

Of course doesn't mean one should blindly rely on the tests only, but at least simple mistakes will probably be detected right away.

reunanen commented 2 months ago

// do safely ... if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_INT64) ++carry

Just a nitpick: that should be MAX_UINT64, right?

reunanen commented 2 months ago

And yes, your version is very readable, now I think I understand this too 😅

You use redundant parentheses more than I would, but that's a matter of taste I guess.

AngusJohnson commented 2 months ago

You use redundant parentheses more than I would, but that's a matter of taste I guess.

I was deliberately heavy handed with parentheses so they would parallel the commented code logic above.

AngusJohnson commented 2 months ago

I'll probably rewrite CalculateCarry so that I can understand what's happening there😁. I just hope this doesn't break things again🤞.

inline uint64_t CalculateCarry(uint64_t a, uint64_t b)
{
  const uint64_t aHi = a & 0xFFFFFFFF00000000;
  const uint64_t aLo = a & 0xFFFFFFFF;
  const uint64_t bHi = b & 0xFFFFFFFF00000000;
  const uint64_t bLo = b & 0xFFFFFFFF;
  // a * b == (aHi + aLo) * (bHi + bLo)
  // a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
  uint64_t carry = (aHi >> 32) * (bHi >> 32);
  // but since (aHi * bLo) + (aLo * bHi) + (aLo * bLo) may also (just) overflow 
  // do this (but safely) ... if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_UINT64) ++carry
  if ((aLo * bHi) + (aLo * bLo) > (std::numeric_limits<uint64_t>::max() - (aHi * bLo))) ++carry;
  return carry;
}

Actually, looking again more closely, I don't think the code above is correct. And if it is wrong then I believe I was somewhat mislead by the StackOverflow link provided.

I think the following code is correct (or almost correct), but it's bedtime again so I may not be thinking straight ...

  inline uint64_t CalcOverflowCarry(uint64_t a, uint64_t b)
  {
    const uint64_t aHi = a & 0xFFFFFFFF00000000;
    const uint64_t aLo = a & 0xFFFFFFFF;
    const uint64_t bHi = b & 0xFFFFFFFF00000000;
    const uint64_t bLo = b & 0xFFFFFFFF;
    // a * b == (aHi + aLo) * (bHi + bLo)
    // a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
    return (aHi >> 32) * (bHi >> 32) + (((aHi >> 32) * bLo) >> 32) + (((bHi >> 32) * aLo) >> 32);
  }

reunanen commented 2 months ago

return (aHi >> 32) * (bHi >> 32) + ((aHi * bLo) >> 32) + ((bHi * aLo) >> 32);

Hmm but won't aHi * bLo and bHi * aLo possibly wrap here?

AngusJohnson commented 2 months ago

return (aHi >> 32) * (bHi >> 32) + ((aHi * bLo) >> 32) + ((bHi * aLo) >> 32);

Hmm but won't aHi * bLo and bHi * aLo possibly wrap here?

You just missed my edit. (See above.) Anyhow, this is why I'm so keen to understand the underlying code. I doesn't stop me making mistakes, but hopefully it helps me avoid making a lot more.

  inline uint64_t CalcOverflowCarry(uint64_t a, uint64_t b)
  {
    //const uint64_t aHi = a & 0xFFFFFFFF00000000;
    const uint64_t aHiShr32 = a >> 32;
    const uint64_t aLo = a & 0xFFFFFFFF;
    //const uint64_t bHi = b & 0xFFFFFFFF00000000;
    const uint64_t bHiShr32 = b >> 32;
    const uint64_t bLo = b & 0xFFFFFFFF;
    // a * b == (aHi + aLo) * (bHi + bLo)
    // a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
    //carry = (aHi >> 32) * (bHi >> 32) + (((aHi >> 32) * bLo) >> 32) + (((bHi >> 32) * aLo) >> 32);
    return aHiShr32 * bHiShr32 + ((aHiShr32 * bLo) >> 32) + ((bHiShr32 * aLo) >> 32);
  }

tomwiel commented 2 months ago

Just in case, this one (Emulate64x64to128) looks quite simple too.

AngusJohnson commented 2 months ago

Thanks Tom. You've just reminded me of my own Int128Mul function and Int128 class in Clipper1. I'm very confident that Int128Mul there (line 354) performs correctly.

azrafe7 commented 2 months ago

Hi all, following the discussion and throwing in my two cents... This one https://stackoverflow.com/a/1815371/1158913 seems to behave correctly for me, returning the expected carry.

I've checked against a naive implementation (carry128()) using uint_128 (BASELINE to ensure correct results - also tested in Python) and an adaptation of the solution in the link (multiply_carry()):

typedef unsigned __int128 uint128_t;
uint64_t carry128(uint64_t a, uint64_t b) {
    uint128_t a_128 = a;
    uint128_t res_128 = a_128 * b;
    uint64_t carry = (uint64_t) (res_128 >> 64);

    return carry;
}

uint64_t hi(uint64_t x) {
    return x >> 32;
}

uint64_t lo(uint64_t x) {
    return ((1ULL << 32) - 1) & x;
}

// https://stackoverflow.com/a/1815371/1158913
uint64_t multiply_carry(uint64_t a, uint64_t b) {
    // actually uint32_t would do, but the casting is annoying
    uint64_t s0, s1, s2, s3; 

    uint64_t x = lo(a) * lo(b);
    s0 = lo(x);

    x = hi(a) * lo(b) + hi(x);
    s1 = lo(x);
    s2 = hi(x);

    x = s1 + lo(a) * hi(b);
    s1 = lo(x);

    x = s2 + hi(a) * hi(b) + hi(x);
    s2 = lo(x);
    s3 = hi(x);

    uint64_t result = s1 << 32 | s0;
    uint64_t carry = s3 << 32 | s2;

    return carry;
}

Examples of testing carry returned from multiplication with random numbers:

TEST: 0x51eaed81157de061 * 0x3a271fb2745b6fe9
BASELINE carry128: 0x129bbebdfae0464e
CalculateCarry   : 0x129bbebdd0c2c2b2 WRONG
CalcOverflowCarry: 0x129bbebdfae0464e OK
multiply_carry   : 0x129bbebdfae0464e OK

TEST: 0xc2055706a62883fa * 0x26c78bc79c2322cc
BASELINE carry128: 0x1d640701d192519b
CalculateCarry   : 0x1d6407014210e7ab WRONG
CalcOverflowCarry: 0x1d640701d1925199 WRONG
multiply_carry   : 0x1d640701d192519b OK

More tests...

TEST: 0x874ddae32094b0de * 0x9b1559a06fdf83e0
BASELINE carry128: 0x51f76c49563e5bfe
CalculateCarry   : 0x51f76c490760b8e0 WRONG
CalcOverflowCarry: 0x51f76c49563e5bfd WRONG
multiply_carry   : 0x51f76c49563e5bfe OK

TEST: 0x81fb3ad3636ca900 * 0x239c000a982a8da4
BASELINE carry128: 0x12148e28207b83a3
CalculateCarry   : 0x12148e27c5644c3e WRONG
CalcOverflowCarry: 0x12148e28207b83a2 WRONG
multiply_carry   : 0x12148e28207b83a3 OK

TEST: 0x4be0b4c5d2725c44 * 0x990cd6db34a04c30
BASELINE carry128: 0x2d5d1a4183fd6165
CalculateCarry   : 0x2d5d1a40f6935288 WRONG
CalcOverflowCarry: 0x2d5d1a4183fd6164 WRONG
multiply_carry   : 0x2d5d1a4183fd6165 OK

TEST: 0x978ec0c0433c01f6 * 0x2df03d097966b536
BASELINE carry128: 0x1b3251d91fe272a5
CalculateCarry   : 0x1b3251d8cbf286c1 WRONG
CalcOverflowCarry: 0x1b3251d91fe272a4 WRONG
multiply_carry   : 0x1b3251d91fe272a5 OK

TEST: 0x49c5cbbcfd716344 * 0xc489e3b34b007ad3
BASELINE carry128: 0x38a32c74c8c191a4
CalculateCarry   : 0x38a32c73f0912874 WRONG
CalcOverflowCarry: 0x38a32c74c8c191a3 WRONG
multiply_carry   : 0x38a32c74c8c191a4 OK

TEST: 0xd3361cdbeed655d5 * 0x1240da41e324953a
BASELINE carry128: 0x0f0f4fa11e7e8f2a
CalculateCarry   : 0x0f0f4fa0520fd19c WRONG
CalcOverflowCarry: 0x0f0f4fa11e7e8f28 WRONG
multiply_carry   : 0x0f0f4fa11e7e8f2a OK

TEST: 0x51b854f8e71b0ae0 * 0x6f8d438aae530af5
BASELINE carry128: 0x239c04ee3c8cc248
CalculateCarry   : 0x239c04eda032b5b0 WRONG
CalcOverflowCarry: 0x239c04ee3c8cc247 WRONG
multiply_carry   : 0x239c04ee3c8cc248 OK

TEST: 0xbbecf7dbc6147480 * 0xbb0f73d0f82e2236
BASELINE carry128: 0x895170f4e9a216a7
CalculateCarry   : 0x895170f3a2b5c2f1 WRONG
CalcOverflowCarry: 0x895170f4e9a216a5 WRONG
multiply_carry   : 0x895170f4e9a216a7 OK

WRONG (of 10 tries): 
  CalculateCarry   : 10
  CalcOverflowCarry: 9
  multiply_carry   : 0

Hope it's helpful.

AngusJohnson commented 2 months ago

Thanks Giuseppe. StackOverflow is a fantastic resource, but the code there isn't always reliable (even from contrubutors with very high reputations). Anyhow, I've just updated clipper.core.h again with a revised CalcOverflowCarry function.

AngusJohnson / Clipper2

`IsCollinear`: detect and handle integer multiplication wrapping (version 2) #834