AngusJohnson / Clipper2

Polygon Clipping and Offsetting - C++, C# and Delphi
Boost Software License 1.0
1.34k stars 248 forks source link

`IsCollinear`: detect and handle integer multiplication wrapping (version 2) #834

Closed reunanen closed 2 months ago

reunanen commented 2 months ago

This is similar to #832, but tries to avoid branching and division.

@here-abarany, what do you think?

here-abarany commented 2 months ago

Looks good to me.

Do note that the C# and Pascal implementations very likely suffer from the same issues.

reunanen commented 2 months ago

Do note that the C# and Pascal implementations very likely suffer from the same issues.

Yes. Hopefully easier to port this than the more complicated implementation.

reunanen commented 2 months ago

@AngusJohnson, what do you think of this alternative?

AngusJohnson commented 2 months ago

@AngusJohnson, what do you think of this alternative?

Juha, I definitely think we're getting much closer, and I really appreciate the effort you've made.

I was a little concerned that these changes would affect performance, but in my testing so far, it doesn't seem to have done so ๐Ÿ‘. And I was wondering if it would be better to avoid calling ProductIsEqual when none of the upper 32bits in a,b,c,d are set (which I believe would be likely for most users). But given that calling ProductIsEqual has negligible impact on performance, that additional check probably wouldn't be necessary.

So really the only (temporary) reservation I have is getting my head around the logic in CalculateCarry(). Thanks for the link there which is helping a little ๐Ÿ˜œ.

reunanen commented 2 months ago

And I was wondering if it would be better to avoid calling ProductIsEqual when none of the upper 32bits in a,b,c,d are set (which I believe would be likely for most users). But given that calling ProductIsEqual has negligible impact on performance, that additional check probably wouldn't be necessary.

Yeah, and as pointed out by @here-abarany, that kind of branching may very well be more detrimental to performance than the additional instructions.

So really the only (temporary) reservation I have is getting my head around the logic in CalculateCarry(). Thanks for the link there which is helping a little ๐Ÿ˜œ.

There are also alternative implementations out there, some of which are perhaps a bit clearer. I could try and find you some?

AngusJohnson commented 2 months ago

I could try and find you some?

I hope that won't be necessary ๐Ÿคž. I just need a bit of time (and a clear head) to work it out ๐Ÿ˜. (I'm very reluctant to use code that I don't fully understand.)

Edit: Here's my working understanding of CalculateCarry() in pseudocode ...

to test for a * b overflow carry
given that a & b are type Int64

let:
  aHi = a & 0xFFFFFFFF00000000;
  aLo = a & 0xFFFFFFFF;
  bHi = b & 0xFFFFFFFF00000000;
  bLo = b & 0xFFFFFFFF;
then:  
  a * b == (aHi + aLo) * (bHi + bLo)
        == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
also:
  (aHi * bHi) must be > 64bits, so carry = ((aHi >> 32) * (bHi >> 32))
but:
  (aHi * bLo) + (aLo * bHi) + (aLo * bLo) may also (just) overflow 
  so if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_INT64) ++carry
  or safely ... if ((aLo * bHi) + (aLo * bLo) > (MAX_INT64 - (aHi * bLo))) ++carry
reunanen commented 2 months ago

Oops, accidentally closed this... ๐Ÿ™ˆ

AngusJohnson commented 2 months ago

I'll probably rewrite CalculateCarry so that I can understand what's happening there๐Ÿ˜. I just hope this doesn't break things again๐Ÿคž.

inline uint64_t CalculateCarry(uint64_t a, uint64_t b)
{
  const uint64_t aHi = a & 0xFFFFFFFF00000000;
  const uint64_t aLo = a & 0xFFFFFFFF;
  const uint64_t bHi = b & 0xFFFFFFFF00000000;
  const uint64_t bLo = b & 0xFFFFFFFF;
  // a * b == (aHi + aLo) * (bHi + bLo)
  // a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
  uint64_t carry = (aHi >> 32) * (bHi >> 32);
  // but since (aHi * bLo) + (aLo * bHi) + (aLo * bLo) may also (just) overflow 
  // do this (but safely) ... if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_UINT64) ++carry
  if ((aLo * bHi) + (aLo * bLo) > (std::numeric_limits<uint64_t>::max() - (aHi * bLo))) ++carry;
  return carry;
}
reunanen commented 2 months ago

I just hope this doesn't break things again๐Ÿคž.

For what it's worth: I noticed that there's already some serious test automation coverage โ€“ in the sense that whenever I made a mistake doing this, it was flagged by at least some (and often multiple) test cases. Which is great! ๐Ÿ’ช

Of course doesn't mean one should blindly rely on the tests only, but at least simple mistakes will probably be detected right away.

reunanen commented 2 months ago

// do safely ... if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_INT64) ++carry

Just a nitpick: that should be MAX_UINT64, right?

reunanen commented 2 months ago

And yes, your version is very readable, now I think I understand this too ๐Ÿ˜…

You use redundant parentheses more than I would, but that's a matter of taste I guess.

AngusJohnson commented 2 months ago

You use redundant parentheses more than I would, but that's a matter of taste I guess.

I was deliberately heavy handed with parentheses so they would parallel the commented code logic above.

AngusJohnson commented 2 months ago

I'll probably rewrite CalculateCarry so that I can understand what's happening there๐Ÿ˜. I just hope this doesn't break things again๐Ÿคž.

inline uint64_t CalculateCarry(uint64_t a, uint64_t b)
{
  const uint64_t aHi = a & 0xFFFFFFFF00000000;
  const uint64_t aLo = a & 0xFFFFFFFF;
  const uint64_t bHi = b & 0xFFFFFFFF00000000;
  const uint64_t bLo = b & 0xFFFFFFFF;
  // a * b == (aHi + aLo) * (bHi + bLo)
  // a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
  uint64_t carry = (aHi >> 32) * (bHi >> 32);
  // but since (aHi * bLo) + (aLo * bHi) + (aLo * bLo) may also (just) overflow 
  // do this (but safely) ... if ((aHi * bLo) + (aLo * bHi) + (aLo * bLo) > MAX_UINT64) ++carry
  if ((aLo * bHi) + (aLo * bLo) > (std::numeric_limits<uint64_t>::max() - (aHi * bLo))) ++carry;
  return carry;
}

Actually, looking again more closely, I don't think the code above is correct. And if it is wrong then I believe I was somewhat mislead by the StackOverflow link provided.

I think the following code is correct (or almost correct), but it's bedtime again so I may not be thinking straight ...

  inline uint64_t CalcOverflowCarry(uint64_t a, uint64_t b)
  {
    const uint64_t aHi = a & 0xFFFFFFFF00000000;
    const uint64_t aLo = a & 0xFFFFFFFF;
    const uint64_t bHi = b & 0xFFFFFFFF00000000;
    const uint64_t bLo = b & 0xFFFFFFFF;
    // a * b == (aHi + aLo) * (bHi + bLo)
    // a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
    return (aHi >> 32) * (bHi >> 32) + (((aHi >> 32) * bLo) >> 32) + (((bHi >> 32) * aLo) >> 32);
  }
reunanen commented 2 months ago

return (aHi >> 32) * (bHi >> 32) + ((aHi * bLo) >> 32) + ((bHi * aLo) >> 32);

Hmm but won't aHi * bLo and bHi * aLo possibly wrap here?

AngusJohnson commented 2 months ago

return (aHi >> 32) * (bHi >> 32) + ((aHi * bLo) >> 32) + ((bHi * aLo) >> 32);

Hmm but won't aHi * bLo and bHi * aLo possibly wrap here?

You just missed my edit. (See above.) Anyhow, this is why I'm so keen to understand the underlying code. I doesn't stop me making mistakes, but hopefully it helps me avoid making a lot more.

  inline uint64_t CalcOverflowCarry(uint64_t a, uint64_t b)
  {
    //const uint64_t aHi = a & 0xFFFFFFFF00000000;
    const uint64_t aHiShr32 = a >> 32;
    const uint64_t aLo = a & 0xFFFFFFFF;
    //const uint64_t bHi = b & 0xFFFFFFFF00000000;
    const uint64_t bHiShr32 = b >> 32;
    const uint64_t bLo = b & 0xFFFFFFFF;
    // a * b == (aHi + aLo) * (bHi + bLo)
    // a * b == (aHi * bHi) + (aHi * bLo) + (aLo * bHi) + (aLo * bLo)
    //carry = (aHi >> 32) * (bHi >> 32) + (((aHi >> 32) * bLo) >> 32) + (((bHi >> 32) * aLo) >> 32);
    return aHiShr32 * bHiShr32 + ((aHiShr32 * bLo) >> 32) + ((bHiShr32 * aLo) >> 32);
  }
tomwiel commented 2 months ago

Just in case, this one (Emulate64x64to128) looks quite simple too.

AngusJohnson commented 2 months ago

Thanks Tom. You've just reminded me of my own Int128Mul function and Int128 class in Clipper1. I'm very confident that Int128Mul there (line 354) performs correctly.

azrafe7 commented 2 months ago

Hi all, following the discussion and throwing in my two cents... This one https://stackoverflow.com/a/1815371/1158913 seems to behave correctly for me, returning the expected carry.

I've checked against a naive implementation (carry128()) using uint_128 (BASELINE to ensure correct results - also tested in Python) and an adaptation of the solution in the link (multiply_carry()):

typedef unsigned __int128 uint128_t;
uint64_t carry128(uint64_t a, uint64_t b) {
    uint128_t a_128 = a;
    uint128_t res_128 = a_128 * b;
    uint64_t carry = (uint64_t) (res_128 >> 64);

    return carry;
}

uint64_t hi(uint64_t x) {
    return x >> 32;
}

uint64_t lo(uint64_t x) {
    return ((1ULL << 32) - 1) & x;
}

// https://stackoverflow.com/a/1815371/1158913
uint64_t multiply_carry(uint64_t a, uint64_t b) {
    // actually uint32_t would do, but the casting is annoying
    uint64_t s0, s1, s2, s3; 

    uint64_t x = lo(a) * lo(b);
    s0 = lo(x);

    x = hi(a) * lo(b) + hi(x);
    s1 = lo(x);
    s2 = hi(x);

    x = s1 + lo(a) * hi(b);
    s1 = lo(x);

    x = s2 + hi(a) * hi(b) + hi(x);
    s2 = lo(x);
    s3 = hi(x);

    uint64_t result = s1 << 32 | s0;
    uint64_t carry = s3 << 32 | s2;

    return carry;
}

Examples of testing carry returned from multiplication with random numbers:

TEST: 0x51eaed81157de061 * 0x3a271fb2745b6fe9
BASELINE carry128: 0x129bbebdfae0464e
CalculateCarry   : 0x129bbebdd0c2c2b2 WRONG
CalcOverflowCarry: 0x129bbebdfae0464e OK
multiply_carry   : 0x129bbebdfae0464e OK

TEST: 0xc2055706a62883fa * 0x26c78bc79c2322cc
BASELINE carry128: 0x1d640701d192519b
CalculateCarry   : 0x1d6407014210e7ab WRONG
CalcOverflowCarry: 0x1d640701d1925199 WRONG
multiply_carry   : 0x1d640701d192519b OK

More tests...
TEST: 0x874ddae32094b0de * 0x9b1559a06fdf83e0
BASELINE carry128: 0x51f76c49563e5bfe
CalculateCarry   : 0x51f76c490760b8e0 WRONG
CalcOverflowCarry: 0x51f76c49563e5bfd WRONG
multiply_carry   : 0x51f76c49563e5bfe OK
TEST: 0x81fb3ad3636ca900 * 0x239c000a982a8da4
BASELINE carry128: 0x12148e28207b83a3
CalculateCarry   : 0x12148e27c5644c3e WRONG
CalcOverflowCarry: 0x12148e28207b83a2 WRONG
multiply_carry   : 0x12148e28207b83a3 OK

TEST: 0x4be0b4c5d2725c44 * 0x990cd6db34a04c30
BASELINE carry128: 0x2d5d1a4183fd6165
CalculateCarry   : 0x2d5d1a40f6935288 WRONG
CalcOverflowCarry: 0x2d5d1a4183fd6164 WRONG
multiply_carry   : 0x2d5d1a4183fd6165 OK

TEST: 0x978ec0c0433c01f6 * 0x2df03d097966b536
BASELINE carry128: 0x1b3251d91fe272a5
CalculateCarry   : 0x1b3251d8cbf286c1 WRONG
CalcOverflowCarry: 0x1b3251d91fe272a4 WRONG
multiply_carry   : 0x1b3251d91fe272a5 OK

TEST: 0x49c5cbbcfd716344 * 0xc489e3b34b007ad3
BASELINE carry128: 0x38a32c74c8c191a4
CalculateCarry   : 0x38a32c73f0912874 WRONG
CalcOverflowCarry: 0x38a32c74c8c191a3 WRONG
multiply_carry   : 0x38a32c74c8c191a4 OK

TEST: 0xd3361cdbeed655d5 * 0x1240da41e324953a
BASELINE carry128: 0x0f0f4fa11e7e8f2a
CalculateCarry   : 0x0f0f4fa0520fd19c WRONG
CalcOverflowCarry: 0x0f0f4fa11e7e8f28 WRONG
multiply_carry   : 0x0f0f4fa11e7e8f2a OK

TEST: 0x51b854f8e71b0ae0 * 0x6f8d438aae530af5
BASELINE carry128: 0x239c04ee3c8cc248
CalculateCarry   : 0x239c04eda032b5b0 WRONG
CalcOverflowCarry: 0x239c04ee3c8cc247 WRONG
multiply_carry   : 0x239c04ee3c8cc248 OK

TEST: 0xbbecf7dbc6147480 * 0xbb0f73d0f82e2236
BASELINE carry128: 0x895170f4e9a216a7
CalculateCarry   : 0x895170f3a2b5c2f1 WRONG
CalcOverflowCarry: 0x895170f4e9a216a5 WRONG
multiply_carry   : 0x895170f4e9a216a7 OK

WRONG (of 10 tries): 
  CalculateCarry   : 10
  CalcOverflowCarry: 9
  multiply_carry   : 0

Hope it's helpful.

AngusJohnson commented 2 months ago

Thanks Giuseppe. StackOverflow is a fantastic resource, but the code there isn't always reliable (even from contrubutors with very high reputations). Anyhow, I've just updated clipper.core.h again with a revised CalcOverflowCarry function.