kintel closed this issue 7 months ago.
Update: I wrote the above late last night. With a clearer head, I realize that this isn't a collinearity check failure, it's a straight-up union failure. The result of unioning two neighboring triangles should be a single contour.
I can't duplicate this problem.
Paths64 subject;
subject.push_back(MakePath({0,-453054451, 0,-433253797, -455550000,0}));
subject.push_back(MakePath({0,-433253797, 0,0, -455550000,0}));
Clipper64 clipper;
clipper.PreserveCollinear(false);
clipper.AddSubject(subject);
Paths64 solution;
clipper.Execute(ClipType::Union, FillRule::NonZero, solution);
std::cout << solution << std::endl;
0,0, -455550000,0, 0,-453054451
Thanks for feedback. I've only tested this on macOS aarch64. I'll try on x86 Linux as well in case this differs based on arch, OS or compiler.
Works on x86 Linux, fails on M1 macOS:
std::cout << solution << std::endl;
gives me:
0,-433253797,0, -455550000,0,0, 0,-453054451,0
0,0,0, -455550000,0,0, 0,-433253797,0
PS! I see that GitHub now has a macos-14 CI runner in Beta, which runs on M1. Haven't tested it, but it could be a quick path to setting up a repro case if you don't have access to a physical Mac. Let me know if I can assist with anything.
Just my 5 cents, but maybe this helps: I have some experience working with M1 Macs, and here are some potential differences that can cause trouble:
Mac M1: std::cos(-1.5446728605250191) = 0.026120495107843492
Mac x64: std::cos(-1.5446728605250191) = 0.026120495107843495
(Sidenote: std::sqrt is producing perfectly equal results on all sane platforms)
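If anyone wants to check which result their own toolchain produces, a minimal probe like this (my own snippet, not Clipper code) is enough:

```cpp
#include <cmath>
#include <cstdio>

// Print a libm result with enough digits (17 significant decimals) to
// expose 1-ulp differences between platforms; the input value is the one
// from the M1-vs-x64 comparison above.
void print_cos_sample() {
    std::printf("%.17g\n", std::cos(-1.5446728605250191));
}
```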
In your place I would first try to find the trigonometric function that introduces the difference, but I am not sure whether Clipper uses any for simple clipping (I believe it uses sin and cos for offsetting, but we are not talking about offsetting). A second option is to add -ffp-contract=off and check whether the issue goes away. If it does, some multiply-add operation is producing a slightly different result, and this reveals a bug.
ADDED: Forgot to mention, most of the time all issues except the first two are reproducible on any CPU and OS with a Clang + libc++ build. You can try reproducing it like this as well; maybe it will work.
Also, while discussing robustness across CPUs and compilers, it would be interesting to do a few things:
Also, another thing: on macOS, the FP settings of the parent thread (rounding mode, exception mask) are not inherited by a new std::thread; I believe this is also an implementation detail of libc++. @kintel Maybe you are calling Clipper from a new thread and your rounding is not properly configured?
This issue fails on Clipper's unit tests when adding a new test case, so this shouldn't be related to threading issues.
Yep, -ffp-contract=off hides the issue. Just tested it on M1 mac at work, with the latest AppleClang :)
This fails:
cmake ../ -GNinja -DCMAKE_BUILD_TYPE=Debug && ninja ClipperTests && ./ClipperTests --gtest_filter=Clipper2Tests.TestUnionIssue
This succeeds:
cmake ../ -GNinja -DCMAKE_CXX_FLAGS="-ffp-contract=off" -DCMAKE_BUILD_TYPE=Debug && ninja ClipperTests && ./ClipperTests --gtest_filter=Clipper2Tests.TestUnionIssue
Maybe it will be reproducible with stock Clang on an ARM CPU, but I could not reproduce it with Clang 16 on an x64 CPU.
So it is the fnmsub instruction in the M1 build that causes the difference, and it is the CrossProduct function that ends up containing this opcode. I added some debug output to it and ran the test:
template <typename T>
inline double CrossProduct(const Point<T>& pt1, const Point<T>& pt2, const Point<T>& pt3) {
  const double d = (static_cast<double>(pt2.x - pt1.x) * static_cast<double>(pt3.y - pt2.y) -
                    static_cast<double>(pt2.y - pt1.y) * static_cast<double>(pt3.x - pt2.x));
  std::cout.precision(20);
  std::cout << "Inputs:" << std::endl;
  std::cout << pt1.x << ", " << pt1.y << std::endl;
  std::cout << pt2.x << ", " << pt2.y << std::endl;
  std::cout << pt3.x << ", " << pt3.y << std::endl;
  std::cout << "Cross product:" << std::endl;
  std::cout << d << std::endl;
  return d;
}
And here is the difference between M1 build (on the left side) and AMD64 build (on the right) https://www.diffchecker.com/kaHSyAnl/
I guess that to reproduce this problem on normal AMD64 hardware, Angus can try using those results from the left side of the diff.
I was wondering why this was not defaulting to =off and found this: https://releases.llvm.org/14.0.0/tools/clang/docs/ReleaseNotes.html#floating-point-support-in-clang
A better way to reproduce the issue on a non-M1 Mac: somehow, with std::fma, different CPUs produce the same result, and it starts to fail on AMD64 the same way as on an M1 Mac. Magic.
template <typename T>
inline double CrossProduct(const Point<T>& pt1, const Point<T>& pt2, const Point<T>& pt3) {
  const auto a = static_cast<double>(pt2.x - pt1.x);
  const auto b = static_cast<double>(pt3.y - pt2.y);
  const auto c = static_cast<double>(pt2.y - pt1.y);
  const auto d = static_cast<double>(pt3.x - pt2.x);
  //const double e = (a * b - c * d);
  const double e = std::fma(a, b, -c * d);
  std::cout.precision(20);
  std::cout << "Inputs:" << std::endl;
  std::cout << pt1.x << ", " << pt1.y << std::endl;
  std::cout << pt2.x << ", " << pt2.y << std::endl;
  std::cout << pt3.x << ", " << pt3.y << std::endl;
  std::cout << "Cross product:" << std::endl;
  std::cout << e << std::endl;
  return e;
}
I believe there are two separate questions:
1) Clipper produces an incorrect result because of FP contractions, but can this issue show up on slightly different input without FP contractions?
2) Should Clipper try to produce the same results on different CPUs? If so, we either have to recommend that people turn off FP contractions on ARM CPUs (this is what we do at work), or manually go over all fmadd and fnmsub instructions (~20 in total currently) and force the corresponding expressions in code to use std::fma, so the optimizer cannot use fmadd and fnmsub only on ARM CPUs.
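To make the contraction effect concrete, here is a minimal self-contained illustration (my own snippet, not Clipper code) of how a fused multiply-add produces a different result than two separately rounded products. This is the same mechanism as the fnmsub difference above, reproduced portably via std::fma:

```cpp
#include <cmath>

// a*a is not exactly representable in a double, so the plain product rounds.
// Subtracting two identically rounded products gives exactly 0, while
// std::fma keeps a*a exact internally and exposes the rounding error,
// just like a compiler-contracted fmadd/fnmsub would.
inline double contraction_gap() {
    const double a = 100000001.0;     // a*a = 10000000200000001 > 2^53
    const double prod = a * a;        // rounds to 10000000200000000
    const double t = a * a;           // rounds to the same value as prod
    const double plain = t - prod;    // exactly 0.0
    const double fused = std::fma(a, a, -prod);  // 1.0: uses the exact a*a
    return fused - plain;             // 1.0 whenever contraction matters
}
```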
In terms of 2.: Yes, that would be nice, especially with Clipper being used as part of a CAD kernel. Having to worry about which CPU a design is evaluated on would be suboptimal.
Thanks for all the suggestions. However, please note that
So with that preamble, am I right in understanding that it's the CrossProduct function that tests for collinearity that is problematic on some Mac machines?
If so, I suggest that there's a weakness in the Mac float math routines, since Clipper's CrossProduct testing of two identical segments should return 0. The math in the CrossProduct function is very simple - 5 subtractions & 2 multiplications - though admittedly there are also the type conversions from int64 to double (which avoid potential integer overflow). Clipper could only avoid needing the safety of this type conversion by either limiting coordinate ranges (x & y values between +/-2,147,483,648) or by introducing 128-bit math, where the latter would significantly slow clipping.
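For what it's worth, here is a sketch of what an exact 128-bit variant could look like, using the __int128 extension available in GCC and Clang (my own untested sketch, not Clipper code; it assumes coordinates stay within roughly ±2^61 so the coordinate differences fit in int64_t and their products fit in __int128):

```cpp
#include <cstdint>

// Exact collinearity test: all arithmetic is done in integers, so there is
// no rounding anywhere and identical/collinear segments compare equal on
// every CPU and compiler. Assumes coordinates within about +/-2^61.
inline bool IsCollinearExact(int64_t x1, int64_t y1,
                             int64_t x2, int64_t y2,
                             int64_t x3, int64_t y3)
{
    const __int128 ab = static_cast<__int128>(x2 - x1) * (y3 - y2);
    const __int128 cd = static_cast<__int128>(y2 - y1) * (x3 - x2);
    return ab == cd;
}
```

The cost, as noted above, is the slower 128-bit multiplication on every call.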
In terms of 3., it would be immensely beneficial to have some sort of promise for certain, severely limited, coordinate ranges.
@olologin dug a bit deeper than me, but I want to highlight two things:
- In my failing example, the coordinate values were well within the mentioned range: [-450M, 0] in my case.
- Clipper1 works perfectly for this example, using a similar range of values.

It would significantly reduce performance if Clipper were to test coordinate ranges prior to clipping (in order to use different functions depending on those ranges). This isn't something I'm currently prepared to consider. Clipper1 resorts to 128-bit math, which does significantly impede performance.
Understandable :) I had hoped that keeping my values below 32 bits would buy me some space for temporary results fitting within int64.
Interestingly, if I shave off 1 more bit and limit my coordinates to 30-bit space, the union operation works. In the event that this compiler/CPU issue doesn't get resolved, perhaps a workaround could be to give some recommendation on coordinate ranges. I see the current recommendation is to stay within ~10^15. Staying within 10^9 seems to work in this scenario.
Anyway, I expect more investigation is needed before concluding anything.
@kintel I wonder if defining CLIPPER2_HI_PRECISION helps
Turning on CLIPPER2_HI_PRECISION yields the same failure, and also makes Clipper2Tests.TestMultiplePolygons fail.
Note: This option is technically implemented in the CMakeLists, but doesn't work: it doesn't correctly pass the boolean value on to the macro, and also doesn't pass it on to clients linking with Clipper (like the tests). I tested by simply #if 1-ing the code path in question.
Perhaps one way around this issue might be to use the following untested code. Instead of directly calling CrossProduct() to test for collinearity, do the following:
inline bool IsCollinear(const Point64& pt1, const Point64& sharedPt, const Point64& pt2)
{
#ifdef __APPLE__
  double cp = CrossProduct(pt1, sharedPt, pt2);
  return std::fabs(cp) < 0.00000001;
#else
  return CrossProduct(pt1, sharedPt, pt2) == 0;
#endif
}
One comment: We need more tests, but my understanding based on the above discussion is that this is a compiler issue (Clang) for a specific instruction set (aarch64), so we should really check for that, not __APPLE__. Perhaps just __aarch64__ is a good start.
I'm happy to change __APPLE__ to __aarch64__. However, I'm having to rely on others to test this, since this isn't an issue with the Visual Studio compiler I'm using.
Evidently, according to your OP above, your compiler (and perhaps others too) will return only a near-zero (float) value when assessing collinearity of the following vertices:
{0, -453054451}, {0, -433253797}, {0, 0}
and/or
{0, 0}, {-455550000, 0}, {-455550000, 0}
I'll try two things:
Until then, let's consider this closed.
Just my 2 cents: I wouldn't say it is a compiler or hardware issue; I think it is a computational-stability problem. As I showed before, you can reproduce this on x86 if you use std::fma, which gives a slightly different result. Nothing prevents future versions of compilers from rearranging FP instructions in a way that produces results similar to std::fma.
Although, knowing how hard it is to estimate precision and write computationally stable code, I cannot really blame Angus for closing this :) It might be easier to wait until it is hit by someone on a much more common configuration.
Oh, right, I forgot about that detail. Perhaps the ideal solution (if anyone has time to build and test that) would be to perform a compiler check before building and adjust the code accordingly?
This issue has reappeared in the latest commit, and I can't see why it has done so. Since I don't have any Apple devices I can't realistically debug it, so perhaps someone who does can shed light on this.
@AngusJohnson Did you try to reproduce it like this: https://github.com/AngusJohnson/Clipper2/issues/777#issuecomment-1954339748 ? Because after that change it was failing on x86 the same way as on macOS.
@AngusJohnson Did you try to reproduce it like this: [#777 (comment)]
No I didn't. Sorry. And looking at this problem again I can now see a very simple solution.
inline bool IsCollinear(const Point64& pt1,
  const Point64& sharedPt, const Point64& pt2) // #777
{
  const auto a = static_cast<double>(sharedPt.x - pt1.x);
  const auto b = static_cast<double>(pt2.y - sharedPt.y);
  const auto c = static_cast<double>(sharedPt.y - pt1.y);
  const auto d = static_cast<double>(pt2.x - sharedPt.x);
  return a * b == c * d;
}
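For anyone wanting to try this outside the library, here is a self-contained version (with a stand-in Point64 struct; otherwise the same function as above):

```cpp
#include <cstdint>

// Stand-in for Clipper2's Point64 so this snippet compiles on its own.
struct Point64 { int64_t x; int64_t y; };

// Two independently rounded products compared for equality: there is no
// multiply-add here for the compiler to contract into an fma/fnmsub.
inline bool IsCollinear(const Point64& pt1,
    const Point64& sharedPt, const Point64& pt2)
{
    const auto a = static_cast<double>(sharedPt.x - pt1.x);
    const auto b = static_cast<double>(pt2.y - sharedPt.y);
    const auto c = static_cast<double>(sharedPt.y - pt1.y);
    const auto d = static_cast<double>(pt2.x - sharedPt.x);
    return a * b == c * d;
}
```

For the vertices from the original report, IsCollinear({0, -453054451}, {0, -433253797}, {0, 0}) evaluates both products to 0.0 and returns true, and because the comparison involves two independent multiplications with no addition, there is nothing for the compiler to contract.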
Thanks! I'll resume my clipper2 migration soon and will report back.
Btw, it's possible to ssh into GitHub runners. As long as you're comfortable debugging on the cmd-line that's a possible option for accessing HW/SW you don't have physical access to.
Sadly this is beyond my current capabilities 😢.
Regarding this:
const auto a = static_cast<double>(sharedPt.x - pt1.x);
const auto b = static_cast<double>(pt2.y - sharedPt.y);
const auto c = static_cast<double>(sharedPt.y - pt1.y);
const auto d = static_cast<double>(pt2.x - sharedPt.x);
return a * b == c * d;
While it looks like a great improvement, I was wondering why there's still casting to double, as opposed to no casting at all (or explicit casts to int64_t).
I guess so that a * b or c * d would not overflow? That makes sense, but instead there may be rounding (when a number exceeds 2^53), so the calculation may not really work:
// an integer not representable by double
const int64_t i = 9007199254740993;
const double a = i;
const double b = i * 100;
const double c = i * 10;
const double d = i * 10;
printf(a * b == c * d ? "equal" : "NOT equal");
This prints NOT equal, which wouldn't happen if double could represent integers of this magnitude accurately.
I'm not sure what kinds of values IsCollinear typically gets. I guess it's possible that casting to double may lead to a better tradeoff here (than using int64_t instead), although my gut feeling suggests otherwise. So if it's a conscious design decision, I'm very much fine with it. But if not, then I might want to propose this instead (because, at least currently, the input coordinates are always 64-bit integers):
const auto a = sharedPt.x - pt1.x;
const auto b = pt2.y - sharedPt.y;
const auto c = sharedPt.y - pt1.y;
const auto d = pt2.x - sharedPt.x;
return a * b == c * d;
I guess so that a * b or c * d would not overflow?
That is indeed the only reason.
That makes sense, but instead there may be rounding (when a number exceeds 2^53), so the calculation may not really work.
Isn't limiting coordinate ranges to ±2^53 (~10^16) much better than limiting them to ±std::sqrt(std::numeric_limits<int64_t>::max())?
I'm not sure. That's what I was thinking as well. But in the above example (starting with const int64_t i = 9007199254740993;), for example, using int64_t gives the right answer when double does not. I think integer overflow will never change a true statement (a * b == c * d) to be false (because both sides will just overflow equally), but it might do the opposite (if a * b overflows and c * d does not, then they might end up looking equal). Not sure which kind of error (true→false or false→true) is worse in this specific use case.
Anyway, double is 64-bit just like int64_t, so it's not like it can represent more numbers faithfully. (In fact I believe it can represent fewer integers accurately, because it can also represent non-integers.) But anyway, I believe that with doubles there can be rounding even when the result of the multiplication exceeds 2^53, meaning that the reliable input range (of the coordinates themselves) is much lower.
But anyway I believe that with doubles there can be rounding even when the result of the multiplication exceeds 2^53
Ah, that now makes sense.
Not sure which kind of error (true→false or false→true) is worse in this specific use case.
I suggest that potential errors when testing for collinearity with very large coordinate values aren't critical, though of course this should be documented. The only alternative AFAICS is to use int128 (which is what I've done in Clipper1), but this would again significantly affect performance.
Anyhow, you have persuaded me against casting to double. However, I'm not sure how the various compilers will respond when there are overflow errors. In Delphi, I would enclose the function in {$OVERFLOWCHECKS OFF} ... {$OVERFLOWCHECKS ON} directives.
However, I'm not sure how the various compilers will respond when there are overflow errors.
Yeah, that's a good point. I think at minimum, there should be a test case that gets executed as part of the GitHub Actions tests. I could try and add one for C++?
As for C++, I have updated the PR to use unsigned integers, for which the result should be well-defined: the values should just wrap (regardless of compiler settings, it seems), which I believe is exactly what we want. I have also added a simple test case (which would fail for the current version that uses double).
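For reference, the unsigned-wrap idea can be sketched like this (my own sketch; not necessarily the exact code in the PR):

```cpp
#include <cstdint>

// Signed overflow is undefined behaviour in C++, but unsigned arithmetic
// wraps mod 2^64, and wrapping preserves a mathematically true equality
// (both sides wrap identically). A false comparison can still wrap into
// a spurious true on extreme inputs, as discussed above.
inline bool IsCollinearWrap(int64_t x1, int64_t y1,
                            int64_t x2, int64_t y2,
                            int64_t x3, int64_t y3)
{
    const uint64_t a = static_cast<uint64_t>(x2) - static_cast<uint64_t>(x1);
    const uint64_t b = static_cast<uint64_t>(y3) - static_cast<uint64_t>(y2);
    const uint64_t c = static_cast<uint64_t>(y2) - static_cast<uint64_t>(y1);
    const uint64_t d = static_cast<uint64_t>(x3) - static_cast<uint64_t>(x2);
    return a * b == c * d;
}
```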
The only alternative AFAICS is to use int128 (which is what I've done in Clipper1), but this would again significantly affect performance.
I'm guessing it could be a separate code branch that gets invoked only when really needed. But even that would of course affect performance (at least a bit), not to mention making the code more complex.
I'm experiencing something that looks like unexpected collinearity check failures (tested on main as well as the latest release). When unioning the two triangles below, the result is two triangles, but it should be a single triangle, as the vertices on the X axis are clearly collinear. I'm also not using the full 64-bit range (just approx. 31 bits), so it feels like this should be well within the operating range of Clipper. A failing C++ test case:
Note: This is a minimal failing example. What I'm actually trying to achieve is to project a 3D cone onto one axis and unioning all the individual triangles until I get a single triangular silhouette.