Closed: bahvalo closed this issue 1 year ago.
I couldn't find feenableexcept in MS Visual Studio C++, but I did try feraiseexcept instead; it wasn't raising overflow exceptions. Nevertheless, hopefully this is fixed now.
Angus, consider adding
feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW );
into the CI tests so FP errors can be caught. It might be #ifdef-ed, as it looks platform-dependent.
Sergey, I'd certainly be happy to do that if I could find comparable code for other platforms (e.g. Windows).
Edit: I did try feraiseexcept but misunderstood how to use it :(.
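For reference, where the glibc-only feenableexcept is unavailable (e.g. MSVC), a portable alternative is to poll the standard C99/C++11 fenv status flags rather than arm traps. A minimal sketch, not Clipper2 code:

```cpp
#include <cassert>
#include <cfenv>

// Portable detection (not trapping) of FP exceptions via <cfenv>.
// Works with MSVC too, which lacks the glibc extension feenableexcept.
inline bool DivByZeroFlagged() {
    std::feclearexcept(FE_ALL_EXCEPT);      // clear any stale flags
    volatile double zero = 0.0;             // volatile defeats constant folding
    volatile double r = 1.0 / zero;         // sets FE_DIVBYZERO, does not trap
    (void)r;
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

Note that feraiseexcept merely sets these status flags programmatically (it only triggers a trap if the exception is already unmasked), which is the usual source of confusion around it.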
Hey, x86/amd64 SSE intrinsics provide a generic way to access the control register without inline assembly or other wrappers:
_MM_SET_EXCEPTION_STATE( _MM_EXCEPT_INVALID | _MM_EXCEPT_DIV_ZERO | _MM_EXCEPT_OVERFLOW );
That's just a macro calling _mm_getcsr() and _mm_setcsr() under the hood, to conveniently preserve the other flags (rounding mode, denormals, etc.). Remember to #include <xmmintrin.h>.
> Hey, x86/amd64 SSE intrinsics provide a generic way to access the control register without inline assembly or other wrappers.
> _MM_SET_EXCEPTION_STATE( _MM_EXCEPT_INVALID | _MM_EXCEPT_DIV_ZERO | _MM_EXCEPT_OVERFLOW );
Not exactly: it's _MM_SET_EXCEPTION_MASK, but MSVC has more appropriate functions to do this. Also, it affects only the SSE unit's CSR, not the FPU's.
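A concrete sketch of the mask-based variant (x86/amd64 only; note the inverted semantics: a set bit in the MXCSR mask field suppresses that exception, so bits are cleared to enable trapping, and these are the _MM_MASK_* constants, not _MM_EXCEPT_*):

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE control/status intrinsics (x86/amd64 only)

// Enable hardware traps for invalid, divide-by-zero and overflow on the
// SSE unit by CLEARING their bits in the MXCSR exception-mask field.
inline void EnableSseFpExceptions() {
    _MM_SET_EXCEPTION_MASK(_MM_GET_EXCEPTION_MASK() &
        ~(_MM_MASK_INVALID | _MM_MASK_DIV_ZERO | _MM_MASK_OVERFLOW));
}
```

As noted above, this leaves the x87 FPU control word untouched; code compiled to use x87 instructions would need _controlfp_s (MSVC) or feenableexcept (glibc) as well.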
Have been playing a lot with different settings and found that:
1) gcc allows the use of either the FPU or the SSE instruction set to perform FP calculations (a mix of both is also available, but it is experimental, so not of practical interest at present). I did not find any specific msvc flag that controls the instruction set used for FP calculations, apart from /arch, which probably enables SSE instructions for FP as well;
2) the FPU uses 80-bit precision internally, while SSE uses 64 bits for double. This feature of the FPU is adjustable, but by default it is set to 80-bit mode. This could lead (and does, see 4) to different results of Clipper2 operations depending on the instruction set in use, the platform, etc.; even compiler optimisations may produce different output, see the next point;
3) even the basic set of optimisations (just -O with gcc) eliminates the exception with the dataset in the OP when calculations are done on the FPU (I am working with commit a2036d2);
4) the ConsoleDemo1 benchmark produces different output for the FPU and SSE instruction sets. SSE is approximately 4% faster than the FPU on my notebook.
Still looking into the issue...
Thank you for the quick response. Your commit fixes FPE for the data above, but I have another data where FPE occurs.
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <fenv.h>
#include <glob.h>
#include "clipper2/clipper.h"
using namespace Clipper2Lib;
#define ADD(A,X,Y) A.push_back(Point64(int64_t(X), int64_t(Y)))
int main(int, char**) {
feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW );
Path64 a, b;
ADD(a,5873531643786437LL,-10907856004334190LL);
ADD(a,-572063233247808512LL,-10907856357909624LL);
ADD(a,-861031614229043072LL,-295680892812377088LL);
ADD(a,-1149999997847394560LL,-580453927885307520LL);
ADD(a,-861031615255601024LL,-865226963295435776LL);
ADD(a,-572063233981697600LL,-1149999998685980416LL);
ADD(a,5873531569489655LL,-1149999999013396096LL);
ADD(a,583810297809434496LL,-1150000000000004736LL);
ADD(a,872778682431101440LL,-865226965266477056LL);
ADD(a,1149999999999995904LL,-556893216519906624LL);
ADD(a,855158084855340160LL,-260339823793244128LL);
ADD(a,572063232808458240LL,12652856321515572LL);
ADD(b,5873531342419534LL,1128184286593806848LL);
ADD(b,-572063234332976384LL,1128184286913500800LL);
ADD(b,-861031618484528512LL,843411252051976192LL);
ADD(b,-1150000000000004352LL,558638215679646656LL);
ADD(b,-861031617158135040LL,273865179743011296LL);
ADD(b,-572063233247800064LL,-10907856357909624LL);
ADD(b,5873531643786437LL,-10907856004334190LL);
ADD(b,572063232808458240LL,12652856321515572LL);
ADD(b,855158081926256640LL,309206248762144256LL);
ADD(b,1110168977794405120LL,604014641445566208LL);
ADD(b,813032149122702592LL,876134821681722496LL);
ADD(b,543979277405183232LL,1149999999999995136LL);
const int m = 1000;
for(size_t i=0; i<a.size(); i++) { a[i].x /= m; a[i].y /= m; b[i].x /= m, b[i].y /= m; }
Paths64 AA; AA.push_back(a);
Paths64 BB; BB.push_back(b);
Paths64 solution = Intersect(Paths64(AA), Paths64(BB), FillRule::NonZero);
}
@AngusJohnson, the code from the above message also passes without exception when compiled with the -O -mfpmath=387 options, using commit a2036d2.
Point64 GetIntersectPoint(const Active& e1, const Active& e2)
{
if ((std::abs(e1.dx) > 1e-5 && std::abs(e2.dx) > 1e-5) ||
std::abs(q) < 1e-5) return GetEndE1ClosestToEndE2(e1, e2); // almost parallel
...
}
I no longer get exceptions, but now the area of the polygon intersection can be substantially inaccurate.
For the following example, the old version of the library gives S = 3.234567e+32, which seems to be the correct result. The new version gives S = 8.100581e+34.
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <fenv.h>
#include <glob.h>
#include "clipper2/clipper.h"
using namespace Clipper2Lib;
#define ADD(A,X,Y) A.push_back(Point64(int64_t(X), int64_t(Y)))
int main(int, char**) {
feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW );
Path64 a,b;
ADD(a,1149999999999999872LL,-229296316200567744LL);
ADD(a,1149221101861566976LL,-227336697928099232LL);
ADD(a,968861260885853056LL,226427991054104864LL);
ADD(a,614267143917128320LL,221171937614650112LL);
ADD(a,261697222618302048LL,221187051938477248LL);
ADD(a,88475044460250816LL,-235198106601525184LL);
ADD(a,-88830866724475696LL,-691613528989746176LL);
ADD(a,88475009902961744LL,-1148028933777700480LL);
ADD(a,89240719470182928LL,-1150000000000015488LL);
ADD(b,89005309387580320LL,-234740471586376544LL);
ADD(b,262225952131213568LL,221645974159703744LL);
ADD(b,82876673529597968LL,678044914286996864LL);
ADD(b,-98488636073593504LL,1139574121595159040LL);
ADD(b,-452066572919640704LL,1142151749566806400LL);
ADD(b,-803550496675935104LL,1149999999999984512LL);
ADD(b,-974716117373909248LL,693683731894266112LL);
ADD(b,-1149999999999999872LL,237307541261536192LL);
ADD(b,-971740827905732608LL,-226886938540897120LL);
ADD(b,-795463157809728640LL,-685921522829901184LL);
ADD(b,-442910854088512768LL,-691159864242905472LL);
ADD(b,-88299066280856784LL,-691157211526877184LL);
Paths64 AA; AA.push_back(a);
Paths64 BB; BB.push_back(b);
Paths64 solution = Intersect(Paths64(AA), Paths64(BB), FillRule::NonZero);
double S = 0.0;
for(size_t j=0; j<solution.size(); j++) S += Area(solution[j]);
printf("S = %e\n", S);
}
I'm getting 3.234567e+32, but I've also updated the code to avoid those time-expensive calls to std::fabs().
union eight { int64_t ui64; double d; };
inline bool IsLarge(double val)
{
//https://en.wikipedia.org/wiki/Double-precision_floating-point_format
eight e; e.d = val;
return (e.ui64 & 0x4000000000000000LL && //exponent is positive
((e.ui64 & 0x3FFFFFFFFFFFFFFFLL) >> 56) > 0); // exponent > 5
}
inline bool IsSmall(double val)
{
eight e; e.d = val;
return !(e.ui64 & 0x4000000000000000LL) && //exponent is negative
(e.ui64 & 0x3F00000000000000LL) != 0x3F00000000000000LL; //exponent < -5
}
Point64 GetIntersectPoint(const Active& e1, const Active& e2)
{
double b1, b2, q = (e1.dx - e2.dx);
if ((IsLarge(e1.dx) && IsLarge(e2.dx)) || IsSmall(q))
return GetEndE1ClosestToEndE2(e1, e2); // almost parallel
Edit: I've just figured out why the std::fabs() code above was so slow... because it should have been
if ((std::fabs(e1.dx) > 1e+5 && std::fabs(e2.dx) > 1e+5) || std::abs(q) < 1e-5)
return GetEndE1ClosestToEndE2(e1, e2); // almost parallel
Somehow, my translation from the Delphi code got mucked up. Anyhow, the std::fabs() code is a lot cleaner and marginally faster than my IEEE754 double hack.
> Somehow, my translation from the Delphi code got mucked up. Anyhow, the std::fabs() code is a lot cleaner and marginally faster than my IEEE754 double hack.
Sometimes things are not obvious. My first thought when I looked at InsertScanLine was: "my God, insertion into and deletion from the priority queue are both O(log(n)), while we need to check only the topmost value. Let's do it faster!" So, I reimplemented the thing with a simple vector, using binary search on insertion and avoiding duplicates, then picking off the tail value by just resizing the vector to (size-1). The timing was the same: both approaches actually have an overall complexity of O(n log(n)) :)
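The vector variant described here might look something like this sketch (illustrative only, not the actual Clipper2 code): keep the y-values sorted in descending order, insert via binary search while skipping duplicates, and pop the lowest scanline off the tail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Illustrative vector-based scanline store: sorted descending, duplicates
// skipped on insertion, smallest y popped off the tail in O(1).
struct ScanlineList {
    std::vector<int64_t> ys;  // kept sorted in descending order

    void insert(int64_t y) {
        // binary search in the descending-ordered vector: O(log n) search
        // plus an O(n) shift -- same overall O(n log n) as a priority queue
        auto it = std::lower_bound(ys.begin(), ys.end(), y,
                                   std::greater<int64_t>());
        if (it == ys.end() || *it != y) ys.insert(it, y);
    }
    bool pop(int64_t& y) {
        if (ys.empty()) return false;
        y = ys.back();   // lowest scanline sits at the tail
        ys.pop_back();   // "resize to (size-1)"
        return true;
    }
};
```

Insertion dominates either way, which is why the timings matched.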
With (std::fabs(e1.dx) > 1e+5 && std::fabs(e2.dx) > 1e+5) this test successfully passes, thank you.
But I observe another strange behavior.
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <fenv.h>
#include <glob.h>
#include "clipper2/clipper.h"
using namespace Clipper2Lib;
#define ADD(A,X,Y) A.push_back(Point64(int64_t(X), int64_t(Y)))
int main(int, char**) {
feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW );
Path64 a,b;
ADD(a,-862500000000000128LL,-559553118062537408LL);
ADD(a,-575000000000000256LL,-1119106236125074304LL);
ADD(a,-274LL,-1119106236125074048LL);
ADD(a,574999999999999872LL,-1119106236125068544LL);
ADD(a,862500000000000128LL,-559553118062528896LL);
ADD(a,1150000000000000000LL,5370LL);
ADD(a,862500000000000128LL,559553118062536832LL);
ADD(a,575000000000000256LL,1119106236125074304LL);
ADD(a,274LL,1119106236125074432LL);
ADD(a,-574999999999999872LL,1119106236125068544LL);
ADD(a,-862500000000000128LL,559553118062528320LL);
ADD(a,-1150000000000000000LL,-5961LL);
ADD(b,-836534242197799424LL,-607292355260451712LL);
ADD(b,-524505684941387712LL,-1149999999999999872LL);
ADD(b,49775714785717888LL,-1117707644739548288LL);
ADD(b,624057114512823424LL,-1085415289479091328LL);
ADD(b,886309956983517184LL,-510415289479088832LL);
ADD(b,1148562799454211072LL,64584710520908288LL);
ADD(b,836534242197799424LL,607292355260451200LL);
ADD(b,524505684941387648LL,1149999999999999872LL);
ADD(b,-49775714785717928LL,1117707644739548672LL);
ADD(b,-624057114512823424LL,1085415289479091328LL);
ADD(b,-886309956983517184LL,510415289479088256LL);
ADD(b,-1148562799454211072LL,-64584710520908880LL);
for(int m=0; m<=20; m++) {
for(size_t i=0; i<a.size(); i++) { a[i].x /= 2; a[i].y /= 2; }
for(size_t i=0; i<b.size(); i++) { b[i].x /= 2, b[i].y /= 2; }
Paths64 AA; AA.push_back(a);
Paths64 BB; BB.push_back(b);
Paths64 solution = Intersect(Paths64(AA), Paths64(BB), FillRule::NonZero);
double S = 0.0;
for(size_t j=0; j<solution.size(); j++) S += Area(solution[j]);
printf("S = %e\n", S*(1<<m)*(1<<m));
}
}
The result should be approximately the same for each m, because I just scale the polygons. However, I get S=9.541817e+35 for m=0,...,5 and S=9.520876e+35 for m=6,...,19. The last result seems to be correct.
Probably, this is another issue. I'm not sure.
I think we're almost there now ...
Point64 GetIntersectPoint(const Active& e1, const Active& e2)
{
double b1, b2, q = (e1.dx - e2.dx);
if (std::fabs(q) < 1e-5)
return GetEndE1ClosestToEndE2(e1, e2); //parallel ?? error
else if (std::fabs(e1.dx) > 1e5)
{
Point64 result;
result.y = (e1.bot.y + e1.top.y) / 2;
b2 = e2.top.y * e2.dx - e2.top.x;
result.x = static_cast<int64_t>(result.y * e2.dx - b2);
return result;
}
else if (std::fabs(e2.dx) > 1e5)
{
Point64 result;
result.y = (e2.bot.y + e2.top.y) / 2;
b1 = e1.top.y * e1.dx - e1.top.x;
result.x = static_cast<int64_t>(result.y * e1.dx - b1);
return result;
}
else if (e1.dx == 0)
{
b2 = e2.bot.y - (e2.bot.x / e2.dx);
return Point64(e1.bot.x,
static_cast<int64_t>(std::round(e1.bot.x / e2.dx + b2)));
}
else if (e2.dx == 0)
{
b1 = e1.bot.y - (e1.bot.x / e1.dx);
return Point64(e2.bot.x,
static_cast<int64_t>(std::round(e2.bot.x / e1.dx + b1)));
}
else
{
b1 = e1.bot.x - e1.bot.y * e1.dx;
b2 = e2.bot.x - e2.bot.y * e2.dx;
q = (b2 - b1) / q;
if (abs(e1.dx) < abs(e2.dx))
{
return Point64(static_cast<int64_t>(e1.dx * q + b1),
static_cast<int64_t>((q)));
}
else
{
return Point64(static_cast<int64_t>(e2.dx * q + b2),
static_cast<int64_t>((q)));
}
}
}
Next.
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <fenv.h>
#include <glob.h>
#include "clipper2/clipper.h"
using namespace Clipper2Lib;
#define ADD(A,X,Y) A.push_back(Point64(int64_t(X), int64_t(Y)))
int main(int, char**) {
feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW );
Path64 a,b;
ADD(a,862513556575282304LL,862497692244565504LL);
ADD(a,575015864402403840LL,1149991727225358080LL);
ADD(a,2162666575833LL,1149997833368111232LL);
ADD(a,-575000958826813696LL,1149999999999998720LL);
ADD(a,-862511678268535552LL,862510101406525312LL);
ADD(a,-1150000000000000000LL,575005683626717760LL);
ADD(a,-862496399093191040LL,287506358808189728LL);
ADD(a,-574991769015936960LL,1092469489161LL);
ADD(a,668207554827LL,3214670683646LL);
ADD(a,574983678858186304LL,8822033569993LL);
ADD(a,862466772717976960LL,287517953154307328LL);
ADD(a,1149992271103332352LL,575005140416155520LL);
ADD(b,862483595649010176LL,-287494701724283424LL);
ADD(b,574983678858169408LL,8822033569993LL);
ADD(b,668207554827LL,3214670683646LL);
ADD(b,-574991769015928576LL,1092469491591LL);
ADD(b,-862479576162166272LL,-287506296070403456LL);
ADD(b,-1149987266264005248LL,-575000772389782592LL);
ADD(b,-862492577664547712LL,-862499582618224000LL);
ADD(b,-574998567007965248LL,-1149993459671747584LL);
ADD(b,3472721893746LL,-1149995326993346048LL);
ADD(b,575013251381925184LL,-1150000000000001152LL);
ADD(b,862525149920304896LL,-862509393110609280LL);
ADD(b,1150000000000000000LL,-574999583153962688LL);
for(int m=0; m<=20; m++) {
for(size_t i=0; i<a.size(); i++) { a[i].x /= 2; a[i].y /= 2; }
for(size_t i=0; i<b.size(); i++) { b[i].x /= 2, b[i].y /= 2; }
Paths64 AA; AA.push_back(a);
Paths64 BB; BB.push_back(b);
Paths64 solution = Intersect(Paths64(AA), Paths64(BB), FillRule::NonZero);
double S = Area(solution);
printf("S = %e\n", S*(1<<m)*(1<<m));
}
}
For m=0,...,13, I get S=2.015106e+29; for m=14,...,20 I get S=0. The latter seems correct.
That one is OK.
When there's enough scaling, the three almost horizontal vertices that were very slightly overlapping at full scale become identical at smaller scales (so they no longer overlap).
It is very possible that the full-scale polygons do overlap and the small-scale polygons do not. But when we move a single vertex, the area changes continuously, so in this case it should not be bigger than 1e24 (my algorithm gives the value 2.8e21, but it is not reliable). The value 2e29 is definitely in error.
Upd. Without the recent patches, Clipper2 gives values from zero to 2.35e21. I consider all of these results correct.
> It is very possible that the full-scale polygons do overlap and the small-scale polygons do not. But when we move a single vertex, the area changes continuously.
That's to be expected given the geometry of your polygons. The overlap region is enormously wide but has almost no height so any rounding that alters the height could almost double (or halve) the overlap area. And the effects of rounding (and variations in area measurement) will be most apparent just before your scaling loop returns an empty solution.
Let me explain what I expect from your library.
If an X or Y coordinate of a vertex is changed by 1, the area will change by at most 4.6e18, which is the maximal admissible coordinate value. In my example, we have 24 input vertices, plus some new vertices appearing as results of the intersection. All these vertex coordinates are subject to rounding, so it is hardly possible to get accuracy better than 1e20. I don't expect Clipper to compute the area of the intersection with an accuracy of 1e20, and an error like 1e23 seems acceptable to me. But with the latest code the error is 10^6 times bigger.
You are right that the effects of rounding are most apparent when the area of the intersection is small. For instance, instead of 1e20 we can easily get 1e21 or zero. But this matters only if we are looking at relative values. Looking at the absolute error of the intersection area, we would say that both 1e21 and zero are OK, but 2e29 is not.
> If an X or Y coordinate of a vertex is changed by 1, the area will change at most by 4.6e18, which is the maximal admissible coordinate value.
I agree with you.
> But with the latest code the error is 10^6 times bigger.
OK, I'll have another look 😁.
I've had another look and can't find a problem.
And here are the areas I get when running your very slightly modified test ...
2.01511e+29
2.01511e+29
2.01511e+29
2.01511e+29
2.01511e+29
2.01511e+29
2.01511e+29
2.01511e+29
2.01511e+29
2.01511e+29
2.01511e+29
2.01511e+29
2.01511e+29
2.01511e+29
0
0
0
0
0
0
0
Perhaps you missed something in the numerous code iterations above, or perhaps I missed documenting something. Anyhow, I've attached the amended code (together with your slightly modified test): Clipper2Lib_Test.zip
Yes, this is the result I get. Do you consider it to be correct?
If I remove the code if (std::fabs(e1.dx) > 1e5) ... else if (std::fabs(e2.dx) > 1e5) ... else, then I get
1.74797e+20
1.74797e+20
1.74797e+20
1.75947e+20
1.77097e+20
1.79397e+20
1.83997e+20
1.83997e+20
1.83996e+20
2.20795e+20
2.94393e+20
5.88786e+20
1.17757e+21
2.35515e+21
0
0
0
0
0
0
0
which seems perfectly fine to me. I don't know the exact area of intersection, but I guess that all the errors of the area evaluation do not exceed 1e22.
solution (without scaling):
287493241252219040,3009176063409, 287491839429084704,4411016784996, 334103777413,1607335341823
area= 2.01511e+29
(and area calculation verified at https://www.omnicalculator.com/math/irregular-polygon-area )
You are right, my considerations do not prove that the result 2.01e29 is wrong. I'm sure that it is wrong and will try to explain this later.
By the way, I constructed another example that generates FPE.
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <fenv.h>
#include <glob.h>
#include "clipper2/clipper.h"
using namespace Clipper2Lib;
#define ADD(A,X,Y) A.push_back(Point64(int64_t(X), int64_t(Y)))
int main(int, char**) {
feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW );
const long int M = 10000000;
Path64 a,b;
ADD(a, 0, 0);
ADD(a, M*M, 2);
ADD(a, M*M, M*M);
ADD(a, 0, M*M);
ADD(b, 0, -1);
ADD(b, M*M, M);
ADD(b, M*M, -M*M);
ADD(b, 0, -M*M);
Paths64 AA; AA.push_back(a);
Paths64 BB; BB.push_back(b);
Paths64 solution = Intersect(Paths64(AA), Paths64(BB), FillRule::NonZero);
printf("S = %e\n", Area(solution));
}
Yeah, I need to test dx == 0 before almost everything else in GetIntersectPoint, but it's bed time, so more testing and the fix upload won't happen until tomorrow now.
Hopefully this is all fixed now 🤞.
My last case still generates FPE. Or throws an exception if CHECK_OVERFLOW is defined.
And I slightly modified the previous case. Now I scale only X.
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <fenv.h>
#include <glob.h>
#include "clipper2/clipper.h"
using namespace Clipper2Lib;
#define ADD(A,X,Y) A.push_back(Point64(int64_t(X), int64_t(Y)))
int main(int, char**) {
feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW );
Path64 a,b;
ADD(a,862513556575282304LL,862497692244565504LL);
ADD(a,575015864402403840LL,1149991727225358080LL);
ADD(a,2162666575833LL,1149997833368111232LL);
ADD(a,-575000958826813696LL,1149999999999998720LL);
ADD(a,-862511678268535552LL,862510101406525312LL);
ADD(a,-1150000000000000000LL,575005683626717760LL);
ADD(a,-862496399093191040LL,287506358808189728LL);
ADD(a,-574991769015936960LL,1092469489161LL);
ADD(a,668207554827LL,3214670683646LL);
ADD(a,574983678858186304LL,8822033569993LL);
ADD(a,862466772717976960LL,287517953154307328LL);
ADD(a,1149992271103332352LL,575005140416155520LL);
ADD(b,862483595649010176LL,-287494701724283424LL);
ADD(b,574983678858169408LL,8822033569993LL);
ADD(b,668207554827LL,3214670683646LL);
ADD(b,-574991769015928576LL,1092469491591LL);
ADD(b,-862479576162166272LL,-287506296070403456LL);
ADD(b,-1149987266264005248LL,-575000772389782592LL);
ADD(b,-862492577664547712LL,-862499582618224000LL);
ADD(b,-574998567007965248LL,-1149993459671747584LL);
ADD(b,3472721893746LL,-1149995326993346048LL);
ADD(b,575013251381925184LL,-1150000000000001152LL);
ADD(b,862525149920304896LL,-862509393110609280LL);
ADD(b,1150000000000000000LL,-574999583153962688LL);
for(size_t i=0; i<a.size(); i++) { a[i].x /= 2; a[i].y /= 2; }
for(size_t i=0; i<b.size(); i++) { b[i].x /= 2, b[i].y /= 2; }
for(int m=0; m<=1; m++) {
Paths64 AA; AA.push_back(a);
Paths64 BB; BB.push_back(b);
Paths64 solution = Intersect(Paths64(AA), Paths64(BB), FillRule::NonZero);
double S = Area(solution);
printf("S = %e\n", double(S)*(1<<m));
for(size_t i=0; i<a.size(); i++) a[i].x /= 2;
for(size_t i=0; i<b.size(); i++) b[i].x /= 2;
}
}
Now it returns
S = 2.015106e+29
S = 0.000000e+00
This indicates that at least one of these results is in error.
> Or throws an exception if CHECK_OVERFLOW is defined.
Strange. I wasn't getting overflow errors before but now I am? Work ... more work 😉.
Edit: I only started getting errors when I changed M's type to int64_t:
const int64_t M = 10000000;
Evidently MSVC's long int is only 4 bytes.
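The overflow is easy to reproduce portably. A sketch using uint32_t to mimic MSVC's 32-bit long (unsigned wraparound is well-defined, unlike the signed overflow the original code actually hit):

```cpp
#include <cassert>
#include <cstdint>

// With a 32-bit type, M*M is computed modulo 2^32 BEFORE any widening
// to 64 bits, so the damage is done by the time Point64 sees the value.
inline int64_t SquareVia32Bit(uint32_t m) {
    return static_cast<int64_t>(m * m);   // wraps modulo 2^32
}
inline int64_t SquareVia64Bit(int64_t m) {
    return m * m;                         // 1e14 fits comfortably in int64_t
}
```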
Ok. Next. The following code generates an FPE.
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <fenv.h>
#include <glob.h>
#include "clipper2/clipper.h"
using namespace Clipper2Lib;
#define ADD(A,X,Y) A.push_back(Point64(int64_t(X), int64_t(Y)))
int main(int, char**) {
feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW );
Path64 a,b;
ADD(a,5873531643786437LL,-10907856004334190LL);
ADD(a,-572063233247808512LL,-10907856357909624LL);
ADD(a,-861031614229043072LL,-295680892812377088LL);
ADD(a,-1149999997847394560LL,-580453927885307520LL);
ADD(a,-861031615255601024LL,-865226963295435776LL);
ADD(a,-572063233981697600LL,-1149999998685980416LL);
ADD(a,5873531569489655LL,-1149999999013396096LL);
ADD(a,583810297809434496LL,-1150000000000004736LL);
ADD(a,872778682431101440LL,-865226965266477056LL);
ADD(a,1149999999999995904LL,-556893216519906624LL);
ADD(a,855158084855340160LL,-260339823793244128LL);
ADD(a,572063232808458240LL,12652856321515572LL);
ADD(b,5873531342419534LL,1128184286593806848LL);
ADD(b,-572063234332976384LL,1128184286913500800LL);
ADD(b,-861031618484528512LL,843411252051976192LL);
ADD(b,-1150000000000004352LL,558638215679646656LL);
ADD(b,-861031617158135040LL,273865179743011296LL);
ADD(b,-572063233247800064LL,-10907856357909624LL);
ADD(b,5873531643786437LL,-10907856004334190LL);
ADD(b,572063232808458240LL,12652856321515572LL);
ADD(b,855158081926256640LL,309206248762144256LL);
ADD(b,1110168977794405120LL,604014641445566208LL);
ADD(b,813032149122702592LL,876134821681722496LL);
ADD(b,543979277405183232LL,1149999999999995136LL);
Paths64 AA; AA.push_back(a);
Paths64 BB; BB.push_back(b);
Paths64 solution = Intersect(Paths64(AA), Paths64(BB), FillRule::NonZero);
}
OK, bed time now, so tomorrow. Thanks for your patience and very helpful feedback.
#define CHECK_OVERFLOW //define only when debugging
#ifdef CHECK_OVERFLOW
static const char* overflow_error = "overflow error.";
inline void CheckAdd(double val1, double val2)
{
if (val1 + val2 > LLONG_MAX) throw overflow_error;
if (val1 + val2 < LLONG_MIN) throw overflow_error;
}
inline void CheckAdd(int64_t val1, double val2)
{
CheckAdd(static_cast<double>(val1), val2);
}
inline void CheckMul(double val1, double val2)
{
if (val1 == 0 || val2 == 0) return;
const double v1 = std::fabs(val1);
const double v2 = std::fabs(val2);
if (v1 > LLONG_MAX / v2) throw overflow_error;
if (v1 < LLONG_MIN / v2) throw overflow_error;
}
inline void CheckMul(int64_t val1, double val2)
{
CheckMul(static_cast<double>(val1), val2);
}
#endif
bool GetIntersectPoint(const Active& e1, const Active& e2, Point64& ip)
{
// precondition: neither edge is horizontal
//assert(!IsHorizontal(e1) && !IsHorizontal(e2));
double abs_dx1 = std::fabs(e1.dx);
double abs_dx2 = std::fabs(e2.dx);
if (abs_dx1 < 1e-5)
{
if (abs_dx2 < 1e-5) return false; // parallel
double b2 = e2.bot.y * e2.dx - e2.bot.x;
ip.x = e1.bot.x;
#ifdef CHECK_OVERFLOW
CheckMul(e1.bot.x + b2, 1 / e2.dx);
#endif
ip.y = static_cast<int64_t>(std::round((e1.bot.x + b2) / e2.dx));
return true;
}
else if (abs_dx2 < 1e-5)
{
double b1 = e1.bot.y * e1.dx - e1.bot.x;
ip.x = e2.bot.x;
#ifdef CHECK_OVERFLOW
CheckMul(e2.bot.x + b1, 1/e1.dx);
#endif
ip.y = static_cast<int64_t>(std::round((e2.bot.x + b1) / e1.dx));
return true;
}
else if (abs_dx1 > 1e12)
{
ip.y = (e1.bot.y + e1.top.y) / 2;
double b2 = e2.top.y * e2.dx - e2.top.x;
#ifdef CHECK_OVERFLOW
CheckAdd(ip.y * e2.dx, -b2);
#endif
ip.x = static_cast<int64_t>(ip.y * e2.dx - b2);
return true;
}
else if (abs_dx2 > 1e12)
{
ip.y = (e2.bot.y + e2.top.y) / 2;
double b1 = e1.top.y * e1.dx - e1.top.x;
#ifdef CHECK_OVERFLOW
CheckAdd(ip.y * e1.dx, -b1);
#endif
ip.x = static_cast<int64_t>(ip.y * e1.dx - b1);
return true;
}
double q = (e1.dx - e2.dx);
double abs_q = std::fabs(q);
if (abs_q < 1e-5) return false; //parallel
double b1 = e1.bot.x - e1.bot.y * e1.dx;
double b2 = e2.bot.x - e2.bot.y * e2.dx;
if (abs_q < std::min(abs_dx1, abs_dx2))
{
// edges are closer to horizontal so
// greater accuracy to calc ip.x first
#ifdef CHECK_OVERFLOW
CheckMul(b2 * e1.dx - b1 * e2.dx, 1/q);
#endif
double x = (b2 * e1.dx - b1 * e2.dx) / q;
ip.x = static_cast<int64_t>(x);
#ifdef CHECK_OVERFLOW
CheckMul(x - b1, 1 / e1.dx);
#endif
ip.y = static_cast<int64_t>((x - b1) / e1.dx);
}
else
{
// edges are closer to vertical so
// greater accuracy to calc ip.y first
#ifdef CHECK_OVERFLOW
CheckMul(b2 - b1, 1/q);
#endif
double y = (b2 - b1) / q;
ip.y = static_cast<int64_t>(y);
if (abs(e1.dx) < abs(e2.dx))
{
#ifdef CHECK_OVERFLOW
CheckAdd(e1.dx * y, b1);
#endif
ip.x = static_cast<int64_t>(e1.dx * y + b1);
}
else
{
#ifdef CHECK_OVERFLOW
CheckAdd(e2.dx * y, b2);
#endif
ip.x = static_cast<int64_t>(e2.dx * y + b2);
}
}
return true;
}
Note: this function can probably be further optimised and tidied (apart from removing all the overflow checks), but I'm hopeful it now covers all contingencies.
Alternatively, I think you could check the double value just prior to the conversion to int64_t (which is when the SIGFPE is raised).
if( fabs( my_double ) >= (double)INT64_MAX ) then_we_have_no_intersection;
Or perhaps just clamp with my_double = fmax( fmin( my_double, (double)INT64_MAX ), (double)INT64_MIN ); prior to int64_t conversion (fmax/fmin are converted to a single instruction; it's branch-free).
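One caveat with the clamp as written: (double)INT64_MAX rounds up to 2^63, which is itself not representable as int64_t, so the subsequent cast can still overflow. A hedged sketch (a hypothetical helper, not part of Clipper2) that clamps against the largest double strictly below 2^63:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Saturating double -> int64_t conversion. (double)INT64_MAX rounds UP to
// 2^63, so clamping against it would still let the cast overflow; clamp
// against the largest double strictly below 2^63 instead. -2^63 is exactly
// representable, so the lower bound is safe as-is. NaN saturates to hi.
inline int64_t SaturateToInt64(double v) {
    const double hi = std::nextafter(std::ldexp(1.0, 63), 0.0);  // 2^63 - 1024
    const double lo = -std::ldexp(1.0, 63);                      // == INT64_MIN
    return static_cast<int64_t>(std::fmax(lo, std::fmin(v, hi)));
}
```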
> Alternatively, I think you could check the double value just prior to the conversion to int64_t (which is when the SIGFPE is raised).
I'm hoping that won't be necessary (except while debugging).
#define CHECK_OVERFLOW //define only when debugging
#ifdef CHECK_OVERFLOW
static const char* overflow_error = "overflow error.";
inline int64_t CheckCastInt64(double val)
{
if (val > LLONG_MAX) throw overflow_error;
else if (val < LLONG_MIN) throw overflow_error;
else return static_cast<int64_t>(val);
}
#else
inline int64_t CheckCastInt64(double val)
{
return static_cast<int64_t>(val);
}
#endif
bool GetIntersectPoint(const Active& e1, const Active& e2, Point64& ip)
{
// precondition: neither edge is horizontal
//assert(!IsHorizontal(e1) && !IsHorizontal(e2));
double abs_dx1 = std::fabs(e1.dx);
double abs_dx2 = std::fabs(e2.dx);
if (abs_dx1 < 1e-5)
{
if (abs_dx2 < 1e-5) return false; // parallel edges
double b2 = e2.bot.y * e2.dx - e2.bot.x;
ip.x = e1.curr_x;
ip.y = CheckCastInt64(std::round((e1.curr_x + b2) / e2.dx));
return true;
}
else if (abs_dx2 < 1e-5)
{
double b1 = e1.bot.y * e1.dx - e1.bot.x;
ip.x = e2.curr_x;
ip.y = CheckCastInt64(std::round((e2.curr_x + b1) / e1.dx));
return true;
}
double q = (e1.dx - e2.dx);
if (std::fabs(q) < 1e-5) return false; //parallel
double b1 = e1.bot.x - e1.bot.y * e1.dx;
double b2 = e2.bot.x - e2.bot.y * e2.dx;
if (std::min(abs_dx1, abs_dx2) > 1)
{
// both edges are closer to horizontal so
// it's better to calc ip.x first
double x = (b2 * e1.dx - b1 * e2.dx) / q;
ip.x = CheckCastInt64(x);
if (abs(e1.dx) > abs(e2.dx))
ip.y = CheckCastInt64((x - b1) / e1.dx);
else
ip.y = CheckCastInt64((x - b2) / e2.dx);
}
else
{
double y = (b2 - b1) / q;
ip.y = CheckCastInt64(y);
if (abs(e1.dx) < abs(e2.dx)) //one or other dx <= 1
ip.x = CheckCastInt64(e1.dx * y + b1);
else
ip.x = CheckCastInt64(e2.dx * y + b2);
}
return true;
}
Time for _UI128_MAX processors ;-)
> Time for _UI128_MAX processors ;-)
Somewhat surprisingly, even with overflow checking enabled, the performance cost of this extra code is negligible.
Now this throws an exception.
#include <stdio.h>
#include "clipper2/clipper.h"
using namespace Clipper2Lib;
#define ADD(A,X,Y) A.push_back(Point64(int64_t(X), int64_t(Y)))
int main(int, char**) {
Path64 a,b;
ADD(a,809023171470172800LL,-874758197839316864LL);
ADD(a,1114348780979194752LL,-579903279137305344LL);
ADD(a,862499999997369344LL,-275241802164393664LL);
ADD(a,574999999999326528LL,9806558268645062LL);
ADD(a,366186LL,9806558269589536LL);
ADD(a,-574999999999010880LL,9806558269994310LL);
ADD(a,-862500000000128384LL,-275241802162350528LL);
ADD(a,-1150000000000000000LL,-560290162594830272LL);
ADD(a,-862500000001121664LL,-845338523027656960LL);
ADD(a,-574999999999318144LL,-1130386883458883840LL);
ADD(a,896526LL,-1130386883458421248LL);
ADD(a,539348780981543424LL,-1149999999999990400LL);
ADD(b,862500000000239872LL,294854918700700736LL);
ADD(b,1150000000000000000LL,579903279134337024LL);
ADD(b,862500000001376384LL,864951639568146688LL);
ADD(b,575000000000395648LL,1150000000000009728LL);
ADD(b,833390LL,1149999999999219456LL);
ADD(b,-574999999998817280LL,1149999999998891776LL);
ADD(b,-862499999997445120LL,864951639566450560LL);
ADD(b,-1149999999997548288LL,579903279134568320LL);
ADD(b,-862499999997262080LL,294854918702743936LL);
ADD(b,-574999999999010880LL,9806558269994310LL);
ADD(b,366186LL,9806558269589536LL);
ADD(b,574999999999322368LL,9806558268645062LL);
Paths64 AA; AA.push_back(a);
Paths64 BB; BB.push_back(b);
Paths64 solution = Intersect(Paths64(AA), Paths64(BB), FillRule::NonZero);
double S = Area(solution);
printf("S = %e\n", double(S));
}
> Somewhat surprisingly, even with overflow checking enabled, the performance cost of this extra code is negligible.

That would be the disadvantage, because 128-bit arithmetic is emulated on 64-bit processors. And in Delphi there is no 128-bit type (yet).
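For what it's worth, gcc and clang (though not MSVC or Delphi) already expose that emulation as __int128, which is enough to evaluate 64-bit products exactly. A hedged sketch, not Clipper2's implementation:

```cpp
#include <cassert>
#include <cstdint>

// Exact sign of a 64-bit cross product via compiler-emulated 128-bit
// integers (a gcc/clang extension). A double-based version loses the
// low bits of values near 2^62 and can report the wrong sign.
inline int SignOfCross(int64_t dx1, int64_t dy1, int64_t dx2, int64_t dy2) {
    __int128 cp = static_cast<__int128>(dx1) * dy2
                - static_cast<__int128>(dx2) * dy1;
    return (cp > 0) - (cp < 0);
}
```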
bool GetIntersectPoint(const Active& e1, const Active& e2, Point64& ip)
{
// precondition: neither edge is horizontal
//assert(!IsHorizontal(e1) && !IsHorizontal(e2));
static const double parallel_tolerance = 1.0e-5;
static const double vertical_tolerance = 1.0e-5;
double abs_dx1 = std::fabs(e1.dx);
double abs_dx2 = std::fabs(e2.dx);
if (abs_dx1 < vertical_tolerance)
{
if (abs_dx2 < parallel_tolerance) return false;
double b2 = e2.bot.y * e2.dx - e2.bot.x;
ip.x = e1.curr_x;
ip.y = CheckCastInt64(std::round((e1.curr_x + b2) / e2.dx));
return true;
}
else if (abs_dx2 < vertical_tolerance)
{
double b1 = e1.bot.y * e1.dx - e1.bot.x;
ip.x = e2.curr_x;
ip.y = CheckCastInt64(std::round((e2.curr_x + b1) / e1.dx));
return true;
}
double q = (e1.dx - e2.dx);
double abs_q = std::fabs(q);
double b1 = e1.bot.x - e1.bot.y * e1.dx;
double b2 = e2.bot.x - e2.bot.y * e2.dx;
double min_dx = std::min(abs_dx1, abs_dx2);
if (min_dx > 1) // better to calc ip.x before ip.y
{
if (abs_q < parallel_tolerance * min_dx) return false;
double x = (b2 * e1.dx - b1 * e2.dx) / q;
ip.x = CheckCastInt64(x);
if (abs_dx1 > abs_dx2)
ip.y = CheckCastInt64((x - b1) / e1.dx);
else
ip.y = CheckCastInt64((x - b2) / e2.dx);
}
else
{
if (abs_q < parallel_tolerance) return false;
double y = (b2 - b1) / q;
ip.y = CheckCastInt64(y);
if (abs(e1.dx) < abs(e2.dx)) //one or other dx <= 1
ip.x = CheckCastInt64(e1.dx * y + b1);
else
ip.x = CheckCastInt64(e2.dx * y + b2);
}
return true;
}
I can't compile the code; CheckCastInt64 is undefined.
#define CHECK_OVERFLOW //define only when debugging
inline int64_t CheckCastInt64(double val)
{
#ifdef CHECK_OVERFLOW
if (val > LLONG_MAX || val < LLONG_MIN) throw "overflow error.";
#endif
return static_cast<int64_t>(val);
}
Ok. Now the inverse situation with almost overlapping polygons.
#include <stdio.h>
#include <math.h> // for pow() below
#include "clipper2/clipper.h"
using namespace Clipper2Lib;
#define ADD(A,X,Y) A.push_back(Point64(int64_t(X), int64_t(Y)))
int main(int, char**) {
Path64 a,b;
ADD(a,890236108478911616LL,-600678594654510336LL);
ADD(a,1149040936202350208LL,-665484099150913LL);
ADD(a,841249898247169152LL,586996692971884928LL);
ADD(a,566116333779815360LL,1149957003074273664LL);
ADD(a,15849204845108742LL,1125255136105704320LL);
ADD(a,-567075397577446400LL,1125255136105665408LL);
ADD(a,-858537698788722944LL,549943892518953088LL);
ADD(a,-1150000000000000512LL,-25367351067798084LL);
ADD(a,-858537698788722944LL,-600678594654510336LL);
ADD(a,-646239222076756480LL,-1149999999999980544LL);
ADD(a,-102896531903854784LL,-1137005080879301120LL);
ADD(a,519609982768333504LL,-1149999999999980544LL);
ADD(b,891197804795068672LL,-600633750840939648LL);
ADD(b,1149999999999999360LL,-619165597971545LL);
ADD(b,842206383722243840LL,587041257642480640LL);
ADD(b,567070349309609024LL,1150000000000019456LL);
ADD(b,16803328759483116LL,1125294997556221952LL);
ADD(b,-566121273655786048LL,1125291675995182336LL);
ADD(b,-857581050725848192LL,549978771635127808LL);
ADD(b,-1149040827795910528LL,-25334132724926740LL);
ADD(b,-857576002450706816LL,-600643715523980544LL);
ADD(b,-645275115632407808LL,-1149963911165188608LL);
ADD(b,-101932482480590576LL,-1136965896025245440LL);
ADD(b,520574089198109120LL,-1149957268043187200LL);
for(int m=0; m<=20; m++) {
Paths64 AA; AA.push_back(a);
Paths64 BB; BB.push_back(b);
Paths64 solution = Intersect(AA, BB, FillRule::NonZero);
const double M = pow(4.,m);
printf("S = %e, SA-S = %e, SB-S = %e\n", Area(solution)*M, (Area(a)-Area(solution))*M, (Area(b)-Area(solution))*M);
for(size_t i=0; i<a.size(); i++) a[i].x /= 2;
for(size_t i=0; i<b.size(); i++) b[i].x /= 2;
for(size_t i=0; i<a.size(); i++) a[i].y /= 2;
for(size_t i=0; i<b.size(); i++) b[i].y /= 2;
}
}
I get S = 3.942184e36 for some values of m and S = 3.941915e36 for others. The latter result is the correct one.
That one looks OK to me. (The relatively small variations I would attribute to rounding.)
You may replace Intersect by Difference. Then you'll get 1.95e33 and 2.22e33 (10% error).
Yes, but given that these non-overlapping regions are extremely long and thin, that's not unexpected.
Did I understand correctly that Clipper is not intended to obtain an accurate value of the intersection area (say, to within 1e24)?
For my purposes, it would be better to get more accurate results at the expense of speed (say, using 128-bit arithmetic). But I understand that different applications have different criteria.
In any case, thank you for your library. Even if you consider an accuracy of 2e33 to be tolerable, your library is helpful for me.
Note that computing the Area() with the current code is unreliable in general, because it may accumulate many values with wildly different magnitudes and mixed signs. That said, the error should be small in the scenarios above simply because your lists of points are short.
If you want an accurate Area(), you are going to need a different approach. The same algorithm could be kept by switching to Shewchuk summation, for example.
Little follow-up: I quickly passed your input through my version, which does Shewchuk summation, and the Area() difference is negligible. The whole difference comes from the list of solution vertices (I guess we already knew that; it's now confirmed).
In case it helps, I traced the exact origin of the discrepancy (I get ridiculously verbose debugging by turning on a #define).
When the result is "correct" (area S = 3.94191491330415002259e+36), there is an intersection point
-1105741767893338 2197763937706377
that flies unchanged through AddNewIntersectNode(), because top_y is also 2197763937706377, matching pt.y.
When the result is "bad" (area S = 3.94218403115878237993e+36), there is an intersection point
-2211483535786677 4395527875412754
that gets modified in AddNewIntersectNode(): top_y is 4395527875412755 there, so execution enters else if (pt.y < top_y) and then the else if (e2.top.y == top_y) branch, which snaps that intersection point to vertex #5 of input contour AA.
Adding some missing nearbyint()/round() calls in GetIntersectPoint() helps a little bit; there are more "correct" areas in the list of 21 tests, though that's not the complete answer.
Executing the following code results in a floating-point exception in the function Point64 GetIntersectPoint(const Active& e1, const Active& e2). Namely, when a double value is cast to int64_t, it appears to be outside the range of int64_t.
The coordinates of the polygon vertices are within the range -4.6e18 ... 4.6e18. According to the documentation, that is a valid input.
Division by ten is not essential.