BrunoLevy / geogram

a programming library with geometric algorithms
Other
1.87k stars 126 forks source link

CSG operations sometimes crash under MacOS #133

Closed BrunoLevy closed 6 months ago

BrunoLevy commented 9 months ago

CSG operations sometimes crash under MacOS, not always.

This probably comes from the smaller stack size under MacOS, and some operations involving arithmetic expansions allocated on the stack (I guess that what we have is a stack overflow). Remember: stack is much smaller under MacOS than with other OSes.

This mostly happens in one of the CSG tests, which lets me think that one of the radial sort predicates still allocates expansions on the stack. Once this happened in one of the Intersect tests (three_cubes).

Note: this issue is mostly a duplicate of #112

Edit Feb 2024: In fact there were two different problems

BrunoLevy commented 9 months ago

Ooops, did not want to close the issue, needs to be tested before. expansion::assign_product() allocates subproducts, on stack if number of components are smaller than MAX_CAPACITY_ON_STACK, that was set to 1024 -> decreased it to 512.

BrunoLevy commented 9 months ago

Seems to fix the problem (at least for last run). Since the problem is not reproducible, let us wait for a couple of days before closing the issue...

BrunoLevy commented 9 months ago

Launched the tests three times, and still got a crash... Will need finer analysis of the crash.

BrunoLevy commented 9 months ago

Lowered max expansion size to 256

OK, it was in expansion::compare() that computes an expansion difference and always stores it on the stack -> now stores the temporary expansion on the heap if the required capacity is larger than MAX_EXPANSION_ON_HEAP

BrunoLevy commented 9 months ago

CSG seems to be OK on mac now, but we got another test that crashed on mac in debug mode, compute_RVD with tordu.mesh (still assertion failure capa <= MAX_CAPACITY_ON_STACK). I guess it is in one of the side1(), side2(), side3() predicates. It can be probably safely ignored, but let us investigate...

But it is weird we never had this one before ? -> it is because MAX_CAPACITY_ON_STACK was 512 rather than 256 !

OK still using my Linux box with all defines as if we were running with a small stack, it is in side3_exact_SOS() (not a big surprise), and we compute the product between an expansion of length 8 and one of length 19, which requires 2819=304 components. So 256 is not enough for all the predicates with doubles as input. Let us see now what happens with MAX_CAPACITY_ON_STACK set back to 512 (but I'm afraid we will still have some stack overflows coming back, let us see...).

BrunoLevy commented 9 months ago

Nope, 512 is too much.

Well in fact we need to make a difference between:

BrunoLevy commented 9 months ago

Now testing...

BrunoLevy commented 9 months ago

All light greens.... Let us test several times before closing the issue.

BrunoLevy commented 9 months ago

Ran the test three times, seems to be OK, Let us wait for nightly builds before closing the issue...

BrunoLevy commented 9 months ago

All lights green !

BrunoLevy commented 8 months ago

Got a crash on mac with compute_CSG example_018.csg, Debug, Gargantua, this build

This one will be difficult to catch because it is not reproducible (re-launching the job did not re-trigger the bug)

BrunoLevy commented 6 months ago

Added a stacktrace in the signal handler Problem seems to occur in simplify_coplanar_facets(), in the phase that runs in parallel, that uses a CoplanarFacets object, that creates some attributes. Maybe a race condition in the attributes manager ? But one of the stack traces had a call to expansion::assign_product()...

BrunoLevy commented 6 months ago

It seems that AttributeStore::notify() mises a spinlock acquire/release.

BrunoLevy commented 6 months ago

Seems to be fixed by acquiring/releasing the spinlock in AttributeStore::notify(). Before closing the issue, I need to make sure there are no wrong concurrent accesses to the mesh in simplify_coplanar_facets, in particular when new facets are created. The number of new facets is bounded (re-triangulating a polygon does not create more facets than the initial number of facets), so maybe I should just reserve twice the number of existing facets.

BrunoLevy commented 6 months ago

Added MeshFacets::reserve() function called at the beginning of simplify_coplanar_facets()

BrunoLevy commented 6 months ago

Seems to be fixed (summary):

If the problem reappears: