elki-project / elki

ELKI Data Mining Toolkit
https://elki-project.github.io/
GNU Affero General Public License v3.0
785 stars 323 forks source link

Fix StackOverflow ex in AlphaShape #101

Closed SIDSSIDS closed 2 years ago

SIDSSIDS commented 2 years ago

Replaced usage of recursion with a stack in the alpha shape computation. Recursion led to StackOverflow exception on big datasets (4646 points in my case)

kno10 commented 2 years ago

Thank you, I will have a look. 4k does not appear large, but it seems I underestimate how nasty these datasets can be. Using a stack seems fine to me. For efficiency, I'd avoid the nested data structure, but use an IntegerArray, and push/poll 4 values at a time maybe. This should reduce memory allocations substantially and improve performance.

kno10 commented 2 years ago

Do you have an interesting test case for benchmarking?

SIDSSIDS commented 2 years ago

I'd avoid the nested data structure, but use an IntegerArray

By nested data structure you mean constructions like new int[]{i, tri.ca, tri.c, tri.a}? I'm not sure, that this kind of array initialization is slower than the following:

int[] a = new int[4];
a[0] = i;
a[1] = tri.ca;
a[2] = tri.c;
a[3] = tri.a;

If so, can you give me a hint why?

Do you have an interesting test case for benchmarking?

I have only the test case, which has led to the problem. alpha_points.csv You can reproduce the error with the default value of alpha

java -Xss512k -jar elki-bundle-0.7.6-SNAPSHOT.jar KDDCLIApplication -dbc.in alpha_points.csv -algorithm algorithm.NullAlgorithm -resulthandler visualizer

and pick DoubleVector -> Scatterplot -> Alpha Shape

kno10 commented 2 years ago

Using a single IntegerArray array as stack (a wrapped int[]), and the four values stored at offsets (i<<2) + j. See 6465675 for the resulting code. This is based on your contribution, although I used a different data structure that uses less memory. Its not pretty code, because the four array adds are unrolled. I could have used an "add4" method, which likely would be inlined and hence not make much of a difference. Thank you for the test case for reproducing the error (as this allowed me to check it is resolved).