elki-project / elki

ELKI Data Mining Toolkit
https://elki-project.github.io/
GNU Affero General Public License v3.0
780 stars 321 forks source link

Infinite Loop in Kmediods #3

Closed Machiry closed 9 years ago

Machiry commented 9 years ago

New push to Kmedoids results in infinite loop when k=1.

When k=1, the continue statement at: https://github.com/elki-project/elki/blob/master/elki/src/main/java/de/lmu/ifi/dbs/elki/algorithm/clustering/kmeans/KMedoidsPAM.java#L200 will always be hit, thus best ending up Positive_infinity and never breaking.

kno10 commented 9 years ago

No. that line is only always hit if every point is its own medoid. I haven't tested these corner cases like k=1 or k=n. Also, if best=+infinity, it will be > 0 and thus the loop will stop. But I have little interest in PAM except for historical reasons.

kno10 commented 9 years ago

As a matter of fact, running PAM with k=1 terminates for me after 1 iteration for me. But I will look into why it sometimes does seem to take forever.

kno10 commented 9 years ago

The issue is much more subtle than that. I tried to literally implement the equations of Kaufman & Rousseeuw. As far as I can tell, there is an omission in the description that leads to these problems. Specifically, they only use "nonselected objects" for computing the cost. However, the element that will be deselected also needs to be reassigned to a new cluster. That is why my literal implementation ended up swapping medoids infinitely. On larger data sets, PAM will usually take an excessive amount of time in the swap phase. I suggest to always run it with a limit on the number of iterations.

kno10 commented 9 years ago

It took me a long time to really figure out how to do this right, and at the same time avoid extra distance computations. It seems to be okay now.