bdelespierre opened this issue 3 years ago
The elbow method is fairly expensive to implement. getTotalVariance() is the right building block for it, but the elbow method needs more than that. As you know, k-means gives different results depending on the initial centroid positions, so the elbow position can differ from one run to the next. We would also need a policy for averaging that elbow across runs.
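To make that extra work concrete, here is a rough Python sketch; it is not part of this ticket and not the library's API. It runs k-means several times per k with random initialization, averages the resulting within-cluster sum of squares, and picks the elbow as the k where the averaged curve bends most (largest second difference). The helpers `kmeans_wcss`, `averaged_wcss` and `elbow` are hypothetical, and both the averaging policy and the second-difference rule are assumptions.

```python
import math
import random
from typing import Dict, List, Sequence

Point = Sequence[float]


def kmeans_wcss(points: List[Point], k: int, rng: random.Random, max_iter: int = 100) -> float:
    """Run one Lloyd's k-means pass and return its within-cluster sum of squares."""
    # Random initialization: k distinct input points become the starting centroids,
    # so different runs can converge to different local minima.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster emptied).
        updated = [
            [sum(axis) / len(members) for axis in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Squared distance of every point to its nearest final centroid.
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


def averaged_wcss(points: List[Point], k_values, runs: int = 10, seed: int = 0) -> Dict[int, float]:
    """Average the WCSS over several randomly initialized runs for each k."""
    rng = random.Random(seed)
    return {k: sum(kmeans_wcss(points, k, rng) for _ in range(runs)) / runs for k in k_values}


def elbow(curve: Dict[int, float]) -> int:
    """Pick the k where the averaged curve bends most (largest second difference)."""
    ks = sorted(curve)
    if len(ks) < 3:
        raise ValueError("need at least three values of k")
    bends = {
        ks[i]: curve[ks[i - 1]] - 2 * curve[ks[i]] + curve[ks[i + 1]]
        for i in range(1, len(ks) - 1)
    }
    return max(bends, key=bends.get)
```

Averaging the curve before locating the elbow is only one possible policy; taking the median of the per-run elbows would be another.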
Within this ticket's scope, calculating the elbow point is someone else's problem. We're just providing the variance here :wink:
Oh, that's right. I understand now, great.
To find the best value for K (the number of clusters), it would be useful to get the variance of the distances between clustered points and their cluster's centroid.
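A minimal sketch of that quantity, in Python for illustration only. It assumes "variance" means the mean squared Euclidean distance of a cluster's points to its centroid, and that the total weights each cluster's variance by its size, which yields the within-cluster sum of squares usually plotted in the elbow method. The function names are hypothetical and do not reflect the signature of getTotalVariance().

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> List[float]:
    """Component-wise mean of the points in one cluster."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def cluster_variance(points: Sequence[Point], center: Point) -> float:
    """Mean squared Euclidean distance from each point to the cluster's centroid."""
    return sum(math.dist(p, center) ** 2 for p in points) / len(points)


def total_variance(clusters: Sequence[Sequence[Point]]) -> float:
    """Size-weighted sum of cluster variances, i.e. the within-cluster sum of squares."""
    return sum(len(pts) * cluster_variance(pts, centroid(pts)) for pts in clusters)
```

For example, the clusters `[(0, 0), (2, 0)]` and `[(10, 10), (10, 12)]` each have a variance of 1.0, so the total is 4.0.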
Inspired by: https://www.youtube.com/watch?v=4b5d3muPQmA
Also see: https://en.wikipedia.org/wiki/Elbow_method_(clustering)
I also believe the current v3 implementation of RandomInitialization is wrong :man_shrugging:
Proposed change