Open utterances-bot opened 3 years ago
Very nice solution to a common problem with k-means. Thanks. The "broken line" approach resembles a technique used in non-metric multidimensional scaling. The idea is to fit the similarity data in several different numbers of dimensions and compute the quality of fit for each. When you then plot quality of fit against number of dimensions, the resulting curve usually has a knee, a sharp change in slope. The knee marks the number of dimensions beyond which one encounters diminishing returns, because additional dimensions add little to the quality of fit. I believe the idea/trick was introduced by Roger Shepard (Stanford).
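To make the knee idea concrete, here is a minimal sketch in Python of one way to locate it: fit two straight segments that share a breakpoint and keep the breakpoint with the smallest total squared error. This is only an illustration and not necessarily the method the post or the MDS literature uses. The names `fit_line_sse` and `find_knee` are hypothetical, and the misfit values are made up.

```python
def fit_line_sse(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def find_knee(dims, misfit):
    """Return the dimension where a two-segment ("broken line") fit is best."""
    best_dim, best_sse = None, float("inf")
    # The candidate breakpoint belongs to both segments, so each segment
    # always has at least two points.
    for i in range(1, len(dims) - 1):
        left = fit_line_sse(dims[: i + 1], misfit[: i + 1])
        right = fit_line_sse(dims[i:], misfit[i:])
        if left + right < best_sse:
            best_dim, best_sse = dims[i], left + right
    return best_dim


# Made-up misfit values per number of dimensions (illustrative, not real data).
dims = [1, 2, 3, 4, 5, 6, 7]
misfit = [0.40, 0.22, 0.08, 0.07, 0.06, 0.05, 0.04]
knee = find_knee(dims, misfit)
print(knee)
```

With these made-up numbers the misfit drops steeply up to three dimensions and only slowly afterwards, so the sketch picks 3 as the knee. The same function could be applied to a k-means curve by passing the number of clusters and the within-cluster sum of squares instead.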
Examination of the K-Means Broken-Line Method - Data & The World
Recreating code and expanding upon the analysis for a method of selecting the number of clusters in k-means.
https://data-and-the-world.onrender.com/posts/k-means-broken-line/