gjanesch / data-and-the-world

Blog content.

Examination of the K-Means Broken-Line Method - Data & The World #7

utterances-bot opened this issue 3 years ago

utterances-bot commented 3 years ago

Examination of the K-Means Broken-Line Method - Data & The World

Recreating code and expanding upon the analysis for a method of selecting the number of clusters in k-means.

https://data-and-the-world.onrender.com/posts/k-means-broken-line/

robsek commented 3 years ago

Very nice solution to a common problem with k-means. Thanks. The "broken line" approach resembles a technique used in non-metric multidimensional scaling. The idea is to fit the similarity data using several different numbers of dimensions, then compute the quality of the fit for each one. When you plot quality of fit against the number of dimensions, the resulting curve usually has a knee, an abrupt change in slope. The knee marks the number of dimensions beyond which returns diminish, because additional dimensions add little to the quality of fit. I believe the idea was introduced by Roger Shepard (Stanford).