Open ma1f opened 3 years ago
@briacht for visibility.
In practice, I think you'll get a similar result by running k-means.
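One reason the two tend to agree on boolean data: on 0/1 vectors, the squared Euclidean distance that k-means minimizes equals the Hamming (mismatch) count that k-modes uses, so the assignment step behaves the same and only the centroid update differs (mean vs. mode). A quick check of that identity, as a sketch with made-up vectors and hypothetical helper names:

```python
import numpy as np

def hamming(a, b):
    # Number of positions where the two vectors disagree.
    return int(np.sum(a != b))

def squared_euclidean(a, b):
    # Squared L2 distance; for 0/1 inputs each mismatch contributes exactly 1.
    d = a.astype(float) - b.astype(float)
    return float(np.dot(d, d))

# Illustrative boolean rows encoded as 0/1 (not real data).
a = np.array([1, 0, 1, 1, 0])
b = np.array([0, 0, 1, 0, 1])

print(hamming(a, b), squared_euclidean(a, b))
```

The identity only holds for 0/1 encodings; it does not carry over to categorical columns with more than two values.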
Expanding beyond your boolean data to categorical data, there can be some speed and memory savings from using k-modes vs. k-means + one-hot encoding. I haven't tried it, though there are likely advantages in having the additional distance (dissimilarity) functions.
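For reference, here is a minimal k-modes sketch in Python with NumPy, not ML.NET code. It assumes integer-coded categories, simple-matching (Hamming) dissimilarity, and random initialization from the data rows; the function names `hamming_to_modes`, `column_modes`, and `k_modes` are hypothetical. It works on the original categorical columns, so a column with m categories stays one column instead of being expanded into m one-hot columns.

```python
import numpy as np

def hamming_to_modes(X, modes):
    # Mismatch counts between every row and every mode: shape (n_rows, k).
    return (X[:, None, :] != modes[None, :, :]).sum(axis=2)

def column_modes(rows):
    # Most frequent value in each column; ties go to the smallest value.
    out = []
    for col in rows.T:
        values, counts = np.unique(col, return_counts=True)
        out.append(values[np.argmax(counts)])
    return np.array(out)

def k_modes(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: initialize modes from k distinct row indices chosen at random.
    modes = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = hamming_to_modes(X, modes).argmin(axis=1)
        new_modes = modes.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old mode if a cluster is empty
                new_modes[j] = column_modes(members)
        if np.array_equal(new_modes, modes):
            break
        modes = new_modes
    # Final assignment against the modes that are returned.
    labels = hamming_to_modes(X, modes).argmin(axis=1)
    return labels, modes

# Illustrative integer-coded categorical data (not real data).
X = np.array([
    [0, 0, 1, 2],
    [0, 0, 1, 2],
    [0, 1, 1, 2],
    [3, 2, 0, 0],
    [3, 2, 0, 1],
    [3, 2, 0, 0],
])
labels, modes = k_modes(X, k=2)
print(labels)
print(modes)
```

Like k-means, the result depends on the initialization, so in practice you would probably run it with several seeds and keep the run with the lowest total mismatch count.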
For future clustering algos in ML.NET, there is existing ML.NET/TLC code (1, 2) for OPTICS and DBSCAN, which could be brought into this repo.
I would like to automatically cluster high-dimensional binary (true/false) data, and I believe k-modes would be more appropriate than k-means for this scenario.
Are there plans to support further clustering algorithms, including k-modes?