dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

K-mode #5957

Open ma1f opened 3 years ago

ma1f commented 3 years ago

I would like to automatically cluster high-dimensional binary (true/false) data, and I believe k-modes would be more appropriate than k-means for this scenario.

Are there plans to support further clustering algorithms, including k-mode?

michaelgsharp commented 3 years ago

@briacht for visibility.

justinormont commented 3 years ago

In practice, I think you'll get a similar result by running k-means.
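One way to see why the results tend to be close on boolean data: for 0/1 vectors, the squared Euclidean distance used by k-means equals the Hamming (mismatch) count used by k-modes, and the per-column mode is the per-column mean rounded to 0 or 1. The remaining difference is that k-means keeps fractional centroids. A minimal Python sketch, using only the standard library; the function names are hypothetical and this is not ML.NET code:

```python
def hamming(a, b):
    """Count the positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def squared_euclidean(a, b):
    """Sum of squared per-column differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_centroid(rows):
    """Per-column mean, as k-means would compute for a cluster."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def mode_centroid(rows):
    """Per-column majority value for 0/1 data (ties resolve to 0 here)."""
    n = len(rows)
    return [1 if 2 * sum(col) > n else 0 for col in zip(*rows)]

a = [1, 0, 1, 1, 0]
b = [1, 1, 0, 1, 0]
# On 0/1 data each mismatch contributes exactly 1 to both measures.
same_distance = hamming(a, b) == squared_euclidean(a, b)

cluster = [[1, 0, 1], [1, 1, 1], [1, 0, 0]]
mean_c = mean_centroid(cluster)   # fractional centroid
mode_c = mode_centroid(cluster)   # binary centroid
```
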

Expanding beyond your boolean data to categorical data, there can be some speed and memory savings from using k-modes vs. k-means + one-hot encoding. I haven't tried it, though there are likely advantages in having the additional distance (dissimilarity) functions.
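For reference, a rough sketch of what k-modes does on categorical rows: it works on the raw category values (no one-hot expansion), assigns each row to the mode with the fewest mismatching columns, then recomputes each mode as the most frequent value per column. This is an illustrative Python sketch under simple assumptions (caller-supplied initial modes, simple matching dissimilarity, hypothetical function names), not an ML.NET API:

```python
from collections import Counter

def mismatches(a, b):
    """Simple matching dissimilarity: number of columns that differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Most frequent value in each column of a non-empty list of rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def k_modes(rows, initial_modes, max_iter=20):
    """Alternate assignment and mode updates until the modes stop changing."""
    modes = [tuple(m) for m in initial_modes]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest mode by mismatch count.
        labels = [
            min(range(len(modes)), key=lambda j: mismatches(row, modes[j]))
            for row in rows
        ]
        # Update step: per-column mode of each cluster (keep old mode if empty).
        new_modes = []
        for j, old in enumerate(modes):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            new_modes.append(column_modes(members) if members else old)
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes

rows = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("green", "large", "square"),
]
labels, modes = k_modes(rows, initial_modes=[rows[0], rows[3]])
```

Each row stays a short tuple of category values, whereas one-hot encoding would widen it to one column per distinct value, which is where the memory savings would come from.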

For future clustering algos in ML.NET, there is existing ML.NET/TLC code (1, 2) for OPTICS and DBSCAN, which could be brought into this repo.

See also: