ddbourgin / numpy-ml

Machine learning, in numpy
https://numpy-ml.readthedocs.io/
GNU General Public License v3.0
15.55k stars 3.73k forks

Feature Request: Clustering Kmeans (hard and soft version) #69

Open kenluck2001 opened 3 years ago

kenluck2001 commented 3 years ago

The project currently has no clustering apart from EM for Gaussian mixtures. I would therefore like to implement the k-means algorithm in two forms: the common hard-clustering version and the soft-clustering variant derived from it. Once I get the go-ahead, I will raise a PR within the next few days.

The hard version of K-means will follow the implementation in this slide image

The soft version of K-means will also follow the implementation in this slide image

I had already written efficient implementations of both before I checked the contribution guide, which requires an issue to be opened first. Please give your approval and I will raise the PR right away.

ddbourgin commented 3 years ago

Thanks for this @kenluck2001! Yes, a K-means clustering model would be a great addition. If you decide to implement both hard and soft variants, I propose you do so within the same KMeans model object (you can choose which version to use via an arg at initialization: cluster_method={'hard', 'soft'}).

Also, as a reminder for each PR, please include tests against a standard implementation of the algorithm to help verify correctness :)
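For readers following the thread, the single-object design suggested above could be sketched roughly as follows. This is a minimal illustration, not the code from the eventual PR or the numpy-ml API: apart from `cluster_method`, every name here (`n_clusters`, `beta`, `max_iter`, `tol`, `seed`, `fit`, `predict`) is a hypothetical choice. The slides referenced in the issue are not available, so the soft variant is assumed to be the usual stiffness-parameterised form, in which responsibilities are a softmax over negative squared distances scaled by `beta`.

```python
import numpy as np


class KMeans:
    """Hypothetical sketch of one model object covering hard and soft k-means."""

    def __init__(self, n_clusters=2, cluster_method="hard", beta=1.0,
                 max_iter=100, tol=1e-6, seed=None):
        if cluster_method not in ("hard", "soft"):
            raise ValueError("cluster_method must be 'hard' or 'soft'")
        self.n_clusters = n_clusters
        self.cluster_method = cluster_method
        self.beta = beta  # stiffness; used only by the soft variant (assumption)
        self.max_iter = max_iter
        self.tol = tol
        self.rng = np.random.default_rng(seed)
        self.centroids = None

    def _responsibilities(self, X):
        # Squared Euclidean distance from every point to every centroid: (N, K)
        d2 = ((X[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=-1)
        if self.cluster_method == "hard":
            # One-hot assignment to the nearest centroid
            R = np.zeros_like(d2)
            R[np.arange(X.shape[0]), d2.argmin(axis=1)] = 1.0
            return R
        # Soft: softmax over -beta * d2, shifted per row for numerical stability
        logits = -self.beta * d2
        logits -= logits.max(axis=1, keepdims=True)
        R = np.exp(logits)
        return R / R.sum(axis=1, keepdims=True)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Initialise centroids from distinct, randomly chosen data points
        idx = self.rng.choice(X.shape[0], size=self.n_clusters, replace=False)
        self.centroids = X[idx].copy()
        for _ in range(self.max_iter):
            R = self._responsibilities(X)
            weights = R.sum(axis=0)  # total responsibility per cluster, (K,)
            new = self.centroids.copy()
            nonempty = weights > 0
            # Responsibility-weighted mean; empty clusters keep their centroid
            new[nonempty] = (R.T @ X)[nonempty] / weights[nonempty, None]
            shift = np.abs(new - self.centroids).max()
            self.centroids = new
            if shift < self.tol:
                break
        return self

    def predict(self, X):
        # Index of the cluster with the largest responsibility for each point
        X = np.asarray(X, dtype=float)
        return self._responsibilities(X).argmax(axis=1)
```

Both variants share the same update step: the hard version is the special case where each row of the responsibility matrix is one-hot, which is one reason a single class with a switch is a natural fit. A call such as `KMeans(n_clusters=3, cluster_method="soft", beta=2.0).fit(X).predict(X)` would return one cluster index per row of `X`.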

kenluck2001 commented 3 years ago

I have cleaned up the code and added the required tests. Building the project is difficult because it only supports Python 3.7, and my system has many dependencies that I don't want to disturb. I will raise the PR soon. @ddbourgin, here is a snapshot of what to expect in my PR: WORK.zip