Open kenluck2001 opened 3 years ago
Thanks for this @kenluck2001! Yes, a K-means clustering model would be a great addition. If you decide to implement both hard and soft variants, I propose you do so within the same KMeans model object (you can choose which version to use via an arg at initialization: `cluster_method={'hard', 'soft'}`).
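To make the proposed interface concrete, here is a minimal sketch of what a single `KMeans` object with a `cluster_method` switch could look like. Only the class name and the `cluster_method` argument come from the comment above; everything else (`n_clusters`, `beta`, `max_iter`, `tol`, `seed`, the `fit`/`predict` methods, and the softmax-style soft assignment) is an assumption made for illustration, not the code from the PR.

```python
import numpy as np


class KMeans:
    """Hypothetical sketch: one model covering hard and soft K-means."""

    def __init__(self, n_clusters=2, cluster_method="hard", beta=1.0,
                 max_iter=100, tol=1e-6, seed=0):
        if cluster_method not in ("hard", "soft"):
            raise ValueError("cluster_method must be 'hard' or 'soft'")
        self.n_clusters = n_clusters
        self.cluster_method = cluster_method
        self.beta = beta  # stiffness of the soft assignments (assumed parameter)
        self.max_iter = max_iter
        self.tol = tol
        self.seed = seed

    def _responsibilities(self, X):
        # Squared Euclidean distance from every point to every centroid: (N, K)
        d2 = ((X[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=-1)
        if self.cluster_method == "hard":
            # One-hot assignment to the nearest centroid
            R = np.zeros_like(d2)
            R[np.arange(len(X)), d2.argmin(axis=1)] = 1.0
            return R
        # Soft assignment: softmax over -beta * d2, shifted for numerical stability
        logits = -self.beta * d2
        logits -= logits.max(axis=1, keepdims=True)
        R = np.exp(logits)
        return R / R.sum(axis=1, keepdims=True)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.seed)
        # Initialize centroids from distinct, randomly chosen data points
        idx = rng.choice(len(X), size=self.n_clusters, replace=False)
        self.centroids = X[idx].copy()
        for _ in range(self.max_iter):
            R = self._responsibilities(X)
            weights = R.sum(axis=0)  # total responsibility per cluster, shape (K,)
            new = self.centroids.copy()
            nonempty = weights > 0  # leave centroids of empty clusters unchanged
            new[nonempty] = (R.T @ X)[nonempty] / weights[nonempty, None]
            shift = np.abs(new - self.centroids).max()
            self.centroids = new
            if shift < self.tol:
                break
        return self

    def predict(self, X):
        # Label each point with its highest-responsibility cluster
        X = np.asarray(X, dtype=float)
        return self._responsibilities(X).argmax(axis=1)
```

With this layout the two variants share initialization and the centroid update, and differ only in how the assignment matrix is computed, so `KMeans(cluster_method="soft").fit(X)` and `KMeans(cluster_method="hard").fit(X)` exercise the same code path apart from one branch.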
Also, as a reminder for each PR, please include tests against a standard implementation of the algorithm to help verify correctness :)
I have cleaned up the code and added the required tests. Building the project is difficult because it enforces Python 3.7 only, and my system has many dependencies that I don't want to break. I will raise a PR soon. Here is a snapshot of what to expect in my PR, @ddbourgin: WORK.zip
There is no clustering in the project apart from EM for Gaussian mixtures. Hence, I would like to implement the k-means algorithm: both the common hard-clustering version and the soft-clustering variant. Once I get the go-ahead, I will raise a PR within the next few days.
The hard version of K-means will follow the implementation in this slide
The soft version of K-means will also follow the implementation in this slide
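For readers without access to the slides, one common textbook formulation of the two variants is sketched below. This is an assumption about what the slides contain, not a transcription of them; the symbols $r_{nk}$ (assignment of point $x_n$ to cluster $k$), $\mu_k$ (centroid), and $\beta$ (stiffness) are introduced here for illustration.

$$
r_{nk}^{\text{hard}} =
\begin{cases}
1 & \text{if } k = \arg\min_j \lVert x_n - \mu_j \rVert^2 \\
0 & \text{otherwise}
\end{cases}
\qquad
r_{nk}^{\text{soft}} = \frac{\exp\left(-\beta \lVert x_n - \mu_k \rVert^2\right)}{\sum_j \exp\left(-\beta \lVert x_n - \mu_j \rVert^2\right)}
$$

$$
\mu_k = \frac{\sum_n r_{nk}\, x_n}{\sum_n r_{nk}}
$$

Both variants alternate between computing $r_{nk}$ and updating $\mu_k$; as $\beta \to \infty$ the soft assignments approach the hard ones.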
I wrote both implementations, with efficiency in mind, before checking the contribution guide, which specifies that an issue must be opened first. Please give your approval and I will raise the PR right away.