ddbourgin / numpy-ml

Machine learning, in numpy
https://numpy-ml.readthedocs.io/
GNU General Public License v3.0

Added hard and soft Kmeans clustering with tests #71

Open kenluck2001 opened 3 years ago

kenluck2001 commented 3 years ago

This submission addresses the issue tracked in https://github.com/ddbourgin/numpy-ml/issues/69. We have implemented soft and hard versions of kmeans clustering. The work can be summarized as follows:

  1. Hard kmeans clustering, which assigns each data point to exactly one cluster at a time.
  2. Soft kmeans clustering, which assigns data points probabilistically: each data point has a membership degree in every cluster. The most probable cluster can then be taken as the data point's cluster index. Alternatively, the full probability distribution can be used for other purposes, since it captures the uncertainty of the clustering.
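
For readers skimming the thread, the difference between the two assignment rules above can be sketched as follows. This is an illustration only, not the code in this PR: the function names (`squared_distances`, `hard_assign`, `soft_assign`) and the stiffness parameter `beta` are assumptions made for the example, and the soft rule shown is one common choice, a softmax over negative squared distances.

```python
import numpy as np


def squared_distances(X, centroids):
    """Return an (n_samples, n_clusters) matrix of squared Euclidean distances."""
    diff = X[:, None, :] - centroids[None, :, :]
    return (diff ** 2).sum(axis=-1)


def hard_assign(X, centroids):
    """Hard assignment: each point gets the index of its single nearest centroid."""
    return squared_distances(X, centroids).argmin(axis=1)


def soft_assign(X, centroids, beta=1.0):
    """Soft assignment: each point gets a membership degree in every cluster.

    `beta` is a hypothetical stiffness parameter; larger values push the
    memberships toward the hard assignment.
    """
    logits = -beta * squared_distances(X, centroids)
    # Subtract the row-wise max before exponentiating for numerical stability.
    logits -= logits.max(axis=1, keepdims=True)
    resp = np.exp(logits)
    return resp / resp.sum(axis=1, keepdims=True)


# Toy data: two points near the first centroid, one on the second,
# and one exactly halfway between the two centroids.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [2.5, 2.5]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

labels = hard_assign(X, centroids)  # one cluster index per point
resp = soft_assign(X, centroids)    # one probability row per point
```

Taking `resp.argmax(axis=1)` recovers a single cluster index from the soft memberships, while the rows of `resp` themselves keep the uncertainty, such as the even split for the halfway point.
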

ddbourgin commented 3 years ago

Thanks for this, @kenluck2001! At first glance this looks great - I'm going to set aside some time to go through it more thoroughly shortly.