DeMoriarty / fast_pytorch_kmeans

This is a PyTorch implementation of the k-means clustering algorithm
MIT License

minibatch k means? #2

Closed mhamilton723 closed 3 years ago

mhamilton723 commented 3 years ago

Is it possible to modify the existing implementation to support minibatch k-means? Thanks

DeMoriarty commented 3 years ago

Hi, the current implementation already supports a limited version of minibatch k-means. You still need to pass the full training data to the fit / fit_predict method; fit_predict then randomly selects a minibatch from the training data at each iteration of the k-means algorithm. To enable it, set the minibatch parameter to the desired minibatch size, as shown here

If you can't provide the full training data all at once, then yes, it's possible to modify the current version to support that kind of minibatch k-means as well.
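
For the case where the data only arrives in chunks, here is a rough sketch of the kind of change that would be involved. It is written in plain Python lists so the update rule is visible without any tensor library. StreamingKMeans and partial_fit are hypothetical names, not part of fast_pytorch_kmeans, and the per-centroid running-mean update is one common choice for minibatch k-means, not necessarily what the library would adopt.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


class StreamingKMeans:
    """Hypothetical helper: k-means that learns from one minibatch at a time."""

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters
        self.centroids = []  # one vector (list of floats) per cluster
        self.counts = []     # number of points absorbed by each centroid

    def _init_centroids(self, batch):
        if len(batch) < self.n_clusters:
            raise ValueError("first batch must hold at least n_clusters points")
        # Farthest-first seeding: start from the first point, then keep adding
        # the point that is farthest from the centroids chosen so far.
        self.centroids = [list(batch[0])]
        while len(self.centroids) < self.n_clusters:
            farthest = max(
                batch,
                key=lambda p: min(squared_distance(p, c) for c in self.centroids),
            )
            self.centroids.append(list(farthest))
        self.counts = [0] * self.n_clusters

    def predict_one(self, point):
        """Index of the centroid nearest to `point`."""
        return min(
            range(len(self.centroids)),
            key=lambda k: squared_distance(point, self.centroids[k]),
        )

    def partial_fit(self, batch):
        """Update the centroids using only the points in `batch`."""
        if not self.centroids:
            self._init_centroids(batch)
        # Assign the whole batch first, with the centroids held fixed.
        assignments = [self.predict_one(p) for p in batch]
        for point, k in zip(batch, assignments):
            self.counts[k] += 1
            lr = 1.0 / self.counts[k]  # per-centroid learning rate
            # Running mean: move centroid k a fraction `lr` toward the point.
            self.centroids[k] = [
                c + lr * (x - c) for c, x in zip(self.centroids[k], point)
            ]
        return self


# Usage: two well-separated 2-D blobs, streamed in chunks of 100 points.
random.seed(0)
blob_a = [[random.gauss(0.0, 0.5), random.gauss(0.0, 0.5)] for _ in range(500)]
blob_b = [[random.gauss(10.0, 0.5), random.gauss(10.0, 0.5)] for _ in range(500)]
data = blob_a + blob_b
random.shuffle(data)

model = StreamingKMeans(n_clusters=2)
for start in range(0, len(data), 100):
    model.partial_fit(data[start:start + 100])
```

Only the centroids and their counts persist between calls, so memory use does not depend on how much data has been seen. A PyTorch version would replace the lists with tensors and the Python loops with batched distance computations.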

DeMoriarty commented 3 years ago

If you want to run KMeans on GPU, take a look at TorchPQ, which has very fast and memory-efficient implementations of the KMeans and MinibatchKMeans algorithms.

Here is an example of how to use MinibatchKMeans.

mhamilton723 commented 3 years ago

Thank you for this pointer, @DeMoriarty!