PyDataBlog / ParallelKMeans.jl

Parallel & lightning fast implementation of available classic and contemporary variants of the KMeans clustering algorithm
MIT License
50 stars 13 forks source link

Recursive version of multithreading #23

Open Arkoniak opened 4 years ago

Arkoniak commented 4 years ago

Our current implementation is rather simplistic and naive, just split up matrix in equal chunks and upload them to different threads. But this is not the way how it was intended to be: https://julialang.org/blog/2019/07/multithreading/

The general idea, how it should be implemented is to write a recursive function, which splits matrix in half and recursive call itself. Upon hitting some limit (<1000 columns for example?) actual procedure should commence.

This approach has its benefits, for example, there would be no penalty for multithreading small matrices since the algorithm wouldn't start new threads in this case. Also, it helps to remove MultiThreading/SingleThread modes completely. We should implement this approach and benchmark it properly.