Closed cstich closed 4 years ago
The main reason we have this limitation for YinYang is that, in its current state, it will produce wrong results for any non-Euclidean metric. The main internal functions, such as `chunk_update_centroids` or `point_all_centers!`, have squared Euclidean logic embedded in them; see, for example, https://github.com/PyDataBlog/ParallelKMeans.jl/blob/master/src/yinyang.jl#L268. The `metric` argument was added purely for compatibility with the other algorithms, but properly removing the Euclidean restriction requires substantial refactoring of the algorithm as well.
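As a rough illustration of why logic written for squared Euclidean distance does not carry over to other metrics, the sketch below compares two candidate centers for a single cluster. The data and the helper names (`sq_euclidean_cost`, `cityblock_cost`) are made up for this example and are not taken from the package. It only shows the general point that the arithmetic mean is the right center for squared Euclidean distance but not for, say, cityblock distance.

```julia
using Statistics  # standard library: mean, median

# Hypothetical 1-D cluster with one outlier (illustrative data only).
const points = [0.0, 0.0, 1.0, 9.0]

# Total cost of assigning every point to a single center `c`.
sq_euclidean_cost(c) = sum((p - c)^2 for p in points)
cityblock_cost(c) = sum(abs(p - c) for p in points)

c_mean = mean(points)      # the usual k-means centroid update
c_median = median(points)  # the minimizer for cityblock distance in 1-D

println("squared Euclidean: mean = ", sq_euclidean_cost(c_mean),
        ", median = ", sq_euclidean_cost(c_median))
println("cityblock:         mean = ", cityblock_cost(c_mean),
        ", median = ", cityblock_cost(c_median))
```

Under squared Euclidean distance the mean gives the lower cost, while under cityblock distance the median does, so a mean-based update paired with a different metric would no longer be minimizing the objective that the metric defines.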
Thanks for bringing this to our attention; I opened #92.
That makes sense. Thanks for the explanation.
Elkan, Lloyd, and Hamerly all use duck typing for their `metric` argument, whereas Yin Yang only accepts `Euclidean` as a distance metric. This tiny pull request brings the API for Yin Yang in line with Elkan and the others. I am aware that, strictly speaking, the convergence of KMeans is only guaranteed for the Euclidean distance, but if there is a reason for only allowing `Euclidean` for Yin Yang and not the others, that is not clear to me.
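For readers unfamiliar with the distinction, here is a minimal sketch of a duck-typed versus a type-restricted `metric` argument. All types and functions in it (`AbstractMetric`, `Euclidean`, `Cityblock`, `run_duck`, `run_restricted`) are hypothetical stand-ins defined only for this example; they are not the package's actual signatures.

```julia
# Hypothetical stand-ins; not the package's types or functions.
abstract type AbstractMetric end
struct Euclidean <: AbstractMetric end
struct Cityblock <: AbstractMetric end

# Duck-typed: `metric` has no type annotation, so any value is accepted.
run_duck(data, metric) = "accepted $(nameof(typeof(metric)))"

# Restricted: only a `Euclidean` metric matches this method.
run_restricted(data, metric::Euclidean) = "accepted Euclidean"

data = rand(2, 10)
println(run_duck(data, Cityblock()))
println(run_restricted(data, Euclidean()))

# Calling the restricted method with another metric raises a MethodError.
try
    run_restricted(data, Cityblock())
catch err
    println("rejected Cityblock: ", err isa MethodError)
end
```

The restricted form fails early at dispatch time, which is one way to keep callers from running an algorithm with a metric it does not support.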