annoviko / pyclustering

pyclustering is a Python, C++ data mining library.
https://pyclustering.github.io/
BSD 3-Clause "New" or "Revised" License
1.17k stars, 248 forks

[pyclustering.cluster] Medoid initializer #421

Closed (annoviko closed 5 years ago)

annoviko commented 6 years ago

Introduction

A medoid initializer is required for the K-Medoids algorithm:

Description

New classes should be added:

annoviko commented 5 years ago

An additional parameter was introduced to kmeans_plus_plus instead of a new class:

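from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.cluster.kmedoids import kmedoids
initial_medoids = kmeans_plusplus_initializer(sample, 4).initialize(return_index=True)
kmedoids_instance = kmedoids(sample, initial_medoids)

zahs123 commented 5 years ago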

How do I add kmeans_plus_plus?

edmon66 commented 4 years ago

Dear all,

I came across this issue, and others related to it, and I have to say that the answers are still only partial. You cannot provide a precomputed distance matrix to kmeans_plusplus_initializer(). Although kmedoids() accepts a data_type='distance_matrix' argument, there is nothing similar for handling a distance matrix in kmeans_plusplus_initializer(). The examples here and in other threads don't answer the initial question: what can I do if I CANNOT compute the distances on the fly? (Here "on the fly" means the hard-coded Euclidean distance inside __calculate_shortest_distances(self, data, centers), line 203 of center_initializer.py.)

I hope readers and authors will read my concern carefully and won't confuse a list of vectors with a distance matrix (precomputed over a list of vectors). I'm probably not the first one to run into this tiny missing piece of code, which has huge consequences for using kmeans_plusplus_initializer with kmedoids.
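
To make the missing piece concrete, here is a minimal sketch of k-means++-style seeding that reads every distance from a precomputed matrix instead of computing it. The function name kmeanspp_medoid_indices, its parameters, and the toy matrix are assumptions made for illustration; none of this is part of pyclustering.

```python
import random

def kmeanspp_medoid_indices(distance_matrix, amount, seed=None):
    """Pick `amount` initial medoid indices from a square distance matrix.

    Hypothetical helper, not a pyclustering function: k-means++-style seeding
    where distances are looked up in the matrix rather than computed.
    """
    n = len(distance_matrix)
    if not 0 < amount <= n:
        raise ValueError("amount must be between 1 and the number of points")
    rng = random.Random(seed)

    # First medoid: chosen uniformly at random.
    medoids = [rng.randrange(n)]
    # Distance from every point to its closest medoid chosen so far.
    closest = list(distance_matrix[medoids[0]])

    while len(medoids) < amount:
        # Sampling weight is the squared distance to the closest medoid,
        # so points that are already medoids (distance 0) get zero weight.
        weights = [d * d for d in closest]
        if sum(weights) == 0:
            # Every remaining point coincides with a medoid: take any unused index.
            candidate = next(i for i in range(n) if i not in medoids)
        else:
            candidate = rng.choices(range(n), weights=weights, k=1)[0]
        medoids.append(candidate)
        closest = [min(c, d) for c, d in zip(closest, distance_matrix[candidate])]
    return medoids

# Toy symmetric matrix (assumed data): four points forming two tight pairs.
matrix = [
    [0.0, 1.0, 9.0, 9.5],
    [1.0, 0.0, 8.5, 9.0],
    [9.0, 8.5, 0.0, 1.0],
    [9.5, 9.0, 1.0, 0.0],
]
initial_medoids = kmeanspp_medoid_indices(matrix, 2, seed=0)
# The indices could then be handed to kmedoids together with
# data_type='distance_matrix', the argument mentioned above.
```

Because the weights come straight from the matrix, any metric used to build it (Gower or otherwise) carries over to the seeding step without being recomputed as Euclidean.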

Thank you

annoviko commented 4 years ago

Hi @edmon66 ,

Thank you for your message, but frankly speaking, I haven't seen any requests to support distance_matrix for the K-Means++ initializer (here on GitHub). If a feature is missing and you think it would be nice to have, please create a new issue with a proposal.

I have created an issue for your request: #622

edmon66 commented 4 years ago

Thank you for your consideration!

In fact, I worked up the courage to write because I wasn't alone :-) If you remember, the first user wrote not only about using the combination of medoids and k-means++ but also about the Gower distance. He stopped writing and replying because he could not understand the results. The technical solution works (I mean it does not crash), but it was computing distances on a matrix of distances... He got an answer whose shape was correct, but not its content :-) Maybe other users fell into the trap... Maybe I am just rephrasing what was not crystal clear in the initial discussion.
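
A small sketch of that trap, using assumed toy data: when the rows of a distance matrix are misread as coordinate vectors, a Euclidean computation still returns a number for every pair, but it is not the distance stored in the matrix.

```python
import math

# Three points on a line at positions 0, 1 and 5 (toy data for illustration).
points = [0.0, 1.0, 5.0]
# Precomputed pairwise distances |a - b|.
matrix = [[abs(a - b) for b in points] for a in points]

def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Correct: look the distance up in the matrix.
true_distance = matrix[0][2]                    # 5.0
# Trap: treat rows 0 and 2 of the matrix as if they were data points.
row_distance = euclidean(matrix[0], matrix[2])  # sqrt(59), about 7.68

print(true_distance, row_distance)
```

Nothing crashes and the result has the right shape, which is why the mistake is easy to miss.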

In any case, I love this library (compared to some others) because it takes care of distance functions, metrics, and precomputed matrices almost everywhere.