annoviko / pyclustering

pyclustering is a Python, C++ data mining library.
https://pyclustering.github.io/
BSD 3-Clause "New" or "Revised" License

Number of clusters changing for the same data #675

Closed by pablocael 3 years ago

pablocael commented 3 years ago

Hi there! I'm using X-Means to calculate a good number of clusters for my data. However, the number of clusters keeps changing drastically for exactly the same data. Is there anything wrong with my implementation? My code is below.

    from pyclustering.cluster.xmeans import xmeans
    from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
    amount_initial_centers = 8
    initial_centers = kmeans_plusplus_initializer(keypoints_coords, amount_initial_centers).initialize()

    xmeans_instance = xmeans(keypoints_coords, initial_centers, 128)
    xmeans_instance.process()
    # Extract clustering results: the final cluster centers
    centers = xmeans_instance.get_centers()

    print('number of clusters centers found = ', len(centers))

keypoints_coords is an array of 2D points, and it is fixed.

What I would expect is that the total number of clusters would converge to about the same value for the same data. Instead, it varies between 9 and 40 from run to run. I have a feeling I'm initializing something wrong. Thanks in advance.

pablocael commented 3 years ago

Never mind, I found how to set the seed (random_state). Thanks!
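
For anyone landing here with the same symptom, the reason a seed helps can be shown without pyclustering at all. The sketch below is a plain-Python, k-means++ style initializer written only for illustration. It is not pyclustering's implementation: `kmeans_pp_centers` and `points` are hypothetical names, and `points` is a made-up stand-in for `keypoints_coords`. The initial centers are drawn at random, so an unseeded run may start from different centers every time, which can lead to a different final cluster count. A fixed seed makes the draw repeatable.

```python
import random


def kmeans_pp_centers(points, k, seed=None):
    """Pick k initial centers with k-means++ style seeding (illustrative only)."""
    # A private generator, so the seed alone controls every random draw below.
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest already-chosen center.
        weights = [
            min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
            for px, py in points
        ]
        # Draw the next center with probability proportional to that distance.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Hypothetical stand-in for keypoints_coords: a fixed set of 2D points.
data_rng = random.Random(0)
points = [(data_rng.random(), data_rng.random()) for _ in range(200)]

seeded_a = kmeans_pp_centers(points, 8, seed=42)
seeded_b = kmeans_pp_centers(points, 8, seed=42)
print(seeded_a == seeded_b)  # True: same seed, same initial centers
```

Calling `kmeans_pp_centers(points, 8)` with no seed uses system entropy, so repeated calls will generally return different centers. That is the same kind of run-to-run variation described above, and it is what the random_state setting mentioned in this thread is meant to pin down.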