Number of Clusters Estimation

pion-changho commented 1 year ago

Hello,

Firstly, I'd like to express my gratitude for sharing this code and your dedication to the project. After going through your paper, my understanding was that the CPP algorithm estimates the number of clusters. However, in the optimalcluster.py file, it seems that the number of clusters is required as an input.

Have I misunderstood something, or is there a part of the implementation that is yet to be completed?

Thank you for your time and clarification.

LeslieTrue commented 1 year ago

Hi, thanks for reaching out. I assume you are confused by parser.add_argument('--num_clusters', type=int, default=10, help='number of clusters') in optimalcluster.py. This variable is only used for validation on datasets with ground-truth labels and a known number of clusters, i.e. CIFAR-10 and ImageNet-1k. It is specifically required by the call acc_lst, nmi_lst, _, _, pred_lst = spectral_clustering_metrics(Pi_np, args.num_clusters).

That is to say, you may comment it out when applying CPP to label-free datasets. Hope this addresses your concern.
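
As an alternative to commenting the argument out, here is a minimal sketch of making it optional. It is only an illustration under stated assumptions, not the repository's code: build_parser and maybe_validate are hypothetical helper names, and metrics_fn stands in for spectral_clustering_metrics, whose internals are not shown in this thread.

```python
import argparse


def build_parser():
    """Hypothetical parser: --num_clusters defaults to None instead of 10."""
    parser = argparse.ArgumentParser()
    # None signals "no ground-truth cluster count available" (label-free data).
    parser.add_argument('--num_clusters', type=int, default=None,
                        help='number of clusters; only needed for validation')
    return parser


def maybe_validate(Pi_np, num_clusters, metrics_fn):
    """Run label-based validation only when a cluster count is supplied.

    metrics_fn is a placeholder for a function such as
    spectral_clustering_metrics(Pi_np, num_clusters); it is passed in
    so that this sketch does not depend on the repository.
    """
    if num_clusters is None:
        # Label-free dataset: skip validation entirely.
        return None
    return metrics_fn(Pi_np, num_clusters)
```

With this shape, running the script without --num_clusters skips the validation step, while passing --num_clusters 10 on CIFAR-10 would keep the validation described above.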

Best, Tianzhe

pion-changho commented 1 year ago

I understand now. This is the label for the clustering result: https://github.com/LeslieTrue/CPP/blob/main/main_efficient.py#L117 Thanks for clarifying!
