LeslieTrue / CPP

This is the official implementation for Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models.

Number of Clusters Estimation #1

Closed. pion-changho closed this issue 1 year ago

pion-changho commented 1 year ago

Hello,

Firstly, I'd like to express my gratitude for sharing this code and your dedication to the project. After going through your paper, my understanding was that the CPP algorithm estimates the number of clusters. However, in the optimalcluster.py file, it seems that the number of clusters is required as an input.

Have I misunderstood something, or is there a part of the implementation that is yet to be completed?

Thank you for your time and clarification.

LeslieTrue commented 1 year ago

Hi, thanks for reaching out. I assume you are confused by parser.add_argument('--num_clusters', type=int, default=10, help='number of clusters') in optimalcluster.py. The variable is only used for validation on datasets with ground-truth labels and a given number of clusters, such as CIFAR-10 and ImageNet-1k, where it is required by acc_lst, nmi_lst, _, _, pred_lst = spectral_clustering_metrics(Pi_np, args.num_clusters).

That is to say, you may comment it out when applying CPP to label-free datasets. Hope this addresses your concern.

Best, Tianzhe
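
As an alternative to commenting the line out, the validation step could be made conditional on whether a cluster count was supplied. The following is only a minimal sketch of that idea, not the repository's code: build_parser and maybe_validate are hypothetical helpers, the default of None is an assumed change (the repository uses default=10), and metrics_fn stands in for spectral_clustering_metrics, whose internals are not shown in this thread.

```python
import argparse


def build_parser():
    """Hypothetical parser: --num_clusters becomes optional (assumed change)."""
    parser = argparse.ArgumentParser()
    # None means "no ground-truth cluster count available" (label-free data).
    parser.add_argument('--num_clusters', type=int, default=None,
                        help='number of clusters (only used for validation)')
    return parser


def maybe_validate(Pi_np, num_clusters, metrics_fn):
    """Run the validation metrics only when a cluster count was given.

    metrics_fn is a placeholder for a function with the call shape
    metrics_fn(Pi_np, num_clusters), as in the call quoted above.
    """
    if num_clusters is None:
        # Label-free dataset: skip validation entirely.
        return None
    return metrics_fn(Pi_np, num_clusters)
```

With this arrangement, a run without --num_clusters skips validation, while a run on a labeled benchmark passes the known count (for example 10 for CIFAR-10) and gets the metrics back.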

pion-changho commented 1 year ago

I understand now. This is the label for the clustering result: https://github.com/LeslieTrue/CPP/blob/main/main_efficient.py#L117 Thanks for clarifying!