deepglint / unicom

[ICLR 2023] Unicom: Universal and Compact Representation Learning for Image Retrieval
https://arxiv.org/pdf/2304.05884.pdf

About kmeans clustering #20

Open zheng-xing opened 5 months ago

zheng-xing commented 5 months ago

Hi,

First, thanks for such great work and for making it open.

I notice in your paper you mentioned,

Could you add more details about which particular tools you used for this clustering step? I am very curious, as kmeans can usually only handle small cluster sizes.

Thanks very much.

anxiangsir commented 5 months ago

We utilized a cluster of 20 machines, each equipped with 8 V100 GPUs, for parallel hierarchical clustering. Each V100 was responsible for clustering 20 million images into 1 million cluster centroids. Subsequently, we aggregated the centroids from all 20 machines, each contributing 1 million centroids, into a final set of 1 million centroids.

The library employed for this operation was faiss-gpu.
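The authors' clustering code is not shown in this thread, so the following is only a minimal sketch of the two-stage idea described above: run k-means independently on each shard of the features, pool the per-shard centroids, then run k-means again on the pooled centroids. The function names (`kmeans_centroids`, `two_stage_kmeans`), the toy sizes, and the NumPy fallback are assumptions made for illustration. Only `faiss.Kmeans` comes from the library mentioned in the reply.

```python
import numpy as np

try:
    import faiss  # faiss-cpu or faiss-gpu
except ImportError:
    faiss = None  # the sketch then falls back to a tiny NumPy k-means


def kmeans_centroids(x, k, niter=20, seed=0, use_gpu=False):
    """Return k centroids for vectors x of shape (n, d). Hypothetical helper."""
    x = np.ascontiguousarray(x, dtype=np.float32)
    if faiss is not None:
        kwargs = {"niter": niter, "seed": seed}
        if use_gpu:
            kwargs["gpu"] = True  # needs a faiss-gpu build
        km = faiss.Kmeans(x.shape[1], k, **kwargs)
        km.train(x)
        return km.centroids.copy()
    # Fallback: plain Lloyd iterations, suitable for toy inputs only.
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(niter):
        # Squared Euclidean distances, shape (n, k).
        d2 = (
            (x ** 2).sum(axis=1, keepdims=True)
            - 2.0 * x @ centroids.T
            + (centroids ** 2).sum(axis=1)
        )
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids


def two_stage_kmeans(x, n_shards, k_per_shard, k_final):
    """Cluster each shard separately, then cluster the pooled centroids."""
    shards = np.array_split(x, n_shards)
    # Stage 1: in a real setup each shard would live on its own GPU or machine.
    local = [kmeans_centroids(s, k_per_shard, seed=i) for i, s in enumerate(shards)]
    pooled = np.concatenate(local, axis=0)
    # Stage 2: reduce the pooled centroids to the final set.
    return kmeans_centroids(pooled, k_final)


# Toy stand-in for image features; the real setup uses millions of vectors.
rng = np.random.default_rng(0)
features = rng.standard_normal((2000, 16)).astype(np.float32)
centroids = two_stage_kmeans(features, n_shards=4, k_per_shard=10, k_final=8)
print(centroids.shape)
```

At the scale quoted above, stage 1 would correspond to the independent per-shard runs and stage 2 to the aggregation into the final 1 million centroids. How the features were sharded and how many iterations were used is not stated in the thread.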

zhangluustb commented 4 months ago

Thank you for sharing. May I ask if this portion of the code can be made open source?