XiaohangZhan / cdp

Code for our ECCV 2018 work.
MIT License

Comparison with hierarchical clustering #4

Closed: Zhongdao closed this issue 5 years ago

Zhongdao commented 6 years ago

Hi Xiaohang, I am curious how you apply hierarchical clustering to such a large dataset, since its complexity is at best O(n^2). Do you use any technique to accelerate hierarchical clustering?

By the way, I'm also interested in how CDP performs against hierarchical clustering based on a single face embedding. I'd appreciate it if you could offer some comparison.

Thanks a lot!

XiaohangZhan commented 6 years ago
  1. We use faiss (https://github.com/facebookresearch/faiss) for k-NN search, which reduces the complexity to N×K. Hierarchical merging is performed only twice, using the distance between the centers of two clusters as the metric. In our observation, further merging brings only minor improvements.

  2. With a single model, the mediator is not applicable, so CDP devolves into simply thresholding the k-NN graph and dynamically searching for connected components. This is much weaker than the full CDP setting. The comparison can be found in Table 2 of our paper: clustering achieves recall/precision/F-score of 0.558/0.95/0.703, while CDP with committee number = 0 yields 0.68/0.829/0.747, so CDP is slightly better than clustering.
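For readers following the thread, the single-model fallback described in point 2 (threshold the k-NN graph, then take connected components) might look roughly like the sketch below. This is not the CDP repository's code: the brute-force cosine search is a pure-Python stand-in for faiss, and the function names, the toy features, `k = 1` and the `0.9` threshold are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def knn_edges(feats, k):
    """Brute-force k-NN graph (stand-in for faiss): (i, j, sim) per neighbour."""
    edges = []
    for i, fi in enumerate(feats):
        sims = [(cosine(fi, fj), j) for j, fj in enumerate(feats) if j != i]
        sims.sort(reverse=True)  # most similar first
        edges.extend((i, j, s) for s, j in sims[:k])
    return edges


def connected_components(n, edges, threshold):
    """Keep edges with sim >= threshold and label components via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, s in edges:
        if s >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy data (hypothetical): two tight pairs of 2-D "embeddings".
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = connected_components(len(feats), knn_edges(feats, k=1), threshold=0.9)
print(labels)  # [0, 0, 1, 1]
```

Only the search step is quadratic here; with an approximate k-NN index the graph has about N×K edges, and the union-find pass over those edges is close to linear in that count.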

Zhongdao commented 6 years ago

Thank you for your time! So it is not standard agglomerative clustering. Putting the complexity aside, I believe standard agglomerative clustering would achieve satisfactory performance.

XiaohangZhan commented 6 years ago

You are right.