Question: what algorithm to use for clustering with embedding vectors?

davidsandberg / facenet

Face recognition using Tensorflow

MIT License


Question: what algorithm to use for clustering with embedding vectors? #488

Closed: bobzsj87 closed this issue 6 years ago

bobzsj87 commented 7 years ago

Just wondering: how do I do clustering once I have a bunch of embeddings from a number of pictures?

MaartenBloemen commented 7 years ago

I have done it with DBSCAN, based on the Euclidean distance you can calculate between the embedding vectors. You can find my code in #441 if you are interested.
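The distance in question is the plain Euclidean distance between embedding vectors. A minimal NumPy sketch of computing it for every pair of images at once (the function name `euclidean_distance_matrix` is hypothetical and not taken from #441):

```python
import numpy as np

def euclidean_distance_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return the (n, n) matrix of Euclidean distances between rows of `embeddings`."""
    emb = np.asarray(embeddings, dtype=np.float64)
    # Squared norm of each embedding, shape (n,)
    sq_norms = np.sum(emb ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, evaluated for all pairs at once
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * emb @ emb.T
    # Clip tiny negative values caused by floating-point round-off before sqrt
    np.maximum(sq_dists, 0.0, out=sq_dists)
    dists = np.sqrt(sq_dists)
    # Each embedding is at distance exactly 0 from itself
    np.fill_diagonal(dists, 0.0)
    return dists

# Three toy 2-D "embeddings"
emb = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
print(euclidean_distance_matrix(emb))
```

The dot-product form avoids building an (n, n, d) array of differences, which matters when n is in the thousands.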

zhly0 commented 7 years ago

This is a different clustering algorithm (Chinese whispers): https://github.com/zhly0/facenet-face-cluster-chinese-whispers-
With it you do not need to specify the number of clusters; all you need to do is specify the threshold. See also https://github.com/davidsandberg/facenet/issues/370
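For readers unfamiliar with Chinese whispers, here is a minimal NumPy sketch of the idea; it is not the code from the linked repository. Assumptions: an edge joins two embeddings whose Euclidean distance is below `threshold`, edges are weighted by `1 - d / threshold` so closer pairs count more, and the name `chinese_whispers` and its parameters are hypothetical.

```python
import random
from collections import defaultdict

import numpy as np

def chinese_whispers(embeddings, threshold, iterations=30, seed=0):
    """Cluster embeddings with Chinese whispers; no cluster count is needed.

    Nodes are embeddings. An edge joins two nodes whose Euclidean distance is
    below `threshold` and is weighted 1 - d / threshold (an assumed weighting).
    Every node starts with its own label, then repeatedly adopts the label with
    the largest summed edge weight among its neighbours.
    """
    emb = np.asarray(embeddings, dtype=np.float64)
    n = len(emb)
    sq = np.sum(emb ** 2, axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T, 0.0))

    # Build the weighted graph from pairs under the threshold (upper triangle only)
    neighbours = defaultdict(list)
    for i, j in zip(*np.nonzero(np.triu(dist < threshold, k=1))):
        i, j = int(i), int(j)
        w = 1.0 - dist[i, j] / threshold
        neighbours[i].append((j, w))
        neighbours[j].append((i, w))

    labels = list(range(n))  # each node starts in its own cluster
    order = list(range(n))
    rng = random.Random(seed)
    for _ in range(iterations):
        rng.shuffle(order)  # visit nodes in a random order each pass
        changed = False
        for node in order:
            if not neighbours[node]:
                continue  # an isolated node keeps its own label
            scores = defaultdict(float)
            for other, w in neighbours[node]:
                scores[labels[other]] += w
            best = max(scores, key=scores.get)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break  # a full pass changed nothing: converged
    return labels
```

Nodes that end up with the same label form one cluster, so, as with DBSCAN, only the threshold has to be tuned.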

MaartenBloemen commented 7 years ago

@zhly0 DBSCAN doesn't require you to specify the number of clusters either: you just feed it the distance matrix, and it clusters based on the chosen threshold.
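A minimal scikit-learn sketch of that workflow, assuming scikit-learn is installed: the distance matrix is passed with `metric="precomputed"`, `eps` acts as the distance threshold, and the number of clusters is read off the labels afterwards. `cluster_embeddings` and the default `min_samples=2` are illustrative choices, not taken from #441.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

def cluster_embeddings(embeddings, eps=0.5, min_samples=2):
    """Cluster embeddings with DBSCAN on a precomputed Euclidean distance matrix.

    Returns (labels, n_clusters). The number of clusters is an output, not an input.
    """
    dist = pairwise_distances(np.asarray(embeddings), metric="euclidean")
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
    labels = model.fit_predict(dist)
    # DBSCAN labels points that belong to no cluster as -1 (noise)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    return labels, n_clusters
```

With `min_samples=2`, any two faces within `eps` of each other can start a cluster, while an isolated face is reported as noise rather than forced into a cluster.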

Shahnawazgrewal commented 7 years ago

I have also clustered using DBSCAN, as it doesn't require specifying the number of clusters.

RaviRaaja commented 6 years ago

@Shahnawazgrewal @MaartenBloemen I have tried clustering without using MTCNN (the align function). I precomputed and cropped all the images with dlib into a directory, giving an array of shape (4000, 160, 160, 3); the 4000 instances belong to different classes (images with various levels of brightness, saturation, etc.). In main() of cluster.py, I made the following substitution:

    # In main function
    with tf.Graph().as_default():
        with tf.Session() as sess:
            facenet.load_model(args.model)
            # MTCNN alignment skipped:
            #image_list = load_images_from_folder(args.data_dir)
            #images = align_data(image_list, args.image_size, args.margin, pnet, rnet, onet)
            # Pre-cropped images loaded directly instead
            images = load_images_from_folder(args.data_dir)

Then I performed the clustering, but only repeated images fall into the same cluster. I tested thresholds from 1.0 down to 0.5 and all gave the same result. My input directory contains images like the one below:

[image: selection_021]
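One thing that may be worth checking when every threshold gives the same result (an assumption about a possible cause, not a diagnosis of this case): if nearly all pairwise distances fall below 0.5 or above 1.0, moving the threshold within that range cannot change which pairs count as neighbours. A hypothetical NumPy helper that reports, for each threshold, the fraction of image pairs closer than it:

```python
import numpy as np

def fraction_of_pairs_below(embeddings, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Return {threshold: fraction of distinct pairs with Euclidean distance < threshold}."""
    emb = np.asarray(embeddings, dtype=np.float64)
    sq = np.sum(emb ** 2, axis=1)
    # All pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    # (an (n, n) matrix: for 4000 images about 128 MB of float64)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T, 0.0)
    # Keep each unordered pair once (upper triangle, diagonal excluded)
    iu = np.triu_indices(len(emb), k=1)
    pair_d = np.sqrt(d2[iu])
    return {t: float(np.mean(pair_d < t)) for t in thresholds}
```

If the fractions barely move between 0.5 and 1.0, a wider sweep or a look at the preprocessing is probably more informative than fine-tuning inside that range.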

Shahnawazgrewal commented 6 years ago

Which pretrained model are you using? @RaviRaaja

RaviRaaja commented 6 years ago

@Shahnawazgrewal The 20170512-110547 model, and the VGGFace2 pretrained model you uploaded.

Shahnawazgrewal commented 6 years ago

Can you upload a subset of the images (say 100) to Dropbox so I can test? @RaviRaaja

RaviRaaja commented 6 years ago

@Shahnawazgrewal https://www.dropbox.com/sh/eitw9wuh7lkjgz6/AAC863xj1LSUeCrDGob2N0x_a?dl=0

Shahnawazgrewal commented 6 years ago

Some of the clusters with eps = 0.50: [five cluster screenshots]

You can also use HDBSCAN. @RaviRaaja

RaviRaaja commented 6 years ago

@Shahnawazgrewal I will try HDBSCAN. Can you add some more notes about the DBSCAN experiment behind the results above? In the Chinese whispers clustering no normalization is done, while in the DBSCAN clustering it is. Does prewhiten (the module name) have an impact on clustering?
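Two kinds of normalization come up here, and a small NumPy sketch may help separate them (both helper names are hypothetical). `standardize_image` scales an image's pixel values to zero mean and unit variance, which is assumed here to be roughly what a prewhitening step does; check the prewhiten code in the facenet repository for its exact formula. `l2_normalize` scales each embedding to unit length. For unit-length embeddings the squared Euclidean distance equals 2 - 2 * cosine similarity, so on normalized embeddings a Euclidean threshold and a cosine threshold select the same pairs.

```python
import numpy as np

def standardize_image(img):
    """Per-image standardization to zero mean and unit variance (hypothetical helper).

    The floor on the standard deviation avoids dividing by ~0 for flat images.
    """
    x = np.asarray(img, dtype=np.float64)
    std = max(x.std(), 1.0 / np.sqrt(x.size))
    return (x - x.mean()) / std

def l2_normalize(embeddings, eps=1e-10):
    """Scale every embedding (row) to unit Euclidean length."""
    emb = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(norms, eps)

# For unit vectors u, v: ||u - v||^2 == 2 - 2 * cos(u, v)
u, v = l2_normalize(np.array([[3.0, 4.0], [1.0, 0.0]]))
print(np.sum((u - v) ** 2), 2 - 2 * np.dot(u, v))  # both are approximately 0.8
```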

Shahnawazgrewal commented 6 years ago

I have not experimented with the prewhiten module. @RaviRaaja Apparently, the results are okay.