davidsandberg / facenet

Face recognition using Tensorflow
MIT License

MS-Celeb vs CASIA model #401

Open · aleksandar opened this issue 7 years ago

aleksandar commented 7 years ago

Hi @davidsandberg, have you measured results at FAR 10e-4? We found some interesting results: the model trained on the CASIA dataset performs better at this FAR (10e-4) than the model trained on MS-Celeb. Our test protocol is a mix of the LFW and BLUFR (http://www.cbsr.ia.ac.cn/users/scliao/projects/blufr/) protocols. We tested your models as well as models we trained on these datasets (following your best practices).

You can "see" this difference between the models just by comparing histograms of pair distances on the test set.

Histogram for the CASIA model: [image: casia hist]

Histogram for the MS-Celeb model: [image: ms-celeb hist]
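For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of binning pair distances into a histogram. It assumes squared Euclidean distance between L2-normalized embeddings, which lies in [0, 4]; the function name, bin count and range are illustrative choices, not part of facenet.

```python
def distance_histogram(distances, num_bins=8, max_distance=4.0):
    """Count pair distances into equal-width bins over [0, max_distance].

    The default range assumes squared Euclidean distance between
    L2-normalized embeddings (always between 0 and 4).
    """
    counts = [0] * num_bins
    width = max_distance / num_bins
    for d in distances:
        # Clamp so that d == max_distance lands in the last bin.
        idx = min(int(d / width), num_bins - 1)
        counts[idx] += 1
    return counts

# Toy distances: positive (same person) and negative (different person) pairs.
pos_dists = [0.3, 0.5, 0.6, 0.9]
neg_dists = [1.4, 1.8, 2.1, 0.8]
print(distance_histogram(pos_dists, num_bins=4))  # [4, 0, 0, 0]
print(distance_histogram(neg_dists, num_bins=4))  # [1, 2, 1, 0]
```

Plotting the two count lists for each model side by side gives histograms like the ones above; a negative pair in the lowest bins is exactly the overlap that matters at low FAR.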

The MS-Celeb model places some negative pairs at lower distances than the CASIA model does, so at low FAR the MS-Celeb model has a lower TPR.
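A minimal sketch of why a few low-distance negative pairs hurt TPR at low FAR: the threshold for a target FAR is fixed by the negative pairs alone, so a single negative at a small distance drags the threshold down and rejects positives along with it. The function `tpr_at_far` and the toy distances are hypothetical, not taken from facenet or the BLUFR code.

```python
def tpr_at_far(pos_dists, neg_dists, target_far):
    """Return (threshold, tpr) for the largest threshold with FAR <= target_far.

    A pair is accepted as "same person" when its distance is < threshold.
    FAR = fraction of negative pairs accepted; TPR = fraction of positive
    pairs accepted.
    """
    neg_sorted = sorted(neg_dists)
    n_neg = len(neg_sorted)
    # How many negatives may be accepted without exceeding the target FAR.
    allowed = int(target_far * n_neg)
    # The threshold sits at the first negative that must be rejected.
    threshold = neg_sorted[allowed] if allowed < n_neg else float("inf")
    tpr = sum(1 for d in pos_dists if d < threshold) / len(pos_dists)
    return threshold, tpr

pos = [0.4, 0.6, 0.8, 1.0, 1.2]
# Model A: all negatives are far away.
neg_a = [1.3, 1.5, 1.6, 1.8, 2.0, 2.1, 2.2, 2.4, 2.5, 2.6]
# Model B: same positives, but one negative pair at distance 0.7.
neg_b = [0.7, 1.5, 1.6, 1.8, 2.0, 2.1, 2.2, 2.4, 2.5, 3.0]

print(tpr_at_far(pos, neg_a, 0.1), tpr_at_far(pos, neg_b, 0.1))    # (1.5, 1.0) (1.5, 1.0)
print(tpr_at_far(pos, neg_a, 0.05), tpr_at_far(pos, neg_b, 0.05))  # (1.3, 1.0) (0.7, 0.4)
```

Both toy models are equally good at FAR 0.1, but at FAR 0.05 the one outlying negative pulls model B's threshold down to 0.7 and its TPR drops to 0.4, mirroring the behavior described in the thread.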

Do you have any idea why this is happening? It may be a consequence of training with a larger number of classes and of the characteristics of softmax.

qixianbiao commented 7 years ago

I have not done anything like this. Just a reminder that the features should be L2-normalized before computing distances or similarities.
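A minimal sketch of the normalization this comment refers to, in plain Python (the helper names are hypothetical). It also shows why the choice between distance and similarity matters little once embeddings have unit length: for unit vectors a and b, ||a - b||² = 2 - 2·cos(a, b), so squared Euclidean distance and cosine similarity rank pairs identically.

```python
import math

def l2_normalize(v, eps=1e-10):
    """Scale v to unit Euclidean length; eps avoids division by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

a = l2_normalize([3.0, 4.0])  # approximately [0.6, 0.8]
b = l2_normalize([4.0, 3.0])  # approximately [0.8, 0.6]
# For unit vectors, squared distance equals 2 - 2 * cosine similarity.
print(squared_distance(a, b), 2 - 2 * cosine_similarity(a, b))  # both approximately 0.08
```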

aleksandar commented 7 years ago

The embedding layer is L2-normalized by default in David's implementation. The question is why these two models have slightly different distance distributions, specifically different variances, on the same test pairs. The model trained on the CASIA set has lower variance, and this is better at low FAR.
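One way to put a number on "lower variance is better at low FAR" is to summarize each model's positive and negative pair distances by mean and variance and combine them into a separability score such as Daugman's decidability index, d′ = |μ_neg − μ_pos| / sqrt((σ²_pos + σ²_neg) / 2). The sketch below is hypothetical and uses toy distances; it only shows that, with the same means, a narrower negative distribution yields a larger d′.

```python
import math
import statistics

def distance_summary(pos_dists, neg_dists):
    """Mean and variance of positive/negative pair distances, plus d'."""
    mu_pos, mu_neg = statistics.mean(pos_dists), statistics.mean(neg_dists)
    var_pos, var_neg = statistics.pvariance(pos_dists), statistics.pvariance(neg_dists)
    # Daugman's decidability index: separation of the means over the pooled spread.
    d_prime = abs(mu_neg - mu_pos) / math.sqrt((var_pos + var_neg) / 2)
    return {"pos_mean": mu_pos, "neg_mean": mu_neg,
            "pos_var": var_pos, "neg_var": var_neg, "d_prime": d_prime}

pos = [0.5, 0.7, 0.9]
neg_narrow = [1.6, 1.8, 2.0]  # low-variance negatives
neg_wide = [1.0, 1.8, 2.6]    # same mean, higher variance

narrow = distance_summary(pos, neg_narrow)
wide = distance_summary(pos, neg_wide)
print(round(narrow["d_prime"], 2), round(wide["d_prime"], 2))  # 6.74 2.31
```

Because d′ depends on the spread as well as on the gap between the means, it captures the difference the thread describes even when the average distances of the two models are similar.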

Shahnawazgrewal commented 6 years ago

@aleksandar did you clean the MS-Celeb dataset before training?

aleksandar commented 6 years ago

Yes, we did. We used the Chinese whispers algorithm together with the model trained on the CASIA set.
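The thread does not describe the cleaning setup beyond the algorithm and the embedding model, so the following is only a sketch under assumptions: embed all images of one identity (for instance with a CASIA-trained model), link faces whose embedding distance is below a hypothetical threshold, run Chinese whispers, and keep the largest cluster as that identity's clean images. The helper names, the threshold and the keep-largest rule are illustrative.

```python
import random
from collections import Counter

def chinese_whispers(embeddings, threshold, iterations=20, seed=0):
    """Cluster embeddings with (unweighted) Chinese whispers.

    An edge joins two faces whose squared Euclidean distance is below
    `threshold`, a hypothetical value that would need tuning per model.
    Returns one integer label per embedding.
    """
    n = len(embeddings)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
            if dist < threshold:
                neighbors[i].append(j)
                neighbors[j].append(i)

    labels = list(range(n))  # every face starts in its own cluster
    rng = random.Random(seed)
    order = list(range(n))
    for _ in range(iterations):
        rng.shuffle(order)  # visit nodes in a random order on each pass
        changed = False
        for node in order:
            if not neighbors[node]:
                continue  # isolated faces keep their own label
            counts = Counter(labels[nb] for nb in neighbors[node])
            best = max(counts.values())
            # Adopt the most common neighbor label; ties go to the smallest
            # label here for reproducibility (random tie-breaking is also common).
            new_label = min(lbl for lbl, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # converged
    return labels

def largest_cluster(labels):
    """Indices of the faces in the most populated cluster."""
    top_label, _ = Counter(labels).most_common(1)[0]
    return [i for i, lbl in enumerate(labels) if lbl == top_label]

# Toy 2-D "embeddings" for one identity: three consistent faces,
# a tight pair (e.g. a different person), and one isolated outlier.
faces = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
labels = chinese_whispers(faces, threshold=0.5)
print(largest_cluster(labels))  # [0, 1, 2]
```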

Shahnawazgrewal commented 6 years ago

I developed a pipeline that uses DBSCAN to clean the MS-Celeb dataset. A similar pipeline was also used to create the MegaFace dataset. I am not sure whether I can publish this work. Any suggestions, @aleksandar?
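The pipeline mentioned here is not shown in the thread, so this is not it; it is a minimal, hypothetical sketch of DBSCAN-based cleaning for one identity: points in dense regions form clusters, sparse points are labeled noise (-1), and only the largest cluster is kept. The `eps` and `min_samples` values are placeholders that would need tuning on real embeddings. In practice, scikit-learn's `sklearn.cluster.DBSCAN` implements the same algorithm with `eps` and `min_samples` parameters.

```python
from collections import Counter

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dbscan(points, eps, min_samples):
    """Plain DBSCAN: returns a cluster id per point, or -1 for noise.

    `eps` is a Euclidean radius; `min_samples` counts the point itself.
    """
    n = len(points)

    def region(i):
        return [j for j in range(n) if squared_distance(points[i], points[j]) <= eps * eps]

    labels = [None] * n  # None = not visited yet
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_region = region(j)
            if len(j_region) >= min_samples:  # j is a core point: keep expanding
                queue.extend(j_region)
    return labels

def keep_largest_cluster(labels):
    """Indices in the biggest non-noise cluster (empty if everything is noise)."""
    counts = Counter(lbl for lbl in labels if lbl != -1)
    if not counts:
        return []
    top_label, _ = counts.most_common(1)[0]
    return [i for i, lbl in enumerate(labels) if lbl == top_label]

faces = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (9.0, 0.0)]
labels = dbscan(faces, eps=0.5, min_samples=3)
print(labels)                        # [0, 0, 0, 0, 1, 1, 1, -1]
print(keep_largest_cluster(labels))  # [0, 1, 2, 3]
```

Unlike the singleton clusters left by Chinese whispers, DBSCAN marks isolated faces explicitly as noise, which is convenient when the goal is simply to drop mislabeled images.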

aleksandar commented 6 years ago

DBSCAN is also a good algorithm for this; nice idea. Regarding permissions for this project, you should ask the owner, @davidsandberg.