Validate with MegaFace Challenge dataset

se7oluti0n commented 7 years ago

Hello, I have downloaded MegaFace challenge data for evaluating face recognition, both Identification and Verification problem. I think testing result on LFW is not enough because 2 reasons:

LFW test only for verfication problem
LFW is not big dataset

MegaFace challenge provide large dataset for testing face recognition with 1M distractors, testing on FaceScrub dataset. Here is some result: http://megaface.cs.washington.edu/results/facescrubresults.html

I'm also write code for extract features and convert to MegaFace format. But I worry that the input images from Megaface is aligned in different way with Facenet code.

Could you help me check this code @davidsandberg Here is the link: https://gist.github.com/se7oluti0n/8ff161505721b6c4ab25ccfe7996fd1a

se7oluti0n commented 7 years ago

here is some sample image from MegaFace. Its included Facescrub, FGnet for known subjects, and Megaface for unknown subjects (distractor) https://drive.google.com/file/d/0B51U-GiVgFCrcWI2bllteWVjMzA/view?usp=sharing

se7oluti0n commented 7 years ago

I have tested the lasted model 20170512 with MegaFace challgene 1 (FaceScrub). This is result compared with others method (both on identification problem and verification on 1M distractor problem) . The blue line (david) show this repo's result. Pls check this @davidsandberg

Verification on 1M distractors

Identification on 1M distractors

davidsandberg commented 7 years ago

Hi @se7oluti0n, Very nice plots!! I'm a bit surprised by the low identification rate at a small number of distractors (e.g. 10). I haven't looked at the details of how the test is set up, but I don't have any good explanation for this. Can you make a PR to add the test script to the repo so I can have a look?

ugtony commented 7 years ago

Hi, @se7oluti0n, I also ran the megaface challenge with the official script (run_experiment.py), but I don't know how to plot the results as you did. Could you tell me how you do it?

I used a model trained with casia-webface and get the rank-1 identification rate: Distractor = 10, Rank1 = 0.9849 Distractor = 100, Rank1 = 0.9597 Distractor = 1,000, Rank1 = 0.9132 Distractor = 10,000, Rank1 = 0.8396 Distractor = 100,000, Rank1 = 0.7192 Distractor = 1,000,000, Rank1 = 0.5612

In your second graph, the blue line starts at a low identification rate(~0.92). It is probably because you didn't check the mtcnn-detected regions with the groundtruth for the facescrub dataset.

se7oluti0n commented 7 years ago

@davidsandberg @ugtony I'm a bit confused because my line is not the same shape with others. May be there are some mistakes in my steps. With distractor = 10, Rank 1 accuracy should be high as 0.98..

The detail steps:

Download megaface distractor set (1M images), keep it as it is ( no mtcnn alignment)
Download facescrub set, aligned with mtcnn
Run scripts to extract features ( including preprocessing like resize, pre whittening, just like validate on lfw) code in https://gist.github.com/se7oluti0n/8ff161505721b6c4ab25ccfe7996fd1a , evaluation kit is quite big so I upload to drive: https://drive.google.com/file/d/0B51U-GiVgFCrZ1dYYjdJYnVRZHc/view?usp=sharing

@ugtony When run the test with aligned facescrub images on Megaface site, the result is not good, may be because they aligned different way compare to mtcnn. So I do the alignment with mtcnn, but I did not check with the groundtruth.

How did you check the mtcnn-detected regions with the groundtruth for the facescrub dataset?
To plot the results, I downloaded current result (http://megaface.cs.washington.edu/Challenge1JSON.zip) and plot like this https://gist.github.com/se7oluti0n/8b83bf1610972de6127835b4fd59eea7

davidsandberg commented 7 years ago

Thanks! My guess is that the distractor set should be aligned using MTCNN as well. For the model to be able to discriminate well between faces the embedding needs to be precise also for the distractors. Without alignment the embeddings will be "noisy" which I guess could impact performance.

siebertlooije commented 7 years ago

@se7oluti0n Thanks for this, very clear explanation !

ugtony commented 7 years ago

@se7oluti0n, Since there may be more than one faces in facescrub images, when align_dataset_mtcnn.py is used without modification, the probe set would contain a few wrong images. That is the reason why the identification rate got only ~0.92 when distractor=10. After all, it can't be right when a probe image is wrong.

Therefore, if multiple faces are detected in align_dataset_mtcnn.py, you should choose the one overlapping most with the groundtruth bbox. The bbox info is listed in the .json files, the same file you should use when face detection fails.

Thanks for explaining how the figures are plot.

@davidsandberg I think the performance would be better than it actually is when megaface distractor set is not aligned, that should be the reason why se7oluti0n's curve doesn't drop so much when distractor = 1,000,000.

In short, in my opinion, the left portion of se7oluti0n's identification curve is lower than others' because some images in probe are wrong. The right portion of the curve doesn't decline so much because the probe/gallery sets are well aligned but the distractor set isn't.

se7oluti0n commented 7 years ago

@ugtony Thanks for the clear explanation. I wonder which classifier (center loss or triplet loss) you used when retrained with casia-webface? Is that the cleaned version casia-maxpy-clean?

@davidsandberg The results looked very competitive with others state of the art method e.g ntech lab Findface (http://fusion.kinja.com/this-face-recognition-company-is-causing-havoc-in-russi-1793856482)

For the center loss verion, if we use large dataset, will the performance be improved? MegaFace challenge also provide quite large datasets (4M images of 650k identities)
How about triplet loss version? I have not trained this yet but it seemed having problem of converging. Recently I found a paper that improve triplet loss, could you have a look at https://arxiv.org/pdf/1703.07737.pdf

ugtony commented 7 years ago

I used center loss to train my classifier.
I used the original casia-webface. I checked a few removed images in casia-maxpy-clean before, but I found some of them seem to be right. I guess it is cleaned by some automatic algorithm instead of manually. Beside, I didn't get better performance with it. The performance of my model on LFW is 0.989 by the way.

ugtony commented 7 years ago

Hi @davidsandberg,

I used the code shared by @se7oluti0n to plot my result (a model trained with facenet_train_classifier.py on casia-webface) on megaface challenge 1. My result is plotted with red color, please take a look.

The performance is competitive when the false positivie rate > 0.001 and the distractors number < 10,000, while become worse than nearby curves when false positive rate is low and number of distractors is high.

default

Any idea why this happens? Maybe it's just because my training dataset size is smaller than others.

se7oluti0n commented 7 years ago

@ugtony @davidsandberg I found I have mistake when extracting features for distractor's images:

In preprocessing, I use tf.image.resize_image_with_crop_or_pad(image, image_size, image_size) without noice that distractor's images is not aligned, and the challenge do provide aligned result in json files.
I have same mistake on aligned images of Facescrub downloaded from Megaface which has size of 300x300,

This is the result using aligned facescrub downloaded from MegaFace challenge, but the distractors are not aligned, use raw image. I will update the result after align distrator later

ugtony commented 7 years ago

It's good to know that TP increases to ~0.75 when FP=10^-6. It is pretty close to the "CenterLoss" performance (76.51%) reported in the paper "A Light CNN for Deep Face Representation with Noisy Labels".

I guess the identification rate would drop to somewhere around 0.65 after the distractor images are aligned.

se7oluti0n commented 7 years ago

@ugtony you are right. I guess without the distractor's alignment, the distractor images is not really a facical image, so it is easier to recognize probes from distractors.

latest results here. This results is very similar to the results in original center loss paper A Discriminative Feature Learning Approach for Deep Face Recognition (~65% for identification and 76% for verification) Imgur Imgur

yao5461 commented 6 years ago

@ugtony @se7oluti0n Hi, do you know how to define a new score model when I evaluate my model on Megaface? I want to use cosine similarity to measure the distance instead of euclidean distance. Thanks! :)

Not-IITian commented 5 years ago

Hi,

Is it possible to evaluate your model on top-k accuracy on Megaface challenge other than top-1? e.g. k =3,5, 10.

Thanks

yuchen-xue commented 5 years ago

@se7oluti0n Could you please renew the link to your evaluation kit. The original link seems broken.

ghost commented 5 years ago

Hi, Do we need to normalize the images of Facescrub and Megaface before feeding them to the model to get features? Thank you for reading.

FlyingAnt2018 commented 4 years ago

hello! may i have your megaface dataset? i have applied for it at http://megaface.cs.washington.edu/dataset/download.html,but there is no reply! Thanks!

davidsandberg / facenet

Validate with MegaFace Challenge dataset #275