davidsandberg / facenet

Face recognition using Tensorflow

Incredible new results! #152

Closed: melgor closed this 7 years ago

melgor commented 7 years ago

Congratulations on the new results! I have a couple of questions about them:

  1. About the results on CASIA-WebFace: did you use the clean_list or all images from CASIA?
  2. About the results on MS-Celeb:
    • As I understand it, you filtered the data using your own method. Could you release the clean list, or do you do this online so that no such list exists?
    • Are the learning parameters the same as in the wiki?
    • How many images are in the dataset, and how long does it take to train the network?
    • What was your final accuracy on the MS-Celeb data?

Again, congratulations on the really good results!

davidsandberg commented 7 years ago

Thanks @melgor!

  1. For the CASIA results no additional cleaning is used, but I use the casia-maxpy-clean version.
  2. For the MS-Celeb-1M results I cleaned the dataset by selecting a subset of the training images based on the distance from each image to its class center (see the sketch below). I plan to write a wiki page on how I did it when I get some time. I didn't use a clean list as such, but instead had a file containing the distance between each image's embedding and its class center. When running training I can then decide to use only, for example, the 75% of the images that are closest to their class center. The learning parameters are different as well. In #48 you can see the learning curves with some hyperparameters in the legend (wd = weight decay, cl = center loss, Kp = keep percentile, so Kp=75 means keeping only the 75% of the images closest to their class centers). For all runs I removed classes with fewer than 60 images. The number of images and classes depends on the settings, but for example Kp=75 gave me 4,213,410 images over 51,261 classes. Training took ~43 hours (250,000 steps with batch size 90), but the learning rate schedule can probably be optimized quite a bit. Final accuracy has varied between 0.993 and 0.995.
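For readers who want the mechanics, here is a minimal NumPy sketch of the filtering idea described above. The function name is hypothetical, and applying the percentile per class is an assumption; the actual logic lives in calculating_filtering_metrics.py and the training code and may differ in detail:

```python
import numpy as np

def filter_by_distance_to_center(embeddings, labels, keep_percentile=75.0,
                                 min_images_per_class=60):
    """Boolean mask keeping, per class, the keep_percentile% of images whose
    embedding is closest to the class mean. Classes with fewer than
    min_images_per_class images are dropped entirely."""
    embeddings = np.asarray(embeddings)
    labels = np.asarray(labels)
    mask = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        if len(idx) < min_images_per_class:
            continue  # drop small classes, as in the runs described above
        center = embeddings[idx].mean(axis=0)
        dists = np.linalg.norm(embeddings[idx] - center, axis=1)
        mask[idx] = dists <= np.percentile(dists, keep_percentile)
    return mask
```
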
scotthong commented 7 years ago

Hi @davidsandberg: Thanks for sharing this project, and congratulations on the new results! I am trying to reproduce the results using the MS-Celeb-1M dataset. I couldn't find the following two hyperparameters documented in your post; could you share the values you used to achieve the results?

--keep_probability --center_loss_alfa

Thanks!

ugtony commented 7 years ago

Hi @davidsandberg, I used your msceleb1m model and calculating_filtering_metrics.py to get the distance_to_center values on the msceleb1m dataset.

I plotted the histogram of distance_to_center to get a better idea of how keep_percentile should be set.

I found that the histogram seems to be formed by two Gaussians: the left one is centered around 0.6 and the right one around 1.0. Do you have any idea how to explain that?

Are the instances belonging to the right Gaussian the noisy data, or are they profile faces? If it is the former, then setting the percentile to 75 might be risky (the threshold there is 0.95, and the probability that a sample belongs to the right Gaussian is 98% at that position). If it is the latter, then the center loss didn't make the model do well on profile faces.
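
For anyone who wants to reproduce this check, here is a small sketch that fits a two-component Gaussian mixture to the distances and reports the component means and the posterior at a candidate cutoff. The scikit-learn usage and the .npy file name are assumptions, not part of the repo:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# distance_to_center values, assumed precomputed (the file name is
# hypothetical; in practice they come from calculating_filtering_metrics.py)
distances = np.load('distance_to_center.npy')

gmm = GaussianMixture(n_components=2, random_state=0).fit(distances.reshape(-1, 1))
order = np.argsort(gmm.means_.ravel())
print('component means:  ', gmm.means_.ravel()[order])   # e.g. ~0.6 and ~1.0
print('component weights:', gmm.weights_[order])

# Posterior probability that a sample at a candidate cutoff belongs to the
# right-hand (higher-mean) component, to sanity-check a keep_percentile choice.
cutoff = np.array([[0.95]])
print('P(right component | d=0.95) =', gmm.predict_proba(cutoff)[0, order[1]])
```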

se7oluti0n commented 7 years ago

@ugtony @davidsandberg In calculating_filtering_metrics.py you calculate the embedding features for each image. As I understand it, you filter the images for the MS-Celeb-1M training phase. So which model do you use in calculating_filtering_metrics.py to calculate the embedding features? Maybe a model trained on the CASIA dataset?

ljstrnadiii commented 7 years ago

Hi @davidsandberg ,

I have an idea to use the embeddings in another model. I am building a VAE to generate faces, and I would like to use the FaceNet embeddings to add a loss to the decoder's reconstruction. I am thinking of something like:

total_loss = tf.reduce_mean((1 - alpha) * reconstructionL2 + alpha * embeddingsL2)

Do you have any thoughts or concerns? I am trying to dig into your code to understand how I could generate the embeddings of the real and generated images for the loss, but it is not that intuitive at the moment.

Any suggestions on how to incorporate the embedding into a loss function?

cheers

davidsandberg commented 7 years ago

Hi @ljstrnadiii, that sounds like a very cool project. I'm looking forward to hearing more about it. You could, for example, start with train_softmax.py and use your total_loss above instead of the softmax cross-entropy. But then you of course need to add the VAE parts as well.
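
As a rough illustration of that combined loss, here is a minimal TF1-style sketch; the tensor names, image size, and alpha value are placeholders, not code from the repo:

```python
import tensorflow as tf

# Stand-in tensors: in practice emb_real/emb_recon would come from two
# passes of the images and their reconstructions through the same
# inception_resnet_v1 network with shared weights.
images = tf.placeholder(tf.float32, [None, 160, 160, 3])
reconstructions = tf.placeholder(tf.float32, [None, 160, 160, 3])
emb_real = tf.placeholder(tf.float32, [None, 128])
emb_recon = tf.placeholder(tf.float32, [None, 128])

alpha = 0.5  # placeholder weighting, not a tuned value

reconstruction_l2 = tf.reduce_sum(tf.square(images - reconstructions), axis=[1, 2, 3])
embedding_l2 = tf.reduce_sum(tf.square(emb_real - emb_recon), axis=1)

# Drop-in replacement for the softmax cross-entropy in train_softmax.py;
# the VAE's KL term would still need to be added on top.
total_loss = tf.reduce_mean((1.0 - alpha) * reconstruction_l2 + alpha * embedding_l2)
```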

ljstrnadiii commented 7 years ago

@davidsandberg ,

Thanks for the suggestion. I am pretty close to an implementation. What is the file format of the pretrained model you restore in train_softmax.py? I am pointing the restore function to the ckpt file in the pretrained model directory, but no luck; it says the file may be in a different format.

ljstrnadiii commented 7 years ago

I believe it has to do with restoring just the variables of the inception_resnet_v1 model. Any ideas?
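
For reference, the usual TF1 pattern for restoring just those variables is shown below. The paths are placeholders, and this assumes the network was built under its default 'InceptionResnetV1' variable scope:

```python
import tensorflow as tf

# Assumes the inception_resnet_v1 graph has already been built, so its
# variables live under the 'InceptionResnetV1' scope.
restore_vars = [v for v in tf.global_variables()
                if v.name.startswith('InceptionResnetV1')]
saver = tf.train.Saver(var_list=restore_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Restore from the checkpoint *prefix* (e.g. 'model.ckpt-250000'), not
    # from the .meta/.data/.index files themselves; pointing at those files
    # directly is a common cause of the "different file format" error.
    ckpt = tf.train.latest_checkpoint('/path/to/pretrained/model/dir')  # placeholder
    saver.restore(sess, ckpt)
```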