A Pre-trained model with Accuracy: 0.9957+-0.0028

scotthong commented 7 years ago

Hi,

First of all, thanks to @davidsandberg for the great facenet project.

It would be more appropriate if there is a mailing list or forum for this kind of post. My apologies for posting this as an issue!

I was able to reproduce the result using the train_softmax.py script with slightly better performance (LFW Accuracy: 0.9957+-0.0028).

Training dataset: MS-Celeb-1M
Embedding size: 1792d
Network model: models.inception_resnet_v1
Download link of the pre-trained model: facenet_213250_20170620.pb. This pre-trained model was exported using freeze_graph.py.
You can follow the following steps to validate on the LFW dataset:
- Clone facenet git repository
  
  git clone https://github.com/davidsandberg/facenet.git
- Change to facenet/src
  
  cd facenet/src
- Download facenet_213250_20170620.pb and save it to /tmp
- Set the PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION environment variable to increase the size of protobuf that can be loaded by python.
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
- Run validate_on_lfw.py python script
python validate_on_lfw.py \
/datasets/lfw.mtcnn.s182-m44 \
/tmp/facenet_213250_20170620.pb \
--lfw_pairs=../data/pairs.txt \
--lfw_batch_size=100 \
--lfw_file_ext=jpg \
--image_size=160
LFW Performance:

Accuracy: 0.9957+-0.0028 Validation rate: 0.98467+-0.01137 @ FAR=0.00133 Area Under Curve (AUC): 0.9996 Equal Error Rate (EER): 0.0043

Please let me know if you can validate on LFW with the same performance.

Thanks,

--Scott Hong

qixianbiao commented 7 years ago

Great Job,

I wonder have you trained your model only using CASIA dataset??

I saw that David had provided a model around 99.4% before, and late he decreased it to 99.2%. As pointed out by a previous issue: https://github.com/davidsandberg/facenet/issues/190 “However, have you noticed that there are more than 1,000 identities' overlap between MS-Celeb-1M (100K of people) and LFW dataset（5K）?”

scotthong commented 7 years ago

I don't know exactly how many identities are overlap between MS_Celeb-1M and LFW and how many are used for training. I've also validated model using private datasets where none of the identities are in the MS_Celeb-1M dataset and the accuracy is also very good. I think the feature/representation trained using (train_softmax) is generic enough to be used for face recognition.

These overlapped identities between MS_Celeb-1M and LFW do favor the accuracy on LFW to achieve better results faster. Based on my experiment, the non-overlapped validation dataset can also reaches the accuracy level/ceiling when the model is trained longer.

ugtony commented 7 years ago

Hi, @scotthong I was wondering why you can achieve better results than David's? Is it because you set the embedding size to 1792 instead of 128?

scotthong commented 7 years ago

The MS_Celeb-1M dataset is very dirty. I think the key difference might be on how the dataset is been filtered. You can try to use the model I've provided to generate the filtering metrics and try to train the model yourself, you might be able to get better result then the model provided by David.

As for the embedding size:

I've tried the Bottleneck layer to limit the embedding size to 128d, 256d and 512d with similar accuracy level. However, these models trained with reduced embedding size, while have similar accuracy on LFW, have poor performance in practice. The observation is based on my own experiments on my personal albums and private datasets where there are too many false positive cases when compared with the embedding size trained with 1792d.

scotthong commented 7 years ago

I also like to share some test results comparing the impact on accuracy when the LFW datasets (deep funneled and original) are pre-processed using different margins. These pre-processed datasets are then feed through validate_on_lfw.py using these two pre-trained models below. The best accuracy (0.9968) can be obtained when the LFW dataset is pre-processed using the deep funneled LFW dataset with size=160 and margin=30. If the wrongly labeled pairs are also accounted for, the accuracy would be even higher.

103250: pre-trained model 20170512-110547 published by @davidsandberg 213250: pre-trained model facenet_213250_20170620.pb as provided here.

#	dataset	103250	213250
0	lfw.mtcnn.aligned.s182-m44	0.9927	0.9952
1	lfw.mtcnn.cropped.s182-m44	0.9928	0.9963
2	lfw.mtcnn.aligned.s160-m32	0.9917	0.9962
3	lfw.mtcnn.cropped.s160-m32	0.9912	0.9965
4	lfw.mtcnn.aligned.s160-m30	0.9930	0.9968
5	lfw.mtcnn.cropped.s160-m30	0.9930	0.9963
6	lfw.mtcnn.aligned.s160-m28	0.9918	0.9962
7	lfw.mtcnn.cropped.s160-m28	0.9922	0.9962
8	lfw.mtcnn.aligned.s160-m26	0.9928	0.9952
9	lfw.mtcnn.cropped.s160-m26	0.9932	0.9960
a	lfw.mtcnn.aligned.s160-m24	0.9930	0.9963
b	lfw.mtcnn.cropped.s160-m24	0.9935	0.9962
c	lfw.mtcnn.aligned.s160-m22	0.9942	0.9950
d	lfw.mtcnn.cropped.s160-m22	0.9928	0.9963

lrsperanza commented 7 years ago

Has anyone else tested @scotthong model?

scotthong commented 7 years ago

The LFW validation report with embedded images can be downloaded using the link below:

lfw.mtcnn.aligned.s160-m30.html

Comments: The link to the validation report is no longer available for download. Instead, I've added the validation report here.

Validation Report Number of pairs: 6000 Accuracy: 0.9968+-0.0027 Validation rate: 0.99300+-0.00586 @ FAR=0.00067 Area Under Curve (AUC): 0.9996 Equal Error Rate (EER): 0.0039 False Positive Pairs: 3 False Negative Pairs: 16

lrsperanza commented 7 years ago

I just tested @davidsandberg model against faced rotated so that the pupils have the same Y position. Got a great improvement in the accuracy just by doing this. I didn't test it against LFW because I'm using a private dataset, but the error rate lowered by 34.7% on my dataset (50.000 faces) just by doing this. I'm going to test the same technique with @scotthong model soon

scotthong commented 7 years ago

I am in the process of creating a git repository to put related files in one place so that it would be easier for reference and discussion. I will update this issue and add a link to the repository when the project is ready. The LFW validation report can be downloaded using the same link as posted earlier.

ForestWang commented 7 years ago

Hi scotthong: very good result and work. is it the embedded size is 1792d with Bottleneck layer when you training？ or extract the feature from other layers? thanks.

scotthong commented 7 years ago

I did not use the Bottleneck layer to train the network.

ForestWang commented 7 years ago

So on which layer, you extract the embedded with 1792d? thank you very much.

scotthong commented 7 years ago

The embedding is the l2_normalized prelogits. The model is exported using freeze_graph.py. The key difference as mentioned earlier, the Bottleneck layer is not used in the network.

embeddings = tf.nn.l2_normalize(prelogits, 1, 1e-10, name='embeddings')

AleximusOrloff commented 7 years ago

Hi, it seems that MS dataset is cheating. I used my own database and achieved only 98.9% on LFW When I added MS dataset result became 99.4% on LFW, 99.4% seems too high for my CNN (btw I use VGG16)

My private dataset is bigger (6mln images) and clearier that MS's one

ForestWang commented 7 years ago

@AleximusOrloff MS dataset has over a thousand ID overlap with LFW.

hardfish82 commented 7 years ago

@scotthong I'm afraid that " facenet_213250_20170620.pb" can't be accessed now. Can you update a new one?

AleximusOrloff commented 7 years ago

So, If we cannot rely on LFW, could one advice an alternative benchmark?

ForestWang commented 7 years ago

@scotthong How about git repository for your work? thanks.

ForestWang commented 7 years ago

@ AleximusOrloff megafce benchmark.

AleximusOrloff commented 7 years ago

@ForestWang MegaFace is based on FaceScrub - which is also "Celebrities" dataset So its also unrelaible, cause all training dataset contains Celebrities

scotthong commented 7 years ago

I might have the model released too early than it should be before my report is ready. Please allow me to wrap up my main project that depends on this model. Thanks! --Scott Hong

  From: ForestWang <notifications@github.com>

To: davidsandberg/facenet facenet@noreply.github.com Cc: Scott Hong yowmhong@yahoo.com; Mention mention@noreply.github.com Sent: Thursday, July 20, 2017 10:30 PM Subject: Re: [davidsandberg/facenet] A Pre-trained model with Accuracy: 0.9957+-0.0028 (#339)

@scotthong How about git repository for your work? thanks.— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

scotthong commented 7 years ago

As promised,

I've created a github repository as a placeholder for the project I am working on. The link to the pre-trained model can be found in the page below. This application uses the TensorFlow Model Serving to serve the pre-trained model for face recognition.

https://github.com/scotthong/hmc-media-server

An updated model trained for more epoches is included and exported using a custom TensorFlow Model Serving script. Please refer to the link below to download the pre-package TensorFlow Model Server hosting the updated model.

https://github.com/scotthong/hmc-media-server#facenet

I also created a gitter chatroom. Please visit the chatroom below if you have questions:

https://gitter.im/HmcMediaServer/earlyadopters

Thanks,

--Scott

AleximusOrloff commented 7 years ago

@scotthong I cleared MS dataset from LFW items and accuracy dropped to 99.1% from 99.4% Since you have 1792D embeddings and smaller dataset than me, your overffiting could be even greater. Please check

scotthong commented 7 years ago

Here is the info for the subset of MS dataset used to train my model: In the worst case scenario, there will be 10.15 % of overlapped classes and 0.34 % of overlapped examples. It should be easy to create a script to filter out these classes used for training. But then again, using the same classes for training does not equal to using the same example for training. I am not so concerned with the duplicated classes but rather the distribution of the examples based on gender, age, ethnic and # of example in the same class.

scotthong commented 7 years ago

@AleximusOrloff

I created a script to identify the potential duplicates between the MS-Celeb-1M and LFW datasets. These duplicated classes are then excluded during training. Two training jobs are currently running on my machine. The first training job is using the same hyper parameters as the baseline with the duplicated classes (about 2600 classes) excluded. The difference (still in progress) on the LFW accuracy is about 0.0006 (out of 1.0). The second training job is running with the hyper parameter (filter_min_nrof_images_per_class) adjusted from 60 to 50 in order to compensate for these excluded classes. So far, the trend indicates that it should have slightly better accuracy than the baseline model (213250).

I can draw an early conclusion that "overfitting" should not be an issue using properly filtered MS-Celeb-1M dataset.

--Scott

ForestWang commented 7 years ago

Hi Scott, For training ms-celeb with 1792d, could you paste you training scrip parameter as a reference. Thank you very much.

scotthong commented 7 years ago

103250: 20170512-110547 published by @davidsandberg 213250: facenet_213250_20170620.pb as provided here. 216295: model with LFW classes cleared from training with lowered filter_min_nrof_images_per_class value in order to compensate for these removed classes. 217288: model with LFW classes cleared from training while keeping the same filter_min_nrof_images_per_class as 213250

#	dataset	103250	213250	216295	217288
0	lfw.mtcnn.aligned.s182-m44	0.9927	0.9952	0.9968	0.9953
1	lfw.mtcnn.cropped.s182-m44	0.9928	0.9963	0.9963	0.9960
2	lfw.mtcnn.aligned.s160-m32	0.9917	0.9962	0.9962	0.9945
3	lfw.mtcnn.cropped.s160-m32	0.9912	0.9965	0.9962	0.9940
4	lfw.mtcnn.aligned.s160-m30	0.9930	0.9968	0.9968	0.9958
5	lfw.mtcnn.cropped.s160-m30	0.9930	0.9963	0.9957	0.9950
6	lfw.mtcnn.aligned.s160-m28	0.9918	0.9962	0.9972	0.9955
7	lfw.mtcnn.cropped.s160-m28	0.9922	0.9962	0.9965	0.9953
8	lfw.mtcnn.aligned.s160-m26	0.9928	0.9952	0.9960	0.9947
9	lfw.mtcnn.cropped.s160-m26	0.9932	0.9960	0.9965	0.9948
a	lfw.mtcnn.aligned.s160-m24	0.9930	0.9963	0.9968	0.9958
b	lfw.mtcnn.cropped.s160-m24	0.9935	0.9962	0.9962	0.9948
c	lfw.mtcnn.aligned.s160-m22	0.9942	0.9950	0.9968	0.9953
d	lfw.mtcnn.cropped.s160-m22	0.9928	0.9963	0.9963	0.9947

scotthong commented 7 years ago

@ForestWang Here are some key hyper parameters you can try.

--random_crop --random_flip --filter_percentile=75 --filter_min_nrof_images_per_class=60 --weight_decay=5e-5 --center_loss_factor=5e-5 --keep_probability=0.8 --center_loss_alfa=0.9

chenfsjz commented 7 years ago

hi,I want to train my model use Mscelb-1m, and how you deal with the Msceleb-1m,because there are lots of dirty images.Thank you!

allenxcp commented 6 years ago

@scotthong hi Scott Hong : I tested your model: facenet_213250_20170620.pb like this:

1 modify inception_resnet_v1.py line 154: bottleneck_layer_size=1792, and line 131: bottleneck_layer_size=128, weight_decay=0.0, reuse=None) to bottleneck_layer_size=1792, weight_decay=0.0, reuse=None)

2 run python align_dataset_mtcnn.py /media/ubuntu1/HD4T/Image/lfw_funneled/ /media/ubuntu1/HD4T/Image/lfw_funneled.s182.m44 --image_size 182 --margin 44

3 run python validate_on_lfw.py /media/ubuntu1/HD4T/Image/lfw_funneled.s182.m44/ /home/ubuntu1/ssd1t/Face_XCP/tensorflow/facenet1/models/facenet_213250_20170620.pb

LFW Performance: Accuracy: 0.995667+-0.004 Validation rate: 0.98867+-0.01056 @ FAR=0.00067 Area Under Curve (AUC): 1.000 Equal Error Rate (EER): 0.005

I also tested s160 m30 like this: 1 modify inception_resnet_v1.py like s182.m44

2 run python align_dataset_mtcnn.py /media/ubuntu1/HD4T/Image/lfw_funneled/ /media/ubuntu1/HD4T/Image/lfw_funneled.s160.m30 --image_size 160 --margin 30

3 run python validate_on_lfw.py /media/ubuntu1/HD4T/Image/lfw_funneled.s160.m30/ /home/ubuntu1/ssd1t/Face_XCP/tensorflow/facenet1/models/facenet_213250_20170620.pb

LFW Performance: Accuracy: 0.995500+-0.004 Validation rate: 0.98700+-0.01027 @ FAR=0.00133 Area Under Curve (AUC): 1.000 Equal Error Rate (EER): 0.005

but your list : davidsandberg/facenet#339

I have two questions:

1 The LFW performance is not same with your list, what happened ? the 1792 wrong? 2 What different with lfw.mtcnn.aligned.XXX,lfw.mtcnn.cropped.XXX? is it LFW funneled or original?

scotthong commented 6 years ago

@allenxcp

The facenet code base I am using is a little bit old when the bottleneck layer is still in the main python program. If you checkout the snapshot before the bottleneck layer is moved to "inception_resnet_v1.py" and get rid of the bottleneck layer completely in order to get the same facenet network for inference.

    # Build the inference graph
    prelogits, _ = network.inference(
        image_batch,
        args.keep_probability, 
        phase_train = phase_train_placeholder,
        weight_decay = args.weight_decay
    )

    # bottleneck = slim.fully_connected(
    #     prelogits,
    #     args.embedding_size,
    #     activation_fn=None,
    #     weights_initializer=tf.truncated_normal_initializer(stddev=0.1),
    #     weights_regularizer=slim.l2_regularizer(args.weight_decay),
    #     normalizer_fn=slim.batch_norm,
    #     normalizer_params=batch_norm_params,
    #     scope='Bottleneck',
    #     trainable=True,
    #     reuse=False
    # )
    # logits = slim.fully_connected(
    #     bottleneck, 
    #     len(train_set), 
    #     activation_fn=None, 
    #     weights_initializer=tf.truncated_normal_initializer(stddev=0.1), 
    #     weights_regularizer=slim.l2_regularizer(args.weight_decay),
    #     scope='Logits',
    #     reuse=False
    # )
    # embeddings = tf.nn.l2_normalize(bottleneck, 1, 1e-10, name='embeddings')
    #
    logits = slim.fully_connected(
        prelogits, 
        len(train_set), 
        activation_fn=None, 
        weights_initializer=tf.truncated_normal_initializer(stddev=0.1), 
        weights_regularizer=slim.l2_regularizer(args.weight_decay),
        scope='Logits',
        reuse=False
    )
    embeddings = tf.nn.l2_normalize(prelogits, 1, 1e-10, name='embeddings')

LFW contains two set of images: original and deep funnelled. The original is tagged as "cropped" and the "deep funnelled" is tagged as "aligned" when these images are post processed using mtcnn. I would say that the name "aling_dataset_mtcnn.py" is a little bit confusing as no alignment is applied to the extracted face.

I've tried 128d embedding before and the result is not as good compared to 1792d back then. It has been awhile since my initial assessment. I will try to train the network using the latest facenet codebase using bottleneck=128.

LFW is a good benchmark for face recognition and it might has reached its limitation. You should be able to get good LFW validation result training the inception_resnet_v1 network using MS-Celeb-1M when the dataset is filtered/scrubbed properly. I think the better performance metric is better described in NISTIR 8052 (https://www.nist.gov/publications/face-recognition-vendor-test-frvt-performance-automated-gender-classification) when validation dataset is broken down based on "gender", "age" and "ethnic group". It will expose the deficiencies in your training dataset.

xiaoyaozzr commented 6 years ago

Hi, As validate_on_lfw.py showed, face features can be got by "embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")", and I wonder whether I can get outputs of each layer like the end_points did, and I tried "endpoints = tf.get_default_graph().get_tensor_by_name("end_points")" failed. Thanks.

scotthong commented 6 years ago

"end_points" is a variable name in Python that stores the names and references to the network "blocks". Please refer to "inception_resnet_v1.py" to find the names in the graph that can be accessed using get_tensor_by_name(...).

xmuszq commented 6 years ago

@scotthong what's the margin size you used to created the training set? when you listed you got best result with LFW data with 30 margin, is that possible you used m30 to create the training sets?

Thanks,

scotthong commented 6 years ago

s182-m44 -> size 182 and margin 44 when the training images are pre-processed using MTCNN. The images for training are always pre-processed using s182-m44 parameters. As for the validation dataset, the final image size is always 160 while the margin varies. If the image is bigger, for example 182-m44, it will be cropped before sending to the network for training or inference.

xmuszq commented 6 years ago

@scotthong Another question, since the mtcnn_align in the project didn't do align (only detect and cropped), did you use align in your training data or testing data or both? Thanks,

zhenglaizhang commented 6 years ago

@scotthong Hi, really appreciate your sharing! But I cannot access the model file, 403 error reported, could you please help provide the model again? Thanks in advance -:)

lincolnhard commented 6 years ago

@scotthong Great work, please provide pre-trained model. Thx,

scotthong commented 6 years ago

I no longer use the model with 1792d. Please use the 128d model as provided by David as a starting point to generate a better filtered dataset for training.

zhenglaizhang commented 6 years ago

@scotthong hi, is there any specific reason why you don't use the model with 1792d anymore?

kodonnell commented 6 years ago

There are in depth discussion (the gitter channel below) on how to reproduce the LFW validation results including the fixes/improvements to the align_dataset_mtcnn.py code to prevent image distortion. This fix can potentially improve the performance of the trained network (LFW performance) if that is what you are after. https://gitter.im/HmcMediaServer/earlyadopters

FYI @scotthong has chosen to delete all of this information (in case you go looking for them and can't find it, like I did).

scotthong commented 6 years ago

@zhenglaizhang, I was able to train a 128d model with similar performance as the1792d model. The performance improvement and saving in storage and computation along can justify a minor degration in accuracy. You will have to read many papers, try their ideas and then use cleaner dataset to be able to get better accuracy. That’s my experience I can share with you.

tengerye commented 6 years ago

Did someone successfully repeat his (@scotthong) experiment? I can't find his git repository either.

scotthong commented 6 years ago

I've received lots of emails requesting the access to the validation report file. The validation report was generated using the model trained in early 2017 with 1792d embeddings. The model and the report is no longer available. The image attached is a more recent LFW validation report using the model trained in late 2017 with 128d.

lfw_validation