davidsandberg / facenet

Face recognition using Tensorflow
MIT License

A new model trained on VGGFace2 #624

Open Shahnawazgrewal opened 6 years ago

Shahnawazgrewal commented 6 years ago

I trained a model on VGGFace2 using center loss. The embeddings are more powerful than those from the model trained on the subset of MS-Celeb-1M. I can make the model public alongside the two available models. @davidsandberg

Zumbalamambo commented 6 years ago

@Shahnawazgrewal can you please share how you trained the checkpoint? And can you also share the checkpoint itself?

JianbangZ commented 6 years ago

I trained with VGGFace2 as well; the model is not as good as the one trained on MS-Celeb-1M. Although the accuracy might be equal or better, the TPR at 0.001 FPR is much lower (98.x compared to 99.x).

I eventually combined the two datasets and reached 99.73% accuracy and 99.63% TPR at 0.001 FPR.

syy6 commented 6 years ago

@Shahnawazgrewal could you please share how you train the VGGFace2 model? When we try to train on the data, we always get the error below. Thanks!

```
OutOfRangeError (see above for traceback): FIFOQueue '_1_batch_join/fifo_queue' is closed and has insufficient elements (requested 9, current size 0)
[[Node: batch_join = QueueDequeueUpToV2[component_types=[DT_FLOAT, DT_INT64], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch_join/fifo_queue, _arg_batch_size_0_0)]]
```

syy6 commented 6 years ago

@Shahnawazgrewal And of course, if you can share your model via Google Drive etc., that would also be much appreciated. Thanks!

Dantju commented 6 years ago

@Shahnawazgrewal I added center loss to Caffe and hit the error `math_functions.cu:155] Check failed: error == cudaSuccess (11 vs. 0) invalid argument` during training. Have you met this problem, or do you know how to fix it? Thanks.

Shahnawazgrewal commented 6 years ago

Can you please decrease the maximum number of epochs? Please read issue #105; I had a similar error with the MS-Celeb-1M dataset. @syy6

Shahnawazgrewal commented 6 years ago

For sure, I will share the model with you guys. @syy6

Shahnawazgrewal commented 6 years ago

Did you train on a subset of MS-Celeb-1M? @JianbangZ

JianbangZ commented 6 years ago

@Shahnawazgrewal my subset of MS-Celeb-1M is 70k identities, 4.5 million images. I can achieve 99.5% accuracy and 99.3% TPR with it

syy6 commented 6 years ago

@Shahnawazgrewal, actually I even tried reducing the number of epochs, but the issue still exists.

yipsang commented 6 years ago

@Shahnawazgrewal it would be very nice if you could share the hyperparameters you used in training. I've recently tried to train on VGGFace2 with triplet loss, but with no luck: the LFW accuracy and validation rate levelled off at around 0.96 and 0.7.

yipsang commented 6 years ago

@syy6 Did you check this out before? #600

syy6 commented 6 years ago

@yipsang @Shahnawazgrewal, I just found the issue: one of the input PNGs was corrupted on my machine, which caused the error. After removing that PNG, it seems to be fine now.

syy6 commented 6 years ago

@JianbangZ, could you please share how you removed the duplicates between the MS-Celeb-1M and VGGFace2 datasets? If you look at the name lists of the two datasets, certain identities appear in both.

Zumbalamambo commented 6 years ago

@Shahnawazgrewal Dude... where have you uploaded your model?

Shahnawazgrewal commented 6 years ago

Here is the link to download a pre-trained model: Inception-ResNet-v1 trained with the center loss function on the VGGFace2 dataset. Please give your general feedback.

Yeongjae commented 6 years ago

@Shahnawazgrewal, did you pre-train on MS-Celeb-1M and then fine-tune on the VGGFace2 dataset?

Shahnawazgrewal commented 6 years ago

No, I didn't. I trained Inception-ResNet-v1 with the center loss function on VGGFace2 from scratch. More specifically, I downloaded the loosely cropped faces from VGGFace2 (http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/). The dataset was aligned to 160×160 images with a 32-pixel margin using the multi-task CNN (MTCNN). I trained the model on the aligned dataset for 100 epochs with the RMSProp optimizer. @Yeongjae
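For anyone trying to reproduce this, a sketch of the corresponding commands using the repo's own scripts; the paths are placeholders and exact flags may differ between facenet versions:

```
# Align the loosely cropped VGGFace2 faces with MTCNN (160x160, 32px margin)
python src/align/align_dataset_mtcnn.py \
    ~/datasets/vggface2/train \
    ~/datasets/vggface2_train_160 \
    --image_size 160 \
    --margin 32

# Train Inception-ResNet-v1 with softmax + center loss on the aligned data
python src/train_softmax.py \
    --data_dir ~/datasets/vggface2_train_160 \
    --model_def models.inception_resnet_v1 \
    --optimizer RMSPROP \
    --max_nrof_epochs 100 \
    --center_loss_factor 1e-2
```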

RaviRaaja commented 6 years ago

@Shahnawazgrewal could you add proper descriptions for the checkpoint files that have been uploaded to Dropbox?

tenggyut commented 6 years ago

@Shahnawazgrewal based on our evaluation, your model is truly more powerful than both of the provided pretrained models. I wonder what the reason behind this wonderful improvement is. Is the dataset used for training the root cause?

P.S. Thank you for uploading this wonderful pretrained checkpoint.

Shahnawazgrewal commented 6 years ago

@tenggyut, VGGFace2 is considered a deep dataset (a higher number of images per identity). In my opinion, this could be the reason. In addition, I observed that the model trained on VGGFace2 produces better representations of previously unseen faces.

tenggyut commented 6 years ago

@Shahnawazgrewal did you train the model as a classifier or using triplet loss?

Shahnawazgrewal commented 6 years ago

I trained the model based on center loss. @tenggyut

ymcasky commented 6 years ago

@Shahnawazgrewal
I have a few questions about your implementation details:

  1. Did you do 2D alignment, or just crop the 160x160 bounding box after MTCNN?
  2. What learning rate did you use with RMSProp? Did you decay the learning rate? Thanks for sharing!

Shahnawazgrewal commented 6 years ago

  1. Just cropped the 160x160 bounding box after MTCNN.
  2. I used the default settings available.
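For reference, a sketch of what those defaults amount to in facenet's `train()` helper when `--optimizer RMSPROP` is selected; the values are taken from the repo at the time and may differ between versions:

```python
import tensorflow as tf

# Hedged sketch of facenet.train()'s RMSProp branch; in the repo,
# learning_rate is a tensor fed from the learning-rate schedule file.
learning_rate = 0.1  # placeholder value for illustration
opt = tf.train.RMSPropOptimizer(learning_rate, decay=0.9, momentum=0.9, epsilon=1.0)
```
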
DawnHH commented 6 years ago

Did you train the model with softmax loss combined with center loss, or with center loss alone?

Shahnawazgrewal commented 6 years ago

Combined.
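For context, in facenet's `train_softmax.py` the center loss term is added to the regularization losses alongside the softmax cross-entropy. A minimal sketch, close to the repo's `facenet.center_loss`; details may differ between versions:

```python
import tensorflow as tf

def center_loss(features, label, alfa, nrof_classes):
    """Center loss (Wen et al., ECCV 2016), as in facenet/src/facenet.py."""
    nrof_features = features.get_shape()[1]
    # Non-trainable per-class centers, updated with a moving average
    centers = tf.get_variable('centers', [nrof_classes, nrof_features], dtype=tf.float32,
                              initializer=tf.constant_initializer(0), trainable=False)
    label = tf.reshape(label, [-1])
    centers_batch = tf.gather(centers, label)
    diff = (1 - alfa) * (centers_batch - features)
    centers = tf.scatter_sub(centers, label, diff)
    with tf.control_dependencies([centers]):
        loss = tf.reduce_mean(tf.square(features - centers_batch))
    return loss, centers

# In train_softmax.py the combination looks roughly like:
#   prelogits_center_loss, _ = center_loss(prelogits, label_batch,
#                                          center_loss_alfa, nrof_classes)
#   tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES,
#                        prelogits_center_loss * center_loss_factor)
#   total_loss = cross_entropy_mean + tf.add_n(
#       tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))
```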

Shahnawazgrewal commented 6 years ago

Validation on the LFW dataset with the model trained on VGGFace2:

```
Model directory: /home/super/datasets/lfw/vggface2-cl
Metagraph file: model-20171216-232945.meta
Checkpoint file: model-20171216-232945.ckpt-100000
Runnning forward pass on LFW images
Accuracy: 0.992+-0.004
Validation rate: 0.96000+-0.01880 @ FAR=0.00067
Area Under Curve (AUC): 0.999
Equal Error Rate (EER): 0.008
```
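These figures come from facenet's LFW validation script; a sketch of the invocation, with placeholder paths for the aligned LFW directory and the model directory:

```
python src/validate_on_lfw.py \
    ~/datasets/lfw/lfw_mtcnnpy_160 \
    ~/models/vggface2-cl
```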

JianbangZ commented 6 years ago

I trained with a cosine face algorithm. Accuracy is 0.995, validation rate is 0.985.

helloyide commented 6 years ago

@JianbangZ what do you mean by a cosine face algorithm? Did you replace the L2 norm in center loss with cosine similarity? Or do you mean the paper the authors released last year, SphereFace?

DawnHH commented 6 years ago

@JianbangZ Have you tried ArcFace? Can you share your model?

akimo12345 commented 6 years ago

@Shahnawazgrewal Q1: Did you modify the learning rate? If yes, can you share the modified value? Q2: Did you modify any parameters before training, e.g. weight_decay, center loss factor, center loss alpha, etc.? If yes, can you share them? Thanks for your help.

yemenr commented 6 years ago

@Shahnawazgrewal How many epochs does the model take to converge? What are the Loss/RegLoss values after the model converges on the VGG dataset? Hoping to get your advice. Thank you!

Shahnawazgrewal commented 6 years ago

I used the default settings from the facenet implementation. @akimo12345

  1. No.
  2. No.

Shahnawazgrewal commented 6 years ago

I trained the model for 100 epochs. @yemenr

rashmisgh commented 6 years ago

@Shahnawazgrewal I am facing this issue while using the pre-trained VGGFace2 model:

```
/home/anju/.virtualenvs/dl4cv2/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Loading model...
2018-05-21 12:46:43.698367: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
(similar warnings repeated for SSE4.2, AVX, AVX2 and FMA)
Model loaded
Loading MTCNN Face detection model
MTCNN Model loaded
[INFO] camera sensor warming up...
Traceback (most recent call last):
  File "main.py", line 153, in <module>
    main(args)
  File "main.py", line 24, in main
    camera_recog()
  File "main.py", line 53, in camera_recog
    features_arr = extract_feature.get_features(aligns)
  File "/home/anju/rashmi_folder/FaceRec_old_before_24Apr_2018/face_feature.py", line 30, in get_features
    return self.sess.run(self.embeddings, feed_dict={self.x: images})
  (TensorFlow session internals omitted)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value InceptionResnetV1/Conv2d_1a_3x3/weights
  [[Node: InceptionResnetV1/Conv2d_1a_3x3/weights/read = Identity[T=DT_FLOAT, _class=["loc:@InceptionResnetV1/Conv2d_1a_3x3/weights"], _device="/job:localhost/replica:0/task:0/cpu:0"]]]
```

The op 'InceptionResnetV1/Conv2d_1a_3x3/weights/read' is created in face_feature.py, line 21, where `resnet.inference(self.x, 0.6, phase_train=False)` builds the graph (the full "Caused by op ... defined at" traceback runs through inception_resnet_v1.py and the tf.contrib layers).

Please help me solve this.
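For reference, this FailedPreconditionError typically means the graph was built but the checkpoint weights were never restored into the session. A minimal sketch of loading the shared checkpoint with facenet's own helper, assuming the repo's src/ directory is on PYTHONPATH; the model path is a placeholder:

```python
import tensorflow as tf
import facenet  # from the facenet repo's src/ directory

with tf.Graph().as_default():
    with tf.Session() as sess:
        # load_model restores both the metagraph and the checkpoint weights,
        # which is exactly the step the error says never happened
        facenet.load_model('~/models/20171216-232945')  # placeholder path
        graph = tf.get_default_graph()
        images_placeholder = graph.get_tensor_by_name('input:0')
        embeddings = graph.get_tensor_by_name('embeddings:0')
        phase_train_placeholder = graph.get_tensor_by_name('phase_train:0')
        # emb = sess.run(embeddings, feed_dict={images_placeholder: batch,
        #                                       phase_train_placeholder: False})
```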

LaviLiu commented 6 years ago

@Shahnawazgrewal Your model is powerful on my problem. Can you share the code you used for training the model on VGGFace2? I want to fine-tune your model. Thank you very much!

Shahnawazgrewal commented 6 years ago

I used the same code, train_softmax.py, with default parameters.
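For the fine-tuning question: train_softmax.py also accepts a `--pretrained_model` flag, so resuming from the shared checkpoint can be sketched as below; paths and the checkpoint name are placeholders:

```
python src/train_softmax.py \
    --data_dir ~/datasets/vggface2_train_160 \
    --model_def models.inception_resnet_v1 \
    --pretrained_model ~/models/20171216-232945/model-20171216-232945.ckpt-100000
```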

LaviLiu commented 6 years ago

@Shahnawazgrewal Thank you very much !

LaviLiu commented 6 years ago

@Shahnawazgrewal When you trained your model on VGGFace2, did you pre-filter the dataset?

Shahnawazgrewal commented 6 years ago

No. It is a pretty clean dataset.

thuoctran commented 6 years ago

@yipsang I tried to run train_tripletloss.py to train on the VGG dataset, but the program crashed while saving a checkpoint. How did you manage to train on the VGG dataset using triplet loss? Are there any changes I should make?

LaviLiu commented 6 years ago

@thuoctran I met the same problem. You should modify train_tripletloss.py line 175 from

```python
saver.restore(sess, os.path.expanduser(args.pretrained_model))
```

to

```python
ckpt = tf.train.get_checkpoint_state(os.path.expanduser(args.pretrained_model))
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)
```

LaviLiu commented 6 years ago

@Shahnawazgrewal Didn't you modify the parameter center_loss_factor when you trained your model? I find that the default value is 0.0. Did you use the value 0.0 to train your model?

Shahnawazgrewal commented 6 years ago

I used 1e-2. @Laviyy

LaviLiu commented 6 years ago

Thank you.

LaviLiu commented 6 years ago

@JianbangZ You trained a model on the combined VGGFace2 and MS-Celeb-1M dataset. I want to know how you combined the two datasets. Which loss did you use when training: triplet loss, softmax loss, center loss, or something else? Can you share your model with us? Thanks anyway!

LaviLiu commented 6 years ago

@Shahnawazgrewal First of all, thank you for your help. I trained my model following your directions, but it is not good. I want to know the value of the margin when you cropped images with MTCNN. I also find some faces are still slanted. Did you use an affine transformation to rotate the faces upright?

rain2008204 commented 6 years ago

@Laviyy a margin of 30 is good.

LaviLiu commented 6 years ago

@rain2008204 ok, thank you!