AlfredXiangWu / face_verification_experiment

Original Caffe version of LightCNN-9. Using the PyTorch version is highly recommended (https://github.com/AlfredXiangWu/LightCNN).

Cleaning MS-Celeb DB #89

Closed malysheva closed 7 years ago

malysheva commented 7 years ago

Hello, AlfredXiangWu. I am trying to use your model C to clean the MS-Celeb dataset, as described in your paper, so my labels are the same as in the 'fc2' layer of model C. I run the trained model C on the MS-Celeb-1M dataset to obtain the probability p (from the last layer) for each sample. But different forward propagations give me a different max probability, and moreover a different class label, for the same image. I noticed that the 'fc1' layer output is the same for the same images.

I tried deleting the dropout layer, and I tried both Python and C++.

Could this depend on floating-point precision? Maybe there are not enough significant digits? If so, can we trust the reported accuracy when the MS-Celeb dataset is used in the training stage?

How did you run your model when cleaning this dataset? How can we get the correct probabilities?

Thanks a lot.

AlfredXiangWu commented 7 years ago

How do you input face images to the network? Because the original MS-Celeb-1M images come in different resolutions, if you input them directly and use "crop_size", Caffe may randomly crop one patch from each original face image. I think that may cause the different predictions.
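For illustration, a training data layer with "crop_size" looks roughly like this (a sketch, not copied from this repo's prototxt; the LMDB path is a placeholder):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    # with crop_size set, Caffe takes a random 128x128 crop in the
    # TRAIN phase and a center crop in the TEST phase
    crop_size: 128
  }
  data_param {
    source: "path/to/train_lmdb"  # placeholder path
    batch_size: 64
    backend: LMDB
  }
}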

If not, I have no idea about your problem. Could you provide your code?

By the way, model C is not used for bootstrapping MS-Celeb-1M, because model C is trained on the already-cleaned dataset. As shown in the paper, I first train a model on the original MS-Celeb-1M and then use this model to bootstrap the dirty training dataset.

malysheva commented 7 years ago

I downloaded the photos, detected faces with a face detector, located facial landmarks, and then normalized and cropped each input image to 128 × 128 using these landmarks. An example image after this preprocessing is attached; its size is 128 × 128.

[attached image: m.010hn_0-FaceId-0, 128 × 128]
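Roughly, the alignment step looks like this (a simplified sketch with OpenCV; the exact similarity transform and crop offsets used in the paper are my guesses, not its actual procedure):

import cv2
import numpy as np

def align_face(img, left_eye, right_eye, out_size=128):
    # rotate so the line through the eyes is horizontal
    dy = right_eye[1] - left_eye[1]
    dx = right_eye[0] - left_eye[0]
    angle = np.degrees(np.arctan2(dy, dx))
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
    # crop out_size x out_size around the eye center
    # (the offsets below are placeholders, not the paper's values)
    x = int(center[0]) - out_size // 2
    y = int(center[1]) - out_size // 3
    return rotated[y:y + out_size, x:x + out_size]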

The start of the prototxt file is:

name: "DeepFace_set003_net" layer { name: "input" type: "Input" top: "data" input_param { shape { dim: 1 dim: 1 dim: 128 dim: 128 } } } layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" ...

So, I don't use "crop_size".

For the end of the prototxt file, I tried three variants.

First:

...
layer {
  name: "slice_fc1"
  type: "Slice"
  bottom: "fc1"
  top: "slice_fc1_1"
  top: "slice_fc1_2"
  slice_param { slice_dim: 1 }
}
layer {
  name: "etlwise_fc1"
  type: "Eltwise"
  bottom: "slice_fc1_1"
  bottom: "slice_fc1_2"
  top: "eltwise_fc1"
  eltwise_param { operation: MAX }
}
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "eltwise_fc1"
  top: "fc2"
  inner_product_param {
    num_output: 99891
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0.1 }
  }
}

Second:

...
layer {
  name: "etlwise_fc1"
  type: "Eltwise"
  bottom: "slice_fc1_1"
  bottom: "slice_fc1_2"
  top: "eltwise_fc1"
  eltwise_param { operation: MAX }
}
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "eltwise_fc1"
  top: "fc2"
  inner_product_param {
    num_output: 99891
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0.1 }
  }
}
layer {
  name: "softmax"
  type: "Softmax"
  bottom: "fc2"
  top: "prob"
}

Third:

...
layer {
  name: "etlwise_fc1"
  type: "Eltwise"
  bottom: "slice_fc1_1"
  bottom: "slice_fc1_2"
  top: "eltwise_fc1"
  eltwise_param { operation: MAX }
}
layer {
  name: "drop1"
  type: "Dropout"
  bottom: "eltwise_fc1"
  top: "eltwise_fc1"
  dropout_param { dropout_ratio: 0.7 }
}
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "eltwise_fc1"
  top: "fc2"
  inner_product_param {
    num_output: 99891
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0.1 }
  }
}
layer {
  name: "softmax"
  type: "Softmax"
  bottom: "fc2"
  top: "prob"
}

My code for getting the prediction (in Python) is:

#!/usr/bin/python

import cv2
import numpy as np
import caffe

caffe.set_mode_gpu()

net_pretrained = './LightenedCNN_C.caffemodel'
net_model_file = './LightenedCNN_C_deploy_new.prototxt'
net = caffe.Classifier(net_model_file, net_pretrained,
                       mean=None, channel_swap=[0],
                       raw_scale=1, image_dims=(128, 128))

filename = './m.010hn_0-FaceId-0.jpg'
img1 = cv2.imread(filename, 0)               # load as grayscale
input_image = img1 / 256.0                   # scale pixels to [0, 1)
input_image = input_image[:, :, np.newaxis]  # add channel axis: HxWx1
prediction = net.predict([input_image], False)
print prediction

If I use the second or third ending of the prototxt file, I add this to get the max probability:

print prediction[0].argmax()
estimation = prediction[0][prediction[0].argmax()]
print estimation

Note that I don't use 'oversampling' during prediction.
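For example, I can check repeatability directly (reusing net and input_image from the script above):

p1 = net.predict([input_image], False)
p2 = net.predict([input_image], False)
print np.abs(p1 - p2).max()  # should be ~0 for a deterministic network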

When you first trained the model on the original MS-Celeb-1M and then used this model to bootstrap the dirty training dataset, did you check that you get the same answer when you run it repeatedly on the same data with the same model?

Thanks a lot.

AlfredXiangWu commented 7 years ago

Oh, I am sorry that I have made a mistake. The description of the fully connected layer in train_test.prototxt is as follows:

layer {
  name: "fc2_ms"
  type: "InnerProduct"
  param {
    lr_mult: 10
    decay_mult: 1
  }
  param {
    lr_mult: 20
    decay_mult: 0
  }
  inner_product_param {
    num_output: 99891
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
  bottom: "eltwise_fc1"
  top: "fc2_ms"
}

The name of this layer is "fc2_ms" in train_test.prototxt, but it is "fc2" in the deploy.prototxt. Because the names differ, caffe cannot match this layer to the weights stored in the caffemodel, so it initializes the fully connected layer randomly on every load, which leads to different results for the same image.
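So renaming "fc2" to "fc2_ms" in the deploy prototxt lets caffe copy the trained weights. A quick way to verify this (a sketch in pycaffe, with the paths from your script): load the net twice and compare the layer's parameters; loaded weights are identical, while random initialization differs on each load.

import numpy as np
import caffe

proto = './LightenedCNN_C_deploy_new.prototxt'
weights = './LightenedCNN_C.caffemodel'
net_a = caffe.Net(proto, weights, caffe.TEST)
net_b = caffe.Net(proto, weights, caffe.TEST)
# identical parameters => the layer was filled from the caffemodel;
# differing parameters => caffe fell back to random initialization
same = np.allclose(net_a.params['fc2'][0].data, net_b.params['fc2'][0].data)
print 'fc2 loaded from caffemodel:', same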

rexnxiaobai commented 7 years ago

@AlfredXiangWu in the paper, "the weight decay of fc2 layer to 5e-3 while other layers are 5e-4, also, parameter initialization for convolution is Xavier and Gaussian is used for fully-connected layers"

your fc2_ms layer does not seem to match the description of the paper~

AlfredXiangWu commented 7 years ago

@rexnxiaobai The prototxt is used for bootstrapping. Unlike the feature extraction model, the parameters of fc2 are also significant there, so the weight decay is also 5e-4. As for the parameter initialization, I have just checked train_test.prototxt and it is indeed Xavier for fc2. I will fix it in a new version of the paper.
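For reference, the effective weight decay of each parameter blob in caffe is the solver's global weight_decay multiplied by the layer's decay_mult, so the fc2_ms settings above give (the solver line is an assumed value matching this discussion):

# solver.prototxt
weight_decay: 5e-4

# fc2_ms in train_test.prototxt
param { lr_mult: 10 decay_mult: 1 }  # weights: 5e-4 * 1 = 5e-4
param { lr_mult: 20 decay_mult: 0 }  # bias: no weight decay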

malysheva commented 7 years ago

AlfredXiangWu, thank you.

Layer 'fc2_ms' works fine.