The configurations of the solver and train_val are correct for training the light CNN, and they are also suitable for fine-tuning on your own dataset.
As described in your paper, "The learning rate is set to 1e-3 initially and reduced to 5e-5 gradually." Could you please tell me the specific parameters used to achieve this in Caffe, such as lr_policy, gamma, stepsize, max_iter, etc.? Thanks.
@jiangxuehan I believe there isn't a single right answer to your question; the learning rate decay depends on your training database. In general, we reduce the learning rate when the training cost stops decreasing after some iterations. So, the "best" way is to watch your training cost and reduce the learning rate after X iterations. In Caffe you can specify the steps at which to decrease the learning rate in the solver, for example:
lr_policy: "multistep"
gamma: 0.9
stepvalue: 5000
stepvalue: 7000
stepvalue: 8000
stepvalue: 9000
stepvalue: 9500
The multistep policy multiplies the learning rate by gamma at each stepvalue (values you can find by looking at the training cost). But if you just want to follow the paper, it's easy to calculate gamma and use the step policy. For example, take a look at my solver: my stepsize is 500000 and max_iter is 5000000. This means that (in the worst case) my learning rate will drop ten times. So, with base_lr = 0.001, after ten drops it should be 0.00005. Solving base_lr * (gamma)^10 = 5e-5 gives gamma ≈ 0.741. So, the solver becomes:
lr_policy: "step"
gamma: 0.741
stepsize: 500000
If you won't run all 5,000,000 iterations, just adjust the equation.
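A minimal sketch of that gamma calculation in Python, assuming the step policy and the values above (adjust max_iter and stepsize to your own run):
# derive the "step" policy gamma that decays base_lr down to target_lr
# over max_iter / stepsize drops
base_lr = 0.001
target_lr = 0.00005
stepsize = 500000
max_iter = 5000000

num_drops = max_iter // stepsize                      # 10 drops in the worst case
gamma = (target_lr / base_lr) ** (1.0 / num_drops)    # 0.05 ** 0.1
print(round(gamma, 3))                                # ~0.741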
@AlfredXiangWu I'm training on a Tesla K40 and at iteration 1100 my training loss is 11.3229. It's taking very long; is this normal? I normalized 5M images of MS-Celeb (clean list) following the paper's specification and used the solver from this issue.
@TheusStremens I think it is normal for training the light CNN.
@jiangxuehan You can follow the configurations that @TheusStremens mentioned. They are similar to my configuration.
@TheusStremens @AlfredXiangWu Thanks for your reply; I will follow similar configurations to train this model. BTW, the loss of the light CNN drops slowly during the first several thousand iterations, @TheusStremens, so just keeping the training going is OK.
Hi guys, after 7 days of training the cost has barely oscillated, and I'm only at iteration 20K. At this rate, it will reach iteration 100K in 5 weeks and iteration 1M (1/5 of the max iteration count) in a year. @AlfredXiangWu is this normal? How long did your training take? Can you tell me the number of iterations at the end of your training? PS: I'm training on a Tesla K40.
@TheusStremens Do you mean that you trained the light CNN for about a week and reached only 20k iterations?
That is abnormal. I set max_iter to 4,000,000 and it takes about 1 week on a Titan X.
I removed iter_size: 60 from the solver and the speed went up (in Caffe, iter_size accumulates gradients over that many mini-batches per weight update, so each displayed iteration does iter_size times the work). But now I have a problem with convergence like https://github.com/AlfredXiangWu/face_verification_experiment/issues/36: my loss is 87.3365 at the beginning. Changing the batch_size to 80 apparently resolved the convergence problem, but the speed is still abnormal (1/4 of your speed). Did you use iter_size in your training, @AlfredXiangWu? I'll try different batch_size settings.
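A minimal sketch of the arithmetic, assuming the batch_size: 50 from the train_val posted later in this thread and the iter_size: 60 mentioned above:
# iter_size makes Caffe accumulate gradients over iter_size mini-batches,
# so one displayed iteration processes batch_size * iter_size images
batch_size = 50      # assumed, from the ImageData layer below
iter_size = 60       # the value removed from the solver
print(batch_size * iter_size)   # 3000 images per iteration, ~60x the work of iter_size = 1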
The convergence problem persists; it just happened again at iteration 8980. I'm using the normalization correctly, the same base_lr, and the same architecture, so I can't figure out what is causing the convergence problem.
net: "DeepFace_set003_train_test.prototxt"
test_iter: 500
test_interval: 1000
test_compute_loss: true
base_lr: 0.001
momentum: 0.9
weight_decay: 0.0005
lr_policy: "step"
stepsize: 500000
gamma: 0.457305051927326
display: 100
max_iter: 4000000
snapshot: 40000
snapshot_prefix: "DeepFace_set003_net"
solver_mode: GPU
debug_info: false
clip_gradients: 150
The solver I used for training is above. Clipping gradients may help to solve your problem. If not, I think you can fine-tune the light CNN on your own dataset from the pre-trained model.
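For context on clip_gradients: Caffe rescales the entire gradient when its L2 norm exceeds the threshold. A rough Python sketch of the idea (not Caffe's actual code):
import numpy as np

def clip_gradients(grads, threshold=150.0):
    # grads: one numpy array per learnable parameter blob
    l2norm = np.sqrt(sum(np.sum(g * g) for g in grads))
    if l2norm > threshold:
        # shrink all gradients proportionally so their joint norm equals the threshold
        grads = [g * (threshold / l2norm) for g in grads]
    return grads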
@AlfredXiangWu @TheusStremens I tried to train model C with MS-Celeb-1M up to 200,000 iterations, but the loss stays at 11.0. I use 61,332 classes, 390,000 pictures altogether, and batch_size = 96x4. Is this normal? After how many iterations did your loss begin to drop significantly? Thank you.
I also tried to lower my learning rate.
@xionglei181818 Did you use the clean list of MS-Celeb-1M? Why did you use only 390,000 pictures if MS-Celeb-1M has 5M+? What learning rate did you use?
In my case, shuffling the training data solved the convergence problem. After 700K iterations the loss dropped to 3. Now I'm at 1.8M iterations, with loss = 1 and accuracy = 89%.
@TheusStremens I used the clean list of MS-Celeb-1M and took 50 images from each category, so after screening I got 61,332 categories and about 390,000 images.
1. Using the learning rate settings provided by @AlfredXiangWu: base_lr: 0.001 momentum: 0.9 weight_decay: 0.0005 lr_policy: "step" stepsize: 200000 gamma: 0.457305051927326
2. I also tried another set of parameters: base_lr: 0.001 momentum: 0.9 weight_decay: 0.0005 lr_policy: "inv" gamma: 0.000005 power: 0.75
With both sets of parameters, after 200,000 iterations the loss has stayed around 11.0. Reducing the learning rate to 0.0001 gives the same result. Have you observed this phenomenon? Thank you.
@xionglei181818 I recommend setting the learning rate policy to "fixed" or "step" rather than "inv".
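For reference, the decay formulas Caffe uses for these policies, sketched in Python with the values quoted above:
# Caffe "step" policy: lr = base_lr * gamma ^ floor(iter / stepsize)
def lr_step(base_lr, gamma, stepsize, it):
    return base_lr * gamma ** (it // stepsize)

# Caffe "inv" policy: lr = base_lr * (1 + gamma * iter) ^ (-power)
def lr_inv(base_lr, gamma, power, it):
    return base_lr * (1.0 + gamma * it) ** (-power)

print(lr_inv(0.001, 0.000005, 0.75, 200000))                # ~5.9e-4: "inv" barely decays here
print(lr_step(0.001, 0.457305051927326, 200000, 200000))    # ~4.6e-4 after the first step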
I trained on MS-Celeb-1M with the solver config provided by @AlfredXiangWu. It took 9 days on a Titan X for 3,500,000 iterations. The performance of my model on LFW is not as good as model C. My test results: model C: DIR = 0.835 @ FAR = 1% on LFW; my model: DIR = 0.641 @ FAR = 1% on LFW. I wonder what the reasons are:
@AlfredXiangWu @TheusStremens @lyuchuny3 Can you share your train_test.prototxt and your solver.prototxt? I'm training the light CNN with the clean list; after screening I got 79,056 categories, about 4,920,000 images. But after running 400,000 iterations the loss is still 11.2. Can you give me a hand?
How many iterations (and what batch size) are needed to achieve the results of model B trained on the CASIA-WebFace dataset?
@ctgushiwei solver:
net: "/path_to_your_train_val_net/your_net_train_val.prototxt"
test_iter: 1000
test_interval: 10000
test_compute_loss: true
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 500000
display: 10
max_iter: 4000000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "Snapshot_your_net"
solver_mode: GPU
debug_info: false
your_net_train_val.prototxt:
name: "Your_Name_Net"
layer {
name: "data"
type:"ImageData"
top: "data"
top: "label"
image_data_param{
source: "/your_path/train_csv.txt"
batch_size: 50
shuffle: true
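# note: LightCNN expects single-channel input; add "is_color: false" here
# if your images are grayscale (see the is_color discussion later in this thread)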
}
transform_param {
scale: 0.00390625
crop_size: 128
mirror: true
}
include: { phase: TRAIN }
}
layer {
name: "data"
type: "ImageData"
top: "data"
top: "label"
image_data_param{
source: "/your_path/validation_csv.txt"
batch_size: 10
}
transform_param {
scale: 0.00390625
crop_size: 128
mirror: false
}
include: { phase: TEST }
}
layer{
name: "conv1"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 5
stride: 1
pad: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "data"
top: "conv1"
}
layer{
name: "slice1"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv1"
top: "slice1_1"
top: "slice1_2"
}
layer{
name: "etlwise1"
type: "Eltwise"
bottom: "slice1_1"
bottom: "slice1_2"
top: "eltwise1"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool1"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise1"
top: "pool1"
}
layer{
name: "conv2a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "pool1"
top: "conv2a"
}
layer{
name: "slice2a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv2a"
top: "slice2a_1"
top: "slice2a_2"
}
layer{
name: "etlwise2a"
type: "Eltwise"
bottom: "slice2a_1"
bottom: "slice2a_2"
top: "eltwise2a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv2"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 192
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "eltwise2a"
top: "conv2"
}
layer{
name: "slice2"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv2"
top: "slice2_1"
top: "slice2_2"
}
layer{
name: "etlwise2"
type: "Eltwise"
bottom: "slice2_1"
bottom: "slice2_2"
top: "eltwise2"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool2"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise2"
top: "pool2"
}
layer{
name: "conv3a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 192
kernel_size: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "pool2"
top: "conv3a"
}
layer{
name: "slice3a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv3a"
top: "slice3a_1"
top: "slice3a_2"
}
layer{
name: "etlwise3a"
type: "Eltwise"
bottom: "slice3a_1"
bottom: "slice3a_2"
top: "eltwise3a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv3"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "eltwise3a"
top: "conv3"
}
layer{
name: "slice3"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv3"
top: "slice3_1"
top: "slice3_2"
}
layer{
name: "etlwise3"
type: "Eltwise"
bottom: "slice3_1"
bottom: "slice3_2"
top: "eltwise3"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool3"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise3"
top: "pool3"
}
layer{
name: "conv4a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 384
kernel_size: 1
stride: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "pool3"
top: "conv4a"
}
layer{
name: "slice4a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv4a"
top: "slice4a_1"
top: "slice4a_2"
}
layer{
name: "etlwise4a"
type: "Eltwise"
bottom: "slice4a_1"
bottom: "slice4a_2"
top: "eltwise4a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv4"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise4a"
top: "conv4"
}
layer{
name: "slice4"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv4"
top: "slice4_1"
top: "slice4_2"
}
layer{
name: "etlwise4"
type: "Eltwise"
bottom: "slice4_1"
bottom: "slice4_2"
top: "eltwise4"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv5a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 1
stride: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise4"
top: "conv5a"
}
layer{
name: "slice5a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv5a"
top: "slice5a_1"
top: "slice5a_2"
}
layer{
name: "etlwise5a"
type: "Eltwise"
bottom: "slice5a_1"
bottom: "slice5a_2"
top: "eltwise5a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv5"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise5a"
top: "conv5"
}
layer{
name: "slice5"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv5"
top: "slice5_1"
top: "slice5_2"
}
layer{
name: "etlwise5"
type: "Eltwise"
bottom: "slice5_1"
bottom: "slice5_2"
top: "eltwise5"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool4"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise5"
top: "pool4"
}
layer{
name: "fc1"
type: "InnerProduct"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 512
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "pool4"
top: "fc1"
}
layer{
name: "slice_fc1"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "fc1"
top: "slice_fc1_1"
top: "slice_fc1_2"
}
layer{
name: "etlwise_fc1"
type: "Eltwise"
bottom: "slice_fc1_1"
bottom: "slice_fc1_2"
top: "eltwise_fc1"
eltwise_param {
operation: MAX
}
}
layer{
name: "drop1"
type: "Dropout"
dropout_param{
dropout_ratio: 0.7
}
bottom: "eltwise_fc1"
top: "eltwise_fc1"
}
layer{
name: "fc2"
type: "InnerProduct"
inner_product_param{
num_output: 79010
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "eltwise_fc1"
top: "fc2"
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc2"
bottom: "label"
top: "accuracy"
include: { phase: TEST }
}
layer {
name: "softmaxloss"
type: "SoftmaxWithLoss"
bottom: "fc2"
bottom: "label"
top: "loss"
}
Remember to change the num_output value in fc2 to match the number of classes in your own training set.
@TheusStremens Firstly, thank you very much for your answer! I have two other questions: 1. I used the same train_test prototxt as your configuration, but after 500K iterations the loss was still at 11.2. Then I changed the fc2 layer parameters: I added param { lr_mult: 10 decay_mult: 1 } param { lr_mult: 20 decay_mult: 0 } to my fc2 layer, and then the loss began to drop. With your configuration, after how many iterations did the loss begin to drop?
2. Did you test your model on LFW, and can the accuracy reach 98%?
@ctgushiwei 1) In my case, at iteration 700K the loss was around 2. The loss began to drop only after I changed the batch size and enabled shuffling of the training data. 2) I'm still training. My training is taking four times longer than Mr. Wu's, and the electricity went off in my lab a few times. Besides that, I had to suspend the training for another urgent piece of work. When it's over I'll let you know the results on LFW.
@TheusStremens Hello, when fine-tuning the light CNN I hit the error "Cannot copy param 0 weights from layer 'conv1'; shape mismatch. Source param shape is 96 1 5 5 (2400); target param shape is 96 3 5 5 (7200). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer". Could you please help me?
@honghuCode Check whether you are loading RGB images; LightCNN works with grayscale images.
@TheusStremens I used the following code to convert the image to grayscale and resize it to 128x128:
import cv2

mat = cv2.imread(imgPath, 1)                      # flag 1 forces a 3-channel (BGR) load
mat = cv2.resize(mat, (128, 128))
im_gray = cv2.cvtColor(mat, cv2.COLOR_BGR2GRAY)   # convert to single-channel grayscale
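As a side note, OpenCV can also load the image directly as grayscale, which skips the conversion step; a minimal sketch:
im_gray = cv2.imread(imgPath, 0)          # flag 0 = cv2.IMREAD_GRAYSCALE, single channel
im_gray = cv2.resize(im_gray, (128, 128))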
The following is my train_test_bak.prototxt:
name: "DeepFace_set003_net"
layer { name: "data" type: "ImageData" top: "data" top: "label" image_data_param { source: "/home/honghu/code/caffe-master/lightCNNFace/train.txt" batch_size: 20 shuffle: true } transform_param { scale: 0.00390625 crop_size: 128 mirror: true } include: { phase: TRAIN } }
layer { name: "data" type: "ImageData" top: "data" top: "label" image_data_param{ source: "/home/honghu/code/caffe-master/lightCNNFace/val.txt" batch_size: 20 } transform_param { scale: 0.00390625 crop_size: 128 mirror: false } include: { phase: TEST } }
layer{ name: "conv1" type: "Convolution" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 96 kernel_size: 5 stride: 1 pad: 2 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0.1 } } bottom: "data" top: "conv1" }
layer{ name: "slice1" type:"Slice" slice_param { slice_dim: 1 } bottom: "conv1" top: "slice1_1" top: "slice1_2" } layer{ name: "etlwise1" type: "Eltwise" bottom: "slice1_1" bottom: "slice1_2" top: "eltwise1" eltwise_param { operation: MAX } } layer{ name: "pool1" type: "Pooling" pooling_param { pool: MAX kernel_size: 2 stride: 2 } bottom: "eltwise1" top: "pool1" }
layer{ name: "conv2a" type: "Convolution" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 96 kernel_size: 1 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0.1 } } bottom: "pool1" top: "conv2a" } layer{ name: "slice2a" type:"Slice" slice_param { slice_dim: 1 } bottom: "conv2a" top: "slice2a_1" top: "slice2a_2" } layer{ name: "etlwise2a" type: "Eltwise" bottom: "slice2a_1" bottom: "slice2a_2" top: "eltwise2a" eltwise_param { operation: MAX } }
layer{ name: "conv2" type: "Convolution" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 192 kernel_size: 3 stride: 1 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0.1 } } bottom: "eltwise2a" top: "conv2" }
layer{ name: "slice2" type:"Slice" slice_param { slice_dim: 1 } bottom: "conv2" top: "slice2_1" top: "slice2_2" } layer{ name: "etlwise2" type: "Eltwise" bottom: "slice2_1" bottom: "slice2_2" top: "eltwise2" eltwise_param { operation: MAX } } layer{ name: "pool2" type: "Pooling" pooling_param { pool: MAX kernel_size: 2 stride: 2 } bottom: "eltwise2" top: "pool2" }
layer{ name: "conv3a" type: "Convolution" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 192 kernel_size: 1 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0.1 } } bottom: "pool2" top: "conv3a" } layer{ name: "slice3a" type:"Slice" slice_param { slice_dim: 1 } bottom: "conv3a" top: "slice3a_1" top: "slice3a_2" } layer{ name: "etlwise3a" type: "Eltwise" bottom: "slice3a_1" bottom: "slice3a_2" top: "eltwise3a" eltwise_param { operation: MAX } }
layer{ name: "conv3" type: "Convolution" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 384 kernel_size: 3 stride: 1 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0.1 } } bottom: "eltwise3a" top: "conv3" }
layer{ name: "slice3" type:"Slice" slice_param { slice_dim: 1 } bottom: "conv3" top: "slice3_1" top: "slice3_2" } layer{ name: "etlwise3" type: "Eltwise" bottom: "slice3_1" bottom: "slice3_2" top: "eltwise3" eltwise_param { operation: MAX } } layer{ name: "pool3" type: "Pooling" pooling_param { pool: MAX kernel_size: 2 stride: 2 } bottom: "eltwise3" top: "pool3" }
layer{
name: "conv4a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 384
kernel_size: 1
stride: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "pool3"
top: "conv4a"
}
layer{
name: "slice4a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv4a"
top: "slice4a_1"
top: "slice4a_2"
}
layer{
name: "etlwise4a"
type: "Eltwise"
bottom: "slice4a_1"
bottom: "slice4a_2"
top: "eltwise4a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv4"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise4a"
top: "conv4"
}
layer{ name: "slice4" type:"Slice" slice_param { slice_dim: 1 } bottom: "conv4" top: "slice4_1" top: "slice4_2" } layer{ name: "etlwise4" type: "Eltwise" bottom: "slice4_1" bottom: "slice4_2" top: "eltwise4" eltwise_param { operation: MAX } }
layer{
name: "conv5a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 1
stride: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise4"
top: "conv5a"
}
layer{
name: "slice5a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv5a"
top: "slice5a_1"
top: "slice5a_2"
}
layer{
name: "etlwise5a"
type: "Eltwise"
bottom: "slice5a_1"
bottom: "slice5a_2"
top: "eltwise5a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv5"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise5a"
top: "conv5"
}
layer{ name: "slice5" type:"Slice" slice_param { slice_dim: 1 } bottom: "conv5" top: "slice5_1" top: "slice5_2" } layer{ name: "etlwise5" type: "Eltwise" bottom: "slice5_1" bottom: "slice5_2" top: "eltwise5" eltwise_param { operation: MAX } }
layer{ name: "pool4" type: "Pooling" pooling_param { pool: MAX kernel_size: 2 stride: 2 } bottom: "eltwise5" top: "pool4" }
layer{
name: "fc1"
type: "InnerProduct"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 512
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "pool4"
top: "fc1"
}
layer{
name: "slice_fc1"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "fc1"
top: "slice_fc1_1"
top: "slice_fc1_2"
}
layer{
name: "etlwise_fc1"
type: "Eltwise"
bottom: "slice_fc1_1"
bottom: "slice_fc1_2"
top: "eltwise_fc1"
eltwise_param {
operation: MAX
}
}
layer{ name: "drop1" type: "Dropout" dropout_param{ dropout_ratio: 0.7 } bottom: "eltwise_fc1" top: "eltwise_fc1" }
layer{ name: "fnc2" type: "InnerProduct"
inner_product_param{
num_output: 50
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "eltwise_fc1"
top: "fnc2"
}
layer { name: "accuracy" type: "Accuracy" bottom: "fnc2" bottom: "label" top: "accuracy" include: { phase: TEST } }
layer { name: "softmaxloss" type: "SoftmaxWithLoss" bottom: "fnc2" bottom: "label" top: "loss" } `
@honghuCode Add is_color: false in the data layer. Caffe loads images with 3 channels, even if they are grayscale, unless you set this parameter to false.
@TheusStremens Thank you very much, you solved my problem.
layer { name: "data" type: "ImageData" top: "data" top: "label" image_data_param { source: "/home/code/caffe-master/lightCNNFace/val.txt" batch_size: 20 is_color: false } transform_param { scale: 0.00390625 crop_size: 128 mirror: false } include: { phase: TEST } }
First, congratulations and thank you for your work; it's very exciting to see that it's possible to make a light CNN without millions (or billions) of parameters and achieve state-of-the-art accuracy.
I intend to do two experiments (varying the types of activations, cost functions, solver types, neurons, ...) using the model C architecture: one training a new CNN on my database, and another fine-tuning model C on my database. I made the following solver.prototxt and train_val.prototxt:
Could you tell me if this solver and train_val are similar to those you used for the final training of model C? And for fine-tuning, can I use the same solver used in training and just freeze layers in the train_val, or is another solver necessary for fine-tuning?
Thanks