The configurations of the solver and train_val are correct for training the light CNN, and they are also suitable for fine-tuning on your own dataset.
As described in your paper, "The learning rate is set to 1e-3 initially and reduced to 5e-5 gradually." Could you please tell me the specific parameters used to achieve this in Caffe, such as lr_policy, gamma, stepsize, max_iter, etc.? Thanks.
@jiangxuehan I believe there isn't a single right answer to your question; the learning rate decay depends on your training database. In general, we reduce the learning rate when the training cost stops decreasing after some iterations. So, the "best" way is to watch your training cost and reduce the learning rate after X iterations. In Caffe you can specify the steps at which to decrease the learning rate in the solver, for example:
lr_policy: "multistep"
gamma: 0.9
stepvalue: 5000
stepvalue: 7000
stepvalue: 8000
stepvalue: 9000
stepvalue: 9500
The multistep policy multiplies the learning rate by gamma at each stepvalue (values you can find by looking at the training cost). But if you just want to follow the paper, it's easy to calculate gamma and use the step policy. For example, take a look at my solver: my stepsize is 500000 and max_iter is 5000000. This means that (in the worst case) my learning rate will drop ten times. So, with base_lr = 0.001, after ten drops it should be 0.00005. Solving base_lr * (gamma)^10 = 5e-5 gives gamma ≈ 0.741. So, the solver becomes:
lr_policy: "step"
gamma: 0.741
stepsize: 500000
If you won't run all 5,000,000 iterations, just adjust the equation.
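A minimal sketch of that gamma calculation in Python, assuming the step policy and the values above (adjust max_iter and stepsize to your own run):
# derive the "step" policy gamma that decays base_lr down to target_lr
# over max_iter / stepsize drops
base_lr = 0.001
target_lr = 0.00005
stepsize = 500000
max_iter = 5000000

num_drops = max_iter // stepsize                      # 10 drops in the worst case
gamma = (target_lr / base_lr) ** (1.0 / num_drops)    # 0.05 ** 0.1
print(round(gamma, 3))                                # ~0.741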
@AlfredXiangWu I'm training on a Tesla K40 and at iteration 1100 my training loss is 11.3229. It's taking very long; is this normal? I normalized 5M images of MS-Celeb (clean list) following the paper's specification and used the solver from this issue.
@TheusStremens I think it is normal for training the light CNN.
@jiangxuehan You can follow the configurations that @TheusStremens mentioned. They are similar to my configuration.
@TheusStremens @AlfredXiangWu Thanks for your reply; I will follow similar configurations to train this model. BTW, the loss of the light CNN drops slowly during the first several thousand iterations, @TheusStremens, so just keeping the training going is OK.
Hi guys, after 7 days of training the cost has barely oscillated, and I'm only at iteration 20K. At this rate, it will reach iteration 100K in 5 weeks and iteration 1M (1/5 of the max iteration count) in a year. @AlfredXiangWu is this normal? How long did your training take? Can you tell me the number of iterations at the end of your training? PS: I'm training on a Tesla K40.
@TheusStremens Do you mean that you trained the light CNN for about a week and reached only 20k iterations?
That is abnormal. I set max_iter to 4,000,000 and it takes about 1 week on a Titan X.
I removed iter_size: 60 from the solver and the speed went up (in Caffe, iter_size accumulates gradients over that many mini-batches per weight update, so each displayed iteration does iter_size times the work). But now I have a problem with convergence like https://github.com/AlfredXiangWu/face_verification_experiment/issues/36: my loss is 87.3365 at the beginning. Changing the batch_size to 80 apparently resolved the convergence problem, but the speed is still abnormal (1/4 of your speed). Did you use iter_size in your training, @AlfredXiangWu? I'll try different batch_size settings.
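A minimal sketch of the arithmetic, assuming the batch_size: 50 from the train_val posted later in this thread and the iter_size: 60 mentioned above:
# iter_size makes Caffe accumulate gradients over iter_size mini-batches,
# so one displayed iteration processes batch_size * iter_size images
batch_size = 50      # assumed, from the ImageData layer below
iter_size = 60       # the value removed from the solver
print(batch_size * iter_size)   # 3000 images per iteration, ~60x the work of iter_size = 1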
The convergence problem persists; it just happened again at iteration 8980. I'm using the normalization correctly, the same base_lr, and the same architecture, so I can't figure out what is causing the convergence problem.
net: "DeepFace_set003_train_test.prototxt"
test_iter: 500
test_interval: 1000
test_compute_loss: true
base_lr: 0.001
momentum: 0.9
weight_decay: 0.0005
lr_policy: "step"
stepsize: 500000
gamma: 0.457305051927326
display: 100
max_iter: 4000000
snapshot: 40000
snapshot_prefix: "DeepFace_set003_net"
solver_mode: GPU
debug_info: false
clip_gradients: 150
The solver I used for training is above. Clipping gradients may help to solve your problem. If not, I think you can fine-tune the light CNN on your own dataset from the pre-trained model.
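For context on clip_gradients: Caffe rescales the entire gradient when its L2 norm exceeds the threshold. A rough Python sketch of the idea (not Caffe's actual code):
import numpy as np

def clip_gradients(grads, threshold=150.0):
    # grads: one numpy array per learnable parameter blob
    l2norm = np.sqrt(sum(np.sum(g * g) for g in grads))
    if l2norm > threshold:
        # shrink all gradients proportionally so their joint norm equals the threshold
        grads = [g * (threshold / l2norm) for g in grads]
    return grads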
@AlfredXiangWu @TheusStremens I tried to train model C with MS-Celeb-1M up to 200,000 iterations, but the loss stays at 11.0. I use 61,332 classes, 390,000 pictures altogether, and batch_size = 96x4. Is this normal? After how many iterations did your loss begin to drop significantly? Thank you.
I also tried to lower my learning rate.
@xionglei181818 Did you use the clean list of MS-Celeb-1M? Why did you use only 390,000 pictures if MS-Celeb-1M has 5M+? What learning rate did you use?
In my case, shuffling the training data solved the convergence problem. After 700K iterations the loss dropped to 3. Now I'm at 1.8M iterations, with loss = 1 and accuracy = 89%.
@TheusStremens I used the clean list of MS-Celeb-1M and took 50 images from each category, so after screening I got 61,332 categories and about 390,000 images.
1. Using the learning rate settings provided by @AlfredXiangWu: base_lr: 0.001 momentum: 0.9 weight_decay: 0.0005 lr_policy: "step" stepsize: 200000 gamma: 0.457305051927326
2. I also tried another set of parameters: base_lr: 0.001 momentum: 0.9 weight_decay: 0.0005 lr_policy: "inv" gamma: 0.000005 power: 0.75
With both sets of parameters, after 200,000 iterations the loss has stayed around 11.0. Reducing the learning rate to 0.0001 gives the same result. Have you observed this phenomenon? Thank you.
@xionglei181818 I recommend setting the learning rate policy to "fixed" or "step" rather than "inv".
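For reference, the decay formulas Caffe uses for these policies, sketched in Python with the values quoted above:
# Caffe "step" policy: lr = base_lr * gamma ^ floor(iter / stepsize)
def lr_step(base_lr, gamma, stepsize, it):
    return base_lr * gamma ** (it // stepsize)

# Caffe "inv" policy: lr = base_lr * (1 + gamma * iter) ^ (-power)
def lr_inv(base_lr, gamma, power, it):
    return base_lr * (1.0 + gamma * it) ** (-power)

print(lr_inv(0.001, 0.000005, 0.75, 200000))                # ~5.9e-4: "inv" barely decays here
print(lr_step(0.001, 0.457305051927326, 200000, 200000))    # ~4.6e-4 after the first step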
I trained on MS-Celeb-1M with the solver config provided by @AlfredXiangWu. It took 9 days on a Titan X for 3,500,000 iterations. The performance of my model on LFW is not as good as model C. My test results: model C: DIR = 0.835 @ FAR = 1% on LFW; my model: DIR = 0.641 @ FAR = 1% on LFW. I wonder what the reasons are:
@AlfredXiangWu @TheusStremens @lyuchuny3 Can you share your train_test.prototxt and your solver.prototxt? I'm training the light CNN with the clean list; after screening I got 79,056 categories, about 4,920,000 images. But after running 400,000 iterations the loss is still 11.2. Can you give me a hand?
How many iterations (and what batch size) are needed to achieve the results of model B trained on the CASIA-WebFace dataset?
@ctgushiwei solver:
net: "/path_to_your_train_val_net/your_net_train_val.prototxt"
test_iter: 1000
test_interval: 10000
test_compute_loss: true
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 500000
display: 10
max_iter: 4000000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "Snapshot_your_net"
solver_mode: GPU
debug_info: false
your_net_train_val.prototxt:
name: "Your_Name_Net"
layer {
name: "data"
type:"ImageData"
top: "data"
top: "label"
image_data_param{
source: "/your_path/train_csv.txt"
batch_size: 50
shuffle: true
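# note: LightCNN expects single-channel input; add "is_color: false" here
# if your images are grayscale (see the is_color discussion later in this thread)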
}
transform_param {
scale: 0.00390625
crop_size: 128
mirror: true
}
include: { phase: TRAIN }
}
layer {
name: "data"
type: "ImageData"
top: "data"
top: "label"
image_data_param{
source: "/your_path/validation_csv.txt"
batch_size: 10
}
transform_param {
scale: 0.00390625
crop_size: 128
mirror: false
}
include: { phase: TEST }
}
layer{
name: "conv1"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 5
stride: 1
pad: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "data"
top: "conv1"
}
layer{
name: "slice1"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv1"
top: "slice1_1"
top: "slice1_2"
}
layer{
name: "etlwise1"
type: "Eltwise"
bottom: "slice1_1"
bottom: "slice1_2"
top: "eltwise1"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool1"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise1"
top: "pool1"
}
layer{
name: "conv2a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "pool1"
top: "conv2a"
}
layer{
name: "slice2a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv2a"
top: "slice2a_1"
top: "slice2a_2"
}
layer{
name: "etlwise2a"
type: "Eltwise"
bottom: "slice2a_1"
bottom: "slice2a_2"
top: "eltwise2a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv2"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 192
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "eltwise2a"
top: "conv2"
}
layer{
name: "slice2"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv2"
top: "slice2_1"
top: "slice2_2"
}
layer{
name: "etlwise2"
type: "Eltwise"
bottom: "slice2_1"
bottom: "slice2_2"
top: "eltwise2"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool2"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise2"
top: "pool2"
}
layer{
name: "conv3a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 192
kernel_size: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "pool2"
top: "conv3a"
}
layer{
name: "slice3a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv3a"
top: "slice3a_1"
top: "slice3a_2"
}
layer{
name: "etlwise3a"
type: "Eltwise"
bottom: "slice3a_1"
bottom: "slice3a_2"
top: "eltwise3a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv3"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "eltwise3a"
top: "conv3"
}
layer{
name: "slice3"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv3"
top: "slice3_1"
top: "slice3_2"
}
layer{
name: "etlwise3"
type: "Eltwise"
bottom: "slice3_1"
bottom: "slice3_2"
top: "eltwise3"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool3"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise3"
top: "pool3"
}
layer{
name: "conv4a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 384
kernel_size: 1
stride: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "pool3"
top: "conv4a"
}
layer{
name: "slice4a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv4a"
top: "slice4a_1"
top: "slice4a_2"
}
layer{
name: "etlwise4a"
type: "Eltwise"
bottom: "slice4a_1"
bottom: "slice4a_2"
top: "eltwise4a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv4"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise4a"
top: "conv4"
}
layer{
name: "slice4"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv4"
top: "slice4_1"
top: "slice4_2"
}
layer{
name: "etlwise4"
type: "Eltwise"
bottom: "slice4_1"
bottom: "slice4_2"
top: "eltwise4"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv5a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 1
stride: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise4"
top: "conv5a"
}
layer{
name: "slice5a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv5a"
top: "slice5a_1"
top: "slice5a_2"
}
layer{
name: "etlwise5a"
type: "Eltwise"
bottom: "slice5a_1"
bottom: "slice5a_2"
top: "eltwise5a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv5"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise5a"
top: "conv5"
}
layer{
name: "slice5"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv5"
top: "slice5_1"
top: "slice5_2"
}
layer{
name: "etlwise5"
type: "Eltwise"
bottom: "slice5_1"
bottom: "slice5_2"
top: "eltwise5"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool4"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise5"
top: "pool4"
}
layer{
name: "fc1"
type: "InnerProduct"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 512
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "pool4"
top: "fc1"
}
layer{
name: "slice_fc1"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "fc1"
top: "slice_fc1_1"
top: "slice_fc1_2"
}
layer{
name: "etlwise_fc1"
type: "Eltwise"
bottom: "slice_fc1_1"
bottom: "slice_fc1_2"
top: "eltwise_fc1"
eltwise_param {
operation: MAX
}
}
layer{
name: "drop1"
type: "Dropout"
dropout_param{
dropout_ratio: 0.7
}
bottom: "eltwise_fc1"
top: "eltwise_fc1"
}
layer{
name: "fc2"
type: "InnerProduct"
inner_product_param{
num_output: 79010
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "eltwise_fc1"
top: "fc2"
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc2"
bottom: "label"
top: "accuracy"
include: { phase: TEST }
}
layer {
name: "softmaxloss"
type: "SoftmaxWithLoss"
bottom: "fc2"
bottom: "label"
top: "loss"
}
Remember to change the num_output value in fc2 to match the number of classes in your own training set.
@TheusStremens Firstly, thank you very much for your answer! I have two other questions: 1. I used the same train_test prototxt as your configuration, but after 500K iterations the loss was still at 11.2. Then I changed the fc2 layer parameters: I added param { lr_mult: 10 decay_mult: 1 } param { lr_mult: 20 decay_mult: 0 } to my fc2 layer, and then the loss began to drop. With your configuration, after how many iterations did the loss begin to drop?
2. Did you test your model on LFW, and can the accuracy reach 98%?
@ctgushiwei 1) In my case, at iteration 700K the loss was around 2. The loss began to drop only after I changed the batch size and enabled shuffling of the training data. 2) I'm still training. My training is taking four times longer than Mr. Wu's, and the electricity went off in my lab a few times. Besides that, I had to suspend the training for another urgent piece of work. When it's over I'll let you know the results on LFW.
@TheusStremens Hello, when fine-tuning the light CNN I hit the error "Cannot copy param 0 weights from layer 'conv1'; shape mismatch. Source param shape is 96 1 5 5 (2400); target param shape is 96 3 5 5 (7200). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer". Could you please help me?
@honghuCode Check whether you are loading RGB images; LightCNN works with grayscale images.
@TheusStremens I used the following code to convert the image to grayscale and resize it to 128x128:
import cv2

mat = cv2.imread(imgPath, 1)                      # flag 1 forces a 3-channel (BGR) load
mat = cv2.resize(mat, (128, 128))
im_gray = cv2.cvtColor(mat, cv2.COLOR_BGR2GRAY)   # convert to single-channel grayscale
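As a side note, OpenCV can also load the image directly as grayscale, which skips the conversion step; a minimal sketch:
im_gray = cv2.imread(imgPath, 0)          # flag 0 = cv2.IMREAD_GRAYSCALE, single channel
im_gray = cv2.resize(im_gray, (128, 128))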
The following is my train_test_bak.prototxt:
name: "DeepFace_set003_net"
layer { name: "data" type: "ImageData" top: "data" top: "label" image_data_param { source: "/home/honghu/code/caffe-master/lightCNNFace/train.txt" batch_size: 20 shuffle: true } transform_param { scale: 0.00390625 crop_size: 128 mirror: true } include: { phase: TRAIN } }
layer { name: "data" type: "ImageData" top: "data" top: "label" image_data_param{ source: "/home/honghu/code/caffe-master/lightCNNFace/val.txt" batch_size: 20 } transform_param { scale: 0.00390625 crop_size: 128 mirror: false } include: { phase: TEST } }
layer{ name: "conv1" type: "Convolution" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 96 kernel_size: 5 stride: 1 pad: 2 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0.1 } } bottom: "data" top: "conv1" }
layer{ name: "slice1" type:"Slice" slice_param { slice_dim: 1 } bottom: "conv1" top: "slice1_1" top: "slice1_2" } layer{ name: "etlwise1" type: "Eltwise" bottom: "slice1_1" bottom: "slice1_2" top: "eltwise1" eltwise_param { operation: MAX } } layer{ name: "pool1" type: "Pooling" pooling_param { pool: MAX kernel_size: 2 stride: 2 } bottom: "eltwise1" top: "pool1" }
layer{ name: "conv2a" type: "Convolution" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 96 kernel_size: 1 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0.1 } } bottom: "pool1" top: "conv2a" } layer{ name: "slice2a" type:"Slice" slice_param { slice_dim: 1 } bottom: "conv2a" top: "slice2a_1" top: "slice2a_2" } layer{ name: "etlwise2a" type: "Eltwise" bottom: "slice2a_1" bottom: "slice2a_2" top: "eltwise2a" eltwise_param { operation: MAX } }
layer{ name: "conv2" type: "Convolution" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 192 kernel_size: 3 stride: 1 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0.1 } } bottom: "eltwise2a" top: "conv2" }
layer{ name: "slice2" type:"Slice" slice_param { slice_dim: 1 } bottom: "conv2" top: "slice2_1" top: "slice2_2" } layer{ name: "etlwise2" type: "Eltwise" bottom: "slice2_1" bottom: "slice2_2" top: "eltwise2" eltwise_param { operation: MAX } } layer{ name: "pool2" type: "Pooling" pooling_param { pool: MAX kernel_size: 2 stride: 2 } bottom: "eltwise2" top: "pool2" }
layer{ name: "conv3a" type: "Convolution" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 192 kernel_size: 1 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0.1 } } bottom: "pool2" top: "conv3a" } layer{ name: "slice3a" type:"Slice" slice_param { slice_dim: 1 } bottom: "conv3a" top: "slice3a_1" top: "slice3a_2" } layer{ name: "etlwise3a" type: "Eltwise" bottom: "slice3a_1" bottom: "slice3a_2" top: "eltwise3a" eltwise_param { operation: MAX } }
layer{ name: "conv3" type: "Convolution" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 384 kernel_size: 3 stride: 1 pad: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0.1 } } bottom: "eltwise3a" top: "conv3" }
layer{ name: "slice3" type:"Slice" slice_param { slice_dim: 1 } bottom: "conv3" top: "slice3_1" top: "slice3_2" } layer{ name: "etlwise3" type: "Eltwise" bottom: "slice3_1" bottom: "slice3_2" top: "eltwise3" eltwise_param { operation: MAX } } layer{ name: "pool3" type: "Pooling" pooling_param { pool: MAX kernel_size: 2 stride: 2 } bottom: "eltwise3" top: "pool3" }
layer{
name: "conv4a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 384
kernel_size: 1
stride: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "pool3"
top: "conv4a"
}
layer{
name: "slice4a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv4a"
top: "slice4a_1"
top: "slice4a_2"
}
layer{
name: "etlwise4a"
type: "Eltwise"
bottom: "slice4a_1"
bottom: "slice4a_2"
top: "eltwise4a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv4"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise4a"
top: "conv4"
}
layer{ name: "slice4" type:"Slice" slice_param { slice_dim: 1 } bottom: "conv4" top: "slice4_1" top: "slice4_2" } layer{ name: "etlwise4" type: "Eltwise" bottom: "slice4_1" bottom: "slice4_2" top: "eltwise4" eltwise_param { operation: MAX } }
layer{
name: "conv5a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 1
stride: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise4"
top: "conv5a"
}
layer{
name: "slice5a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv5a"
top: "slice5a_1"
top: "slice5a_2"
}
layer{
name: "etlwise5a"
type: "Eltwise"
bottom: "slice5a_1"
bottom: "slice5a_2"
top: "eltwise5a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv5"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise5a"
top: "conv5"
}
layer{ name: "slice5" type:"Slice" slice_param { slice_dim: 1 } bottom: "conv5" top: "slice5_1" top: "slice5_2" } layer{ name: "etlwise5" type: "Eltwise" bottom: "slice5_1" bottom: "slice5_2" top: "eltwise5" eltwise_param { operation: MAX } }
layer{ name: "pool4" type: "Pooling" pooling_param { pool: MAX kernel_size: 2 stride: 2 } bottom: "eltwise5" top: "pool4" }
layer{
name: "fc1"
type: "InnerProduct"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 512
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "pool4"
top: "fc1"
}
layer{
name: "slice_fc1"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "fc1"
top: "slice_fc1_1"
top: "slice_fc1_2"
}
layer{
name: "etlwise_fc1"
type: "Eltwise"
bottom: "slice_fc1_1"
bottom: "slice_fc1_2"
top: "eltwise_fc1"
eltwise_param {
operation: MAX
}
}
layer{ name: "drop1" type: "Dropout" dropout_param{ dropout_ratio: 0.7 } bottom: "eltwise_fc1" top: "eltwise_fc1" }
layer{ name: "fnc2" type: "InnerProduct"
inner_product_param{
num_output: 50
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "eltwise_fc1"
top: "fnc2"
}
layer { name: "accuracy" type: "Accuracy" bottom: "fnc2" bottom: "label" top: "accuracy" include: { phase: TEST } }
layer { name: "softmaxloss" type: "SoftmaxWithLoss" bottom: "fnc2" bottom: "label" top: "loss" } `
@honghuCode Add is_color: false in the data layer. Caffe loads images with 3 channels, even if they are grayscale, unless you set this parameter to false.
@TheusStremens Thank you very much, you solved my problem.
layer { name: "data" type: "ImageData" top: "data" top: "label" image_data_param { source: "/home/code/caffe-master/lightCNNFace/val.txt" batch_size: 20 is_color: false } transform_param { scale: 0.00390625 crop_size: 128 mirror: false } include: { phase: TEST } }
First, congratulations and thank you for your work; it's very exciting to see that it's possible to make a light CNN without millions (or billions) of parameters and achieve state-of-the-art accuracy.
I intend to do two experiments (varying the types of activations, cost functions, solver types, neurons, ...) using the model C architecture: one training a new CNN on my database, and another fine-tuning model C on my database. I made the following solver.prototxt and train_val.prototxt:
Could you tell me if this solver and train_val are similar to those you used for the final training of model C? And for fine-tuning, can I use the same solver used in training and just freeze layers in the train_val, or is another solver necessary for fine-tuning?
Thanks