abhimanyudubey / confusion

Code for the ECCV 2018 paper "Pairwise Confusion for Fine-Grained Visual Classification"
https://arxiv.org/abs/1705.08016

[HELP] : The loss_train is always 5.3833 #5

Closed BaofengZan closed 5 years ago

BaofengZan commented 5 years ago
[screenshot: training log showing loss_train stuck at 5.3833]

How can I solve it? Thanks!!

BaofengZan commented 5 years ago

My platform is Ubuntu 16.04, and I used the provided script to train with Caffe. In addition, I tried several different networks (VGG16, GoogLeNet, AlexNet), but the result was the same in every experiment.

abhimanyudubey commented 5 years ago

Hi,

Can you tell me more about the problem you are using it for? What parameters for the confusion are you using (lambda, learning rate, model solver etc.), and are you using the version of caffe that's present here: https://github.com/abhimanyudubey/caffe?

Thanks! Abhi

BaofengZan commented 5 years ago

@abhimanyudubey Thank you for your reply. First, the version of Caffe is correct. The parameters were the same as in ./confusion/caffe/models/cub/bilinear_vgg_restask/solver_all.prototxt, the network was VGG16, and the dataset was CUB-200-2011.
The problem is that during training the loss value stays at 5.3833 and the accuracy stays at 0; neither ever changes. When I switched to GoogLeNet, the situation was the same.
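(As a quick sanity check on the reported number, not part of the original thread: a classification loss that never moves is typically near the chance-level cross-entropy for the number of classes. For CUB-200-2011's 200 classes, a uniform softmax gives:)

```python
import math

# Cross-entropy of a uniform prediction over N classes is ln(N).
# CUB-200-2011 has 200 classes, so chance-level loss is:
num_classes = 200
chance_loss = math.log(num_classes)
print(round(chance_loss, 4))  # 5.2983
```

The reported 5.3833 sits right in this neighborhood, which usually means the classifier is outputting near-uniform probabilities and no useful gradient is reaching it (e.g. pretrained weights not loaded, or a learning-rate problem).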

abhimanyudubey commented 5 years ago

Hi, sorry for the delay in responding. Can you share your prototxt with me?

BaofengZan commented 5 years ago

@abhimanyudubey Thanks. The prototxt is:

name: "CUBCompactBilinearNet - All"
layer { name: "data" type: "ImageData" top: "data" top: "label" include { phase: TRAIN } transform_param { mirror: true crop_size: 448 mean_value: 104.0 mean_value: 117.0 mean_value: 124.0 } image_data_param { source: "/home/baofengzan/111/CUB/CUB2011/CUB2011/_train.txt" batch_size: 2 shuffle: true new_height: 512 new_width: 512 root_folder: "" } }
layer { name: "data" type: "ImageData" top: "data" top: "label" include { phase: TEST } transform_param { mirror: false crop_size: 448 mean_value: 104.0 mean_value: 117.0 mean_value: 123.0 } image_data_param { source: "/home/baofengzan/111/CUB/CUB2011/CUB2011/_test.txt" batch_size: 2 shuffle: true new_height: 512 new_width: 512 root_folder: "" } }
layer { name: "conv1_1" type: "Convolution" bottom: "data" top: "conv1_1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 pad: 1 kernel_size: 3 weight_filler { type: "msra" } } }
layer { name: "conv1_1_bn" type: "BatchNorm" bottom: "conv1_1" top: "conv1_1" param { lr_mult: 0.0 } param { lr_mult: 0.0 } param { lr_mult: 0.0 } }
layer { name: "conv1_1_scale" type: "Scale" bottom: "conv1_1" top: "conv1_1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 1.0 decay_mult: 0.0 } scale_param { bias_term: true } }
layer { name: "relu1_1" type: "ReLU" bottom: "conv1_1" top: "conv1_1" }
layer { name: "conv1_2" type: "Convolution" bottom: "conv1_1" top: "conv1_2" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 pad: 1 kernel_size: 3 weight_filler { type: "msra" } } }
layer { name: "conv1_2_bn" type: "BatchNorm" bottom: "conv1_2" top: "conv1_2" param { lr_mult: 0.0 } param { lr_mult: 0.0 } param { lr_mult: 0.0 } }
layer { name: "conv1_2_scale" type: "Scale" bottom: "conv1_2" top: "conv1_2" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 1.0 decay_mult: 0.0 } scale_param { bias_term: true } }
layer { name: "relu1_2" type: "ReLU" bottom: "conv1_2" top: "conv1_2" }
layer { name: "pool1" type: "Pooling" bottom: "conv1_2" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "conv2_1" type: "Convolution" bottom: "pool1" top: "conv2_1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 pad: 1 kernel_size: 3 weight_filler { type: "msra" } } }
layer { name: "conv2_1_bn" type: "BatchNorm" bottom: "conv2_1" top: "conv2_1" param { lr_mult: 0.0 } param { lr_mult: 0.0 } param { lr_mult: 0.0 } }
layer { name: "conv2_1_scale" type: "Scale" bottom: "conv2_1" top: "conv2_1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 1.0 decay_mult: 0.0 } scale_param { bias_term: true } }
layer { name: "relu2_1" type: "ReLU" bottom: "conv2_1" top: "conv2_1" }
layer { name: "conv2_2" type: "Convolution" bottom: "conv2_1" top: "conv2_2" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 pad: 1 kernel_size: 3 weight_filler { type: "msra" } } }
layer { name: "conv2_2_bn" type: "BatchNorm" bottom: "conv2_2" top: "conv2_2" param { lr_mult: 0.0 } param { lr_mult: 0.0 } param { lr_mult: 0.0 } }
layer { name: "conv2_2_scale" type: "Scale" bottom: "conv2_2" top: "conv2_2" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 1.0 decay_mult: 0.0 } scale_param { bias_term: true } }
layer { name: "relu2_2" type: "ReLU" bottom: "conv2_2" top: "conv2_2" }
layer { name: "pool2" type: "Pooling" bottom: "conv2_2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "conv3_1" type: "Convolution" bottom: "pool2" top: "conv3_1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 weight_filler { type: "msra" } } }
layer { name: "conv3_1_bn" type: "BatchNorm" bottom: "conv3_1" top: "conv3_1" param { lr_mult: 0.0 } param { lr_mult: 0.0 } param { lr_mult: 0.0 } }
layer { name: "conv3_1_scale" type: "Scale" bottom: "conv3_1" top: "conv3_1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 1.0 decay_mult: 0.0 } scale_param { bias_term: true } }
layer { name: "relu3_1" type: "ReLU" bottom: "conv3_1" top: "conv3_1" }
layer { name: "conv3_2" type: "Convolution" bottom: "conv3_1" top: "conv3_2" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 weight_filler { type: "msra" } } }
layer { name: "conv3_2_bn" type: "BatchNorm" bottom: "conv3_2" top: "conv3_2" param { lr_mult: 0.0 } param { lr_mult: 0.0 } param { lr_mult: 0.0 } }
layer { name: "conv3_2_scale" type: "Scale" bottom: "conv3_2" top: "conv3_2" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 1.0 decay_mult: 0.0 } scale_param { bias_term: true } }
layer { name: "relu3_2" type: "ReLU" bottom: "conv3_2" top: "conv3_2" }
layer { name: "conv3_3" type: "Convolution" bottom: "conv3_2" top: "conv3_3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 weight_filler { type: "msra" } } }
layer { name: "conv3_3_bn" type: "BatchNorm" bottom: "conv3_3" top: "conv3_3" param { lr_mult: 0.0 } param { lr_mult: 0.0 } param { lr_mult: 0.0 } }
layer { name: "conv1_scale" type: "Scale" bottom: "conv3_3" top: "conv3_3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 1.0 decay_mult: 0.0 } scale_param { bias_term: true } }
layer { name: "relu3_3" type: "ReLU" bottom: "conv3_3" top: "conv3_3" }
layer { name: "pool3" type: "Pooling" bottom: "conv3_3" top: "pool3" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "conv4_1" type: "Convolution" bottom: "pool3" top: "conv4_1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 weight_filler { type: "msra" } } }
layer { name: "conv4_1_bn" type: "BatchNorm" bottom: "conv4_1" top: "conv4_1" param { lr_mult: 0.0 } param { lr_mult: 0.0 } param { lr_mult: 0.0 } }
layer { name: "conv4_1_scale" type: "Scale" bottom: "conv4_1" top: "conv4_1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 1.0 decay_mult: 0.0 } scale_param { bias_term: true } }
layer { name: "relu4_1" type: "ReLU" bottom: "conv4_1" top: "conv4_1" }
layer { name: "conv4_2" type: "Convolution" bottom: "conv4_1" top: "conv4_2" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 weight_filler { type: "msra" } } }
layer { name: "conv4_2_bn" type: "BatchNorm" bottom: "conv4_2" top: "conv4_2" param { lr_mult: 0.0 } param { lr_mult: 0.0 } param { lr_mult: 0.0 } }
layer { name: "conv4_2_scale" type: "Scale" bottom: "conv4_2" top: "conv4_2" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 1.0 decay_mult: 0.0 } scale_param { bias_term: true } }
layer { name: "relu4_2" type: "ReLU" bottom: "conv4_2" top: "conv4_2" }
layer { name: "conv4_3" type: "Convolution" bottom: "conv4_2" top: "conv4_3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 weight_filler { type: "msra" } } }
layer { name: "conv4_3_bn" type: "BatchNorm" bottom: "conv4_3" top: "conv4_3" param { lr_mult: 0.0 } param { lr_mult: 0.0 } param { lr_mult: 0.0 } }
layer { name: "conv4_3_scale" type: "Scale" bottom: "conv4_3" top: "conv4_3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 1.0 decay_mult: 0.0 } scale_param { bias_term: true } }
layer { name: "relu4_3" type: "ReLU" bottom: "conv4_3" top: "conv4_3" }
layer { name: "pool4" type: "Pooling" bottom: "conv4_3" top: "pool4" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
layer { name: "conv5_1" type: "Convolution" bottom: "pool4" top: "conv5_1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 weight_filler { type: "msra" } } }
layer { name: "conv5_1_bn" type: "BatchNorm" bottom: "conv5_1" top: "conv5_1" param { lr_mult: 0.0 } param { lr_mult: 0.0 } param { lr_mult: 0.0 } }
layer { name: "conv5_1_scale" type: "Scale" bottom: "conv5_1" top: "conv5_1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 1.0 decay_mult: 0.0 } scale_param { bias_term: true } }
layer { name: "relu5_1" type: "ReLU" bottom: "conv5_1" top: "conv5_1" }
layer { name: "conv5_2" type: "Convolution" bottom: "conv5_1" top: "conv5_2" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 weight_filler { type: "msra" } } }
layer { name: "conv5_2_bn" type: "BatchNorm" bottom: "conv5_2" top: "conv5_2" param { lr_mult: 0.0 } param { lr_mult: 0.0 } param { lr_mult: 0.0 } }
layer { name: "conv5_2_scale" type: "Scale" bottom: "conv5_2" top: "conv5_2" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 1.0 decay_mult: 0.0 } scale_param { bias_term: true } }
layer { name: "relu5_2" type: "ReLU" bottom: "conv5_2" top: "conv5_2" }
layer { name: "conv5_3" type: "Convolution" bottom: "conv5_2" top: "conv5_3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 512 pad: 1 kernel_size: 3 weight_filler { type: "msra" } } }
layer { name: "conv5_3_bn" type: "BatchNorm" bottom: "conv5_3" top: "conv5_3" param { lr_mult: 0.0 } param { lr_mult: 0.0 } param { lr_mult: 0.0 } }
layer { name: "conv5_3_scale" type: "Scale" bottom: "conv5_3" top: "conv5_3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 1.0 decay_mult: 0.0 } scale_param { bias_term: true } }
layer { name: "relu5_3" type: "ReLU" bottom: "conv5_3" top: "conv5_3" }
layer { name: "bilinear_layer" type: "Bilinear" bottom: "conv5_3" bottom: "conv5_3" top: "bilinear" compact_bilinear_param { num_output: 8192 sum_pool: true } }
layer { name: "signed_sqrt_layer" type: "SignedSqrt" bottom: "bilinear" top: "bilinear_sqrt" }
layer { name: "l2_normalization_layer" type: "L2Normalize" bottom: "bilinear_sqrt" top: "bilinear_l2" }
layer { name: "fc8_cub_bilinear" type: "InnerProduct" bottom: "bilinear_l2" top: "fc8_cub_bilinear" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } inner_product_param { num_output: 200 weight_filler { type: "msra" } } }
layer { name: "loss3/loss3" type: "SoftmaxWithLoss" bottom: "fc8_cub_bilinear" bottom: "label" top: "loss_train" loss_weight: 1 include { phase: TRAIN } }
layer { name: "loss3/loss3" type: "SoftmaxWithLoss" bottom: "fc8_cub_bilinear" bottom: "label" top: "loss_test" loss_weight: 1 include { phase: TEST } }
layer { name: "loss3/top-1" type: "Accuracy" bottom: "fc8_cub_bilinear" bottom: "label" top: "accuracy_test" include { phase: TEST } }
layer { name: "loss3/top-1" type: "Accuracy" bottom: "fc8_cub_bilinear" bottom: "label" top: "accuracy_train" include { phase: TRAIN } }
layer { name: "loss3/top-1" type: "Accuracy" bottom: "fc8_cub_bilinear" bottom: "label" top: "top5_train" include { phase: TRAIN } accuracy_param { top_k: 5 } }
layer { name: "loss3/top-1" type: "Accuracy" bottom: "fc8_cub_bilinear" bottom: "label" top: "top5_test" include { phase: TEST } accuracy_param { top_k: 5 } }
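(For readers unfamiliar with the tail of this network: the pooled bilinear feature is passed through an element-wise signed square root and then L2-normalized before the classifier. The following NumPy snippet is an illustrative re-implementation of those two normalization steps, not the actual Caffe layers from this repo:)

```python
import numpy as np

def signed_sqrt(x):
    # Element-wise signed square root, as done by the SignedSqrt layer.
    return np.sign(x) * np.sqrt(np.abs(x))

def l2_normalize(x, eps=1e-12):
    # Scale a feature vector to unit L2 norm, as done by the L2Normalize layer.
    return x / (np.linalg.norm(x) + eps)

# Toy pooled bilinear feature (the real layer outputs 8192 dimensions).
feat = np.array([4.0, -9.0, 0.25])
out = l2_normalize(signed_sqrt(feat))
print(out)  # unit-norm vector, signs preserved
```

These two steps suppress bursty feature magnitudes, which is standard practice in bilinear CNNs.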

BaofengZan commented 5 years ago

The solver is:

test_iter: 1000
test_interval: 3000
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 50000
display: 20
max_iter: 200000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "/home/baofengzan/111/confusion/caffe/models/cub"
net: "/home/baofengzan/111/confusion/caffe/models/cub/bilinear_vgg/train_val_all.prototxt"
solver_mode: GPU

abhimanyudubey commented 5 years ago

Your prototxt looks all right, except I see you're using a batch size of 2. You will have to adjust the learning rate accordingly (see https://www.reddit.com/r/MachineLearning/comments/84waz4/d_relation_between_learning_rate_batch_size_and/); this might be the issue.
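(A common heuristic from that linked discussion is the linear scaling rule: learning rate proportional to batch size. The reference batch size of 32 below is an assumption for illustration, not a value stated in this thread; check what the repo's solver was tuned for:)

```python
def scale_lr(base_lr, base_batch, new_batch):
    # Linear scaling rule: keep lr / batch_size roughly constant.
    return base_lr * new_batch / base_batch

# If base_lr = 0.001 was tuned for a (hypothetical) batch size of 32,
# a batch size of 2 suggests a learning rate around:
print(scale_lr(0.001, 32, 2))  # 6.25e-05
```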

Also, are you sure the correct weights are being loaded? Does the network train with no loss?