NVIDIA / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

batch_norm_layer after InnerProduct #518

Closed · MillX2021 closed this issue 6 years ago

MillX2021 commented 6 years ago

@drnikolaev Hi, my research focuses on person re-identification (re-id), and I use NVCaffe to train re-id models. I find that a BN layer (`BatchNorm`) cannot follow an FC layer (`InnerProduct`). Training fails with:

```
Check failed: axis_index < num_axes() (2 vs. 2) axis 2 out of range for 2-D Blob with shape 64 256
```

Reading the `batch_norm_layer` code in NVCaffe, it appears the BN layer expects a 4-D input blob, but the FC layer's output blob is 2-D. What should I do to train a model with a BN layer after an FC layer? (It significantly improves re-id precision.)

Sample network:

```
layer {
  name: "fc_feature"
  type: "InnerProduct"
  bottom: "pool2"
  top: "fc_feature"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 1 }
  inner_product_param {
    num_output: 512
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "fc_feature_bn"
  type: "BatchNorm"
  bottom: "fc_feature"
  top: "fc_feature"
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  batch_norm_param {
    moving_average_fraction: 0.999
    eps: 0.001
  }
}
```
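[Editor's note: a common workaround in Caffe-style nets, sketched here as a hedged suggestion rather than anything confirmed in this thread, is to reshape the 2-D `InnerProduct` output into a 4-D (N, C, 1, 1) blob with the stock `Reshape` layer so `BatchNorm` sees the channel axis it expects. The layer and blob names `fc_feature_reshape` and `fc_feature_4d` are made up for illustration:]

```
# Untested sketch: insert a Reshape between the FC and BN layers.
# dim: 0 copies the batch axis, dim: -1 infers the channel count,
# and the trailing 1s add singleton spatial axes, e.g. (64, 256) -> (64, 256, 1, 1).
layer {
  name: "fc_feature_reshape"   # hypothetical name
  type: "Reshape"
  bottom: "fc_feature"
  top: "fc_feature_4d"         # hypothetical name
  reshape_param {
    shape { dim: 0 dim: -1 dim: 1 dim: 1 }
  }
}
layer {
  name: "fc_feature_bn"
  type: "BatchNorm"
  bottom: "fc_feature_4d"
  top: "fc_feature_4d"
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  batch_norm_param {
    moving_average_fraction: 0.999
    eps: 0.001
  }
}
```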

hanzeze commented 6 years ago

@Millczc @drnikolaev

I have the same problem. Have you solved it?

```
I0809 13:40:05.411381 21974 caffe.cpp:454] This is NVCaffe 0.16.6 started at Thu Aug 9 13:40:05 2018
I0809 13:40:05.411495 21974 caffe.cpp:456] CuDNN version: USE_CUDNN is not defined
I0809 13:40:05.411499 21974 caffe.cpp:457] CuBLAS version: 9000
I0809 13:40:05.411500 21974 caffe.cpp:458] CUDA version: 9000
I0809 13:40:05.411502 21974 caffe.cpp:459] CUDA driver version: 9010
I0809 13:40:05.419993 21974 gpu_memory.cpp:104] GPUMemory::Manager initialized
I0809 13:40:05.420399 21974 gpu_memory.cpp:106] Total memory: 8512602112, Free: 8350072832, dev_info[0]: total=8512602112 free=8350072832
I0809 13:40:05.420409 21974 caffe.cpp:182] Using GPUs 0
I0809 13:40:05.420647 21974 caffe.cpp:187] GPU 0: GeForce GTX 1080
I0809 13:40:05.420709 21974 solver.cpp:42] Solver data type: FLOAT
I0809 13:40:05.425721 21974 solver.cpp:45] Initializing solver from parameters:
  test_iter: 100 test_interval: 500 base_lr: 0.01 display: 100 max_iter: 10000
  lr_policy: "inv" gamma: 0.0001 power: 0.75 momentum: 0.9 weight_decay: 0.0005
  snapshot: 5000 snapshot_prefix: "/media_1T/hanze/model/test_nvcaffe/mnist/"
  solver_mode: GPU device_id: 0
  net: "/media_1T/hanze/model/test_nvcaffe/mnist/lenet_train_test_AddBN.prototxt"
I0809 13:40:05.425837 21974 solver.cpp:86] Creating training net from net file: /media_1T/hanze/model/test_nvcaffe/mnist/lenet_train_test_AddBN.prototxt
I0809 13:40:05.426069 21974 net.cpp:444] The NetState phase (0) differed from the phase (1) specified by a rule in layer mnist
I0809 13:40:05.426079 21974 net.cpp:444] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I0809 13:40:05.426183 21974 net.cpp:68] Initializing net from parameters:
  name: "LeNet"
  state { phase: TRAIN }
  default_forward_type: FLOAT16
  default_backward_type: FLOAT16
  default_forward_math: FLOAT16
  default_backward_math: FLOAT16
  layer { name: "mnist" type: "Data" top: "data" top: "label" include { phase: TRAIN } transform_param { scale: 0.00390625 } data_param { source: "/media_1T/hanze/model/test_nvcaffe/mnist/mnist_train_lmdb" batch_size: 128 backend: LMDB } }
  layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 20 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
  layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
  layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 50 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
  layer { name: "pool2" type: "Pooling" bottom: "conv2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
  layer { name: "ip1" type: "InnerProduct" bottom: "pool2" top: "ip1" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 500 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
  layer { name: "relu1" type: "ReLU" bottom: "ip1" top: "ip1" }
  layer { name: "ip2" type: "InnerProduct" bottom: "ip1" top: "ip2" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 10 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
  layer { name: "ip2BN" type: "BatchNorm" bottom: "ip2" top: "ip2BN" batch_norm_param { moving_average_fraction: 0.9 eps: 0.0001 scale_bias: true } }
  layer { name: "ip3" type: "InnerProduct" bottom: "ip2BN" top: "ip3" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 10 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
  layer { name: "loss" type: "SoftmaxWithLoss" bottom: "ip3" bottom: "label" top: "loss" }
I0809 13:40:05.426261 21974 layer_factory.hpp:172] Creating layer 'mnist' of type 'Data'
I0809 13:40:05.426266 21974 layer_factory.hpp:184] Layer's types are Ftype:FLOAT16 Btype:FLOAT Fmath:FLOAT16 Bmath:FLOAT16
I0809 13:40:05.426560 21974 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
I0809 13:40:05.426652 21974 net.cpp:187] Created Layer mnist (0)
I0809 13:40:05.426659 21974 net.cpp:529] mnist -> data
I0809 13:40:05.426681 21974 net.cpp:529] mnist -> label
I0809 13:40:05.426702 21974 data_reader.cpp:55] Sample Data Reader threads: 1, out queues: 1, depth: 128
I0809 13:40:05.427222 21987 blocking_queue.cpp:40] Data layer prefetch queue empty
I0809 13:40:05.427222 21974 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
I0809 13:40:05.427767 21988 db_lmdb.cpp:36] Opened lmdb /media_1T/hanze/model/test_nvcaffe/mnist/mnist_train_lmdb
I0809 13:40:05.429107 21974 data_layer.cpp:197] [0] Output data size: 128, 1, 28, 28
I0809 13:40:05.429123 21974 internal_thread.cpp:19] Starting 1 internal thread(s) on device 0
I0809 13:40:05.429143 21974 net.cpp:247] Setting up mnist
I0809 13:40:05.429152 21974 net.cpp:254] TRAIN Top shape for layer 0 'mnist' 128 1 28 28 (100352)
I0809 13:40:05.429157 21974 net.cpp:254] TRAIN Top shape for layer 0 'mnist' 128 (128)
I0809 13:40:05.429160 21974 layer_factory.hpp:172] Creating layer 'conv1' of type 'Convolution'
I0809 13:40:05.429673 21974 layer_factory.hpp:184] Layer's types are Ftype:FLOAT16 Btype:FLOAT16 Fmath:FLOAT16 Bmath:FLOAT16
I0809 13:40:05.429700 21974 net.cpp:187] Created Layer conv1 (1)
I0809 13:40:05.429705 21974 net.cpp:559] conv1 <- data
I0809 13:40:05.429715 21974 net.cpp:529] conv1 -> conv1
I0809 13:40:05.430037 21974 net.cpp:247] Setting up conv1
I0809 13:40:05.430048 21974 net.cpp:254] TRAIN Top shape for layer 1 'conv1' 128 20 24 24 (1474560)
I0809 13:40:05.430060 21974 layer_factory.hpp:172] Creating layer 'pool1' of type 'Pooling'
I0809 13:40:05.430065 21974 layer_factory.hpp:184] Layer's types are Ftype:FLOAT16 Btype:FLOAT16 Fmath:FLOAT16 Bmath:FLOAT16
I0809 13:40:05.430073 21974 net.cpp:187] Created Layer pool1 (2)
I0809 13:40:05.430078 21974 net.cpp:559] pool1 <- conv1
I0809 13:40:05.430080 21974 net.cpp:529] pool1 -> pool1
I0809 13:40:05.430110 21974 net.cpp:247] Setting up pool1
I0809 13:40:05.430116 21974 net.cpp:254] TRAIN Top shape for layer 2 'pool1' 128 20 12 12 (368640)
I0809 13:40:05.430120 21974 layer_factory.hpp:172] Creating layer 'conv2' of type 'Convolution'
I0809 13:40:05.430124 21974 layer_factory.hpp:184] Layer's types are Ftype:FLOAT16 Btype:FLOAT16 Fmath:FLOAT16 Bmath:FLOAT16
I0809 13:40:05.430133 21974 net.cpp:187] Created Layer conv2 (3)
I0809 13:40:05.430136 21974 net.cpp:559] conv2 <- pool1
I0809 13:40:05.430140 21974 net.cpp:529] conv2 -> conv2
I0809 13:40:05.430675 21989 common.cpp:525] NVML initialized, thread 21989
I0809 13:40:05.431200 21974 net.cpp:247] Setting up conv2
I0809 13:40:05.431210 21974 net.cpp:254] TRAIN Top shape for layer 3 'conv2' 128 50 8 8 (409600)
I0809 13:40:05.431227 21974 layer_factory.hpp:172] Creating layer 'pool2' of type 'Pooling'
I0809 13:40:05.431231 21974 layer_factory.hpp:184] Layer's types are Ftype:FLOAT16 Btype:FLOAT16 Fmath:FLOAT16 Bmath:FLOAT16
I0809 13:40:05.431238 21974 net.cpp:187] Created Layer pool2 (4)
I0809 13:40:05.431242 21974 net.cpp:559] pool2 <- conv2
I0809 13:40:05.431246 21974 net.cpp:529] pool2 -> pool2
I0809 13:40:05.431269 21974 net.cpp:247] Setting up pool2
I0809 13:40:05.431283 21974 net.cpp:254] TRAIN Top shape for layer 4 'pool2' 128 50 4 4 (102400)
I0809 13:40:05.431286 21974 layer_factory.hpp:172] Creating layer 'ip1' of type 'InnerProduct'
I0809 13:40:05.431289 21974 layer_factory.hpp:184] Layer's types are Ftype:FLOAT16 Btype:FLOAT16 Fmath:FLOAT16 Bmath:FLOAT16
I0809 13:40:05.431306 21974 net.cpp:187] Created Layer ip1 (5)
I0809 13:40:05.431310 21974 net.cpp:559] ip1 <- pool2
I0809 13:40:05.431313 21974 net.cpp:529] ip1 -> ip1
I0809 13:40:05.436494 21974 net.cpp:247] Setting up ip1
I0809 13:40:05.436503 21974 net.cpp:254] TRAIN Top shape for layer 5 'ip1' 128 500 (64000)
I0809 13:40:05.436532 21974 layer_factory.hpp:172] Creating layer 'relu1' of type 'ReLU'
I0809 13:40:05.436537 21974 layer_factory.hpp:184] Layer's types are Ftype:FLOAT16 Btype:FLOAT16 Fmath:FLOAT16 Bmath:FLOAT16
I0809 13:40:05.436543 21974 net.cpp:187] Created Layer relu1 (6)
I0809 13:40:05.436547 21974 net.cpp:559] relu1 <- ip1
I0809 13:40:05.436550 21974 net.cpp:514] relu1 -> ip1 (in-place)
I0809 13:40:05.436554 21974 net.cpp:247] Setting up relu1
I0809 13:40:05.436558 21974 net.cpp:254] TRAIN Top shape for layer 6 'relu1' 128 500 (64000)
I0809 13:40:05.436559 21974 layer_factory.hpp:172] Creating layer 'ip2' of type 'InnerProduct'
I0809 13:40:05.436563 21974 layer_factory.hpp:184] Layer's types are Ftype:FLOAT16 Btype:FLOAT16 Fmath:FLOAT16 Bmath:FLOAT16
I0809 13:40:05.436568 21974 net.cpp:187] Created Layer ip2 (7)
I0809 13:40:05.436569 21974 net.cpp:559] ip2 <- ip1
I0809 13:40:05.436573 21974 net.cpp:529] ip2 -> ip2
I0809 13:40:05.437166 21974 net.cpp:247] Setting up ip2
I0809 13:40:05.437175 21974 net.cpp:254] TRAIN Top shape for layer 7 'ip2' 128 10 (1280)
I0809 13:40:05.437192 21974 layer_factory.hpp:172] Creating layer 'ip2BN' of type 'BatchNorm'
I0809 13:40:05.437196 21974 layer_factory.hpp:184] Layer's types are Ftype:FLOAT16 Btype:FLOAT16 Fmath:FLOAT16 Bmath:FLOAT16
I0809 13:40:05.437204 21974 net.cpp:187] Created Layer ip2BN (8)
I0809 13:40:05.437208 21974 net.cpp:559] ip2BN <- ip2
I0809 13:40:05.437212 21974 net.cpp:529] ip2BN -> ip2BN
F0809 13:40:05.437366 21974 blob.hpp:262] Check failed: axis_index < num_axes() (2 vs. 2) axis 2 out of range for 2-D Blob with shape 128 10 (1280)
Check failure stack trace:
    @  0x7f6033d695cd  google::LogMessage::Fail()
    @  0x7f6033d6b433  google::LogMessage::SendToLog()
    @  0x7f6033d6915b  google::LogMessage::Flush()
    @  0x7f6033d6be1e  google::LogMessageFatal::~LogMessageFatal()
    @  0x7f603481eb78  caffe::Blob::CanonicalAxisIndex()
    @  0x7f60348fa982  caffe::BatchNormLayer<>::LayerSetUp()
    @  0x7f60347fad0e  caffe::Net::Init()
    @  0x7f60347fc672  caffe::Net::Net()
    @  0x7f6034a4b160  caffe::Solver::InitTrainNet()
    @  0x7f6034a4b684  caffe::Solver::Init()
    @  0x7f6034a4bbb6  caffe::Solver::Solver()
    @  0x7f60347833d6  caffe::Creator_SGDSolver()
    @        0x413cf6  caffe::SolverRegistry::CreateSolver()
    @        0x40d512  train()
    @        0x40a8a8  main
    @  0x7f6032edd830  __libc_start_main
    @        0x40b069  _start
    @           (nil)  (unknown)
```
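[Editor's note: the stack trace pinpoints the failure: `caffe::BatchNormLayer<>::LayerSetUp()` calls `caffe::Blob::CanonicalAxisIndex()` for a channel/spatial axis that the 2-D (128, 10) `ip2` blob does not have. Under the same assumption as the sketch in the first comment, a `Reshape` inserted between `ip2` and `ip2BN` would give BN the 4-D blob it expects; names below are illustrative:]

```
# Hypothetical: reshape ip2's (128, 10) output to (128, 10, 1, 1) before BN.
layer {
  name: "ip2_reshape"      # made-up name
  type: "Reshape"
  bottom: "ip2"
  top: "ip2_4d"            # made-up name
  reshape_param { shape { dim: 0 dim: -1 dim: 1 dim: 1 } }
}
layer {
  name: "ip2BN"
  type: "BatchNorm"
  bottom: "ip2_4d"
  top: "ip2BN"
  batch_norm_param { moving_average_fraction: 0.9 eps: 0.0001 scale_bias: true }
}
```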

drnikolaev commented 6 years ago

@Millczc

```
I0809 13:40:05.426266 21974 layer_factory.hpp:184] Layer's types are Ftype:FLOAT16 Btype:FLOAT Fmath:FLOAT16 Bmath:FLOAT16
```

It seems that mixing FLOAT16 on the forward pass with FLOAT on the backward pass doesn't work well. Please try FLOAT16 everywhere.
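[Editor's note: the posted net already sets the net-level defaults; a hedged sketch of pinning every pass to FLOAT16, including a per-layer override for the data layer that came up with `Btype:FLOAT`. Treat the per-layer `forward_type`/`backward_type` fields as an assumption about NVCaffe's prototxt extensions:]

```
# Net-level defaults (these four lines appear verbatim in the net above):
default_forward_type: FLOAT16
default_backward_type: FLOAT16
default_forward_math: FLOAT16
default_backward_math: FLOAT16

# Assumption: NVCaffe also accepts per-layer type overrides, which could
# pin the data layer to FLOAT16 on both passes as well.
layer {
  name: "mnist"
  type: "Data"
  forward_type: FLOAT16    # assumed per-layer NVCaffe field
  backward_type: FLOAT16   # assumed per-layer NVCaffe field
  # ... remaining Data layer fields as in the original net
}
```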

drnikolaev commented 6 years ago

@Millczc Please verify the v0.17.1 release (https://github.com/NVIDIA/caffe/tree/v0.17.1) and reopen the issue if needed.