BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Can't cast the same trained model to a net on a different machine #4713

Open xxw345 opened 8 years ago

xxw345 commented 8 years ago

I have two machines running Caffe. One is a single machine with 3 GPUs (K20), which I use for training and fine-tuning the model. The other is a GPU cluster used for high-throughput output.

I recently trained a model and tested its output on the K20 machine, and it works great:

import caffe net = caffe.Net('deploy_full.prototxt',caffe.TEST) WARNING: Logging before InitGoogleLogging() is written to STDERR I0911 15:20:11.718778 24506 net.cpp:52] Initializing net from parameters: input: "data" state { phase: TEST } input_shape { dim: 1 dim: 3 dim: 2064 dim: 2064 } layer { name: "conv1a" type: "Convolution" bottom: "data" top: "conv1a" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 16 kernel_size: 3 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "bn1a" type: "BatchNorm" bottom: "conv1a" top: "bn1a" batch_norm_param { moving_average_fraction: 0.95 scale_filler { type: "constant" value: 1 } bias_filler { type: "constant" value: 0.001 } engine: CUDNN } } layer { name: "relu1a" type: "ReLU" bottom: "bn1a" top: "relu1a" } layer { name: "conv1b" type: "Convolution" bottom: "relu1a" top: "conv1b" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 16 kernel_size: 2 stride: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "bn1b" type: "BatchNorm" bottom: "conv1b" top: "bn1b" batch_norm_param { moving_average_fraction: 0.95 scale_filler { type: "constant" value: 1 } bias_filler { type: "constant" value: 0.001 } engine: CUDNN } } layer { name: "relu1b" type: "ReLU" bottom: "bn1b" top: "relu1b" } layer { name: "conv2a" type: "Convolution" bottom: "relu1b" top: "conv2a" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 32 kernel_size: 3 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "bn2a" type: "BatchNorm" bottom: "conv2a" top: "bn2a" batch_norm_param { moving_average_fraction: 0.95 scale_filler { type: "constant" value: 1 } bias_filler { type: "constant" value: 0.001 } engine: CUDNN } } layer { name: "relu2a" type: "ReLU" bottom: "bn2a" top: "relu2a" } layer { name: "conv2b" type: "Convolution" bottom: "relu2a" top: "conv2b" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 32 kernel_size: 3 stride: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "bn2b" type: "BatchNorm" bottom: "conv2b" top: "bn2b" batch_norm_param { moving_average_fraction: 0.95 scale_filler { type: "constant" value: 1 } bias_filler { type: "constant" value: 0.001 } engine: CUDNN } } layer { name: "relu2b" type: "ReLU" bottom: "bn2b" top: "relu2b" } layer { name: "conv3a" type: "Convolution" bottom: "relu2b" top: "conv3a" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 64 kernel_size: 3 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "bn3a" type: "BatchNorm" bottom: "conv3a" top: "bn3a" batch_norm_param { moving_average_fraction: 0.95 scale_filler { type: "constant" value: 1 } bias_filler { type: "constant" value: 0.001 } engine: CUDNN } } layer { name: "relu3a" type: "ReLU" bottom: "bn3a" top: "relu3a" } layer { name: "conv3b" type: "Convolution" bottom: "relu3a" top: "conv3b" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 64 kernel_size: 4 stride: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "bn3b" type: "BatchNorm" 
bottom: "conv3b" top: "bn3b" batch_norm_param { moving_average_fraction: 0.95 scale_filler { type: "constant" value: 1 } bias_filler { type: "constant" value: 0.001 } engine: CUDNN } } layer { name: "relu3b" type: "ReLU" bottom: "bn3b" top: "relu3b" } layer { name: "fc8-conv" type: "Convolution" bottom: "relu3b" top: "fc8-conv" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 2 kernel_size: 5 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "softmax" type: "Softmax" bottom: "fc8-conv" top: "softmax" } layer { name: "prob" type: "Softmax" bottom: "fc8-conv" top: "prob" } I0911 15:20:11.719876 24506 net.cpp:416] Input 0 -> data I0911 15:20:11.719930 24506 layer_factory.hpp:76] Creating layer conv1a I0911 15:20:11.719955 24506 net.cpp:109] Creating Layer conv1a I0911 15:20:11.719975 24506 net.cpp:457] conv1a <- data I0911 15:20:11.719990 24506 net.cpp:414] conv1a -> conv1a I0911 15:20:11.743553 24506 net.cpp:153] Setting up conv1a I0911 15:20:11.743629 24506 net.cpp:160] Top shape: 1 16 2062 2062 (68029504) I0911 15:20:11.743641 24506 net.cpp:168] Memory required for data: 272118016 I0911 15:20:11.743687 24506 layer_factory.hpp:76] Creating layer bn1a I0911 15:20:11.743721 24506 net.cpp:109] Creating Layer bn1a I0911 15:20:11.743736 24506 net.cpp:457] bn1a <- conv1a I0911 15:20:11.743757 24506 net.cpp:414] bn1a -> bn1a I0911 15:20:11.754132 24506 net.cpp:153] Setting up bn1a I0911 15:20:11.754212 24506 net.cpp:160] Top shape: 1 16 2062 2062 (68029504) I0911 15:20:11.754223 24506 net.cpp:168] Memory required for data: 544236032 I0911 15:20:11.754276 24506 layer_factory.hpp:76] Creating layer relu1a I0911 15:20:11.754308 24506 net.cpp:109] Creating Layer relu1a I0911 15:20:11.754322 24506 net.cpp:457] relu1a <- bn1a I0911 15:20:11.754338 24506 net.cpp:414] relu1a -> relu1a I0911 15:20:11.754369 24506 net.cpp:153] Setting up relu1a I0911 15:20:11.754390 24506 net.cpp:160] Top shape: 1 16 2062 2062 (68029504) I0911 15:20:11.754400 24506 net.cpp:168] Memory required for data: 816354048 I0911 15:20:11.754411 24506 layer_factory.hpp:76] Creating layer conv1b I0911 15:20:11.754441 24506 net.cpp:109] Creating Layer conv1b I0911 15:20:11.754459 24506 net.cpp:457] conv1b <- relu1a I0911 15:20:11.754477 24506 net.cpp:414] conv1b -> conv1b I0911 15:20:11.757207 24506 net.cpp:153] Setting up conv1b I0911 15:20:11.757230 24506 net.cpp:160] Top shape: 1 16 1031 1031 (17007376) I0911 15:20:11.757242 24506 net.cpp:168] Memory required for data: 884383552 I0911 15:20:11.757261 24506 layer_factory.hpp:76] Creating layer bn1b I0911 15:20:11.757282 24506 net.cpp:109] Creating Layer bn1b I0911 15:20:11.757294 24506 net.cpp:457] bn1b <- conv1b I0911 15:20:11.757308 24506 net.cpp:414] bn1b -> bn1b I0911 15:20:11.759799 24506 net.cpp:153] Setting up bn1b I0911 15:20:11.759825 24506 net.cpp:160] Top shape: 1 16 1031 1031 (17007376) I0911 15:20:11.759840 24506 net.cpp:168] Memory required for data: 952413056 I0911 15:20:11.759860 24506 layer_factory.hpp:76] Creating layer relu1b I0911 15:20:11.759877 24506 net.cpp:109] Creating Layer relu1b I0911 15:20:11.759888 24506 net.cpp:457] relu1b <- bn1b I0911 15:20:11.759908 24506 net.cpp:414] relu1b -> relu1b I0911 15:20:11.759925 24506 net.cpp:153] Setting up relu1b I0911 15:20:11.759943 24506 net.cpp:160] Top shape: 1 16 1031 1031 (17007376) I0911 15:20:11.759954 24506 net.cpp:168] Memory required for data: 1020442560 I0911 15:20:11.759963 24506 layer_factory.hpp:76] 
Creating layer conv2a I0911 15:20:11.759982 24506 net.cpp:109] Creating Layer conv2a I0911 15:20:11.759994 24506 net.cpp:457] conv2a <- relu1b I0911 15:20:11.760011 24506 net.cpp:414] conv2a -> conv2a I0911 15:20:11.762781 24506 net.cpp:153] Setting up conv2a I0911 15:20:11.762810 24506 net.cpp:160] Top shape: 1 32 1029 1029 (33882912) I0911 15:20:11.762821 24506 net.cpp:168] Memory required for data: 1155974208 I0911 15:20:11.762836 24506 layer_factory.hpp:76] Creating layer bn2a I0911 15:20:11.762856 24506 net.cpp:109] Creating Layer bn2a I0911 15:20:11.762867 24506 net.cpp:457] bn2a <- conv2a I0911 15:20:11.762882 24506 net.cpp:414] bn2a -> bn2a I0911 15:20:11.765368 24506 net.cpp:153] Setting up bn2a I0911 15:20:11.765394 24506 net.cpp:160] Top shape: 1 32 1029 1029 (33882912) I0911 15:20:11.765410 24506 net.cpp:168] Memory required for data: 1291505856 I0911 15:20:11.765436 24506 layer_factory.hpp:76] Creating layer relu2a I0911 15:20:11.765456 24506 net.cpp:109] Creating Layer relu2a I0911 15:20:11.765467 24506 net.cpp:457] relu2a <- bn2a I0911 15:20:11.765480 24506 net.cpp:414] relu2a -> relu2a I0911 15:20:11.765499 24506 net.cpp:153] Setting up relu2a I0911 15:20:11.765511 24506 net.cpp:160] Top shape: 1 32 1029 1029 (33882912) I0911 15:20:11.765522 24506 net.cpp:168] Memory required for data: 1427037504 I0911 15:20:11.765532 24506 layer_factory.hpp:76] Creating layer conv2b I0911 15:20:11.765552 24506 net.cpp:109] Creating Layer conv2b I0911 15:20:11.765563 24506 net.cpp:457] conv2b <- relu2a I0911 15:20:11.765575 24506 net.cpp:414] conv2b -> conv2b I0911 15:20:11.766615 24506 net.cpp:153] Setting up conv2b I0911 15:20:11.766638 24506 net.cpp:160] Top shape: 1 32 514 514 (8454272) I0911 15:20:11.766649 24506 net.cpp:168] Memory required for data: 1460854592 I0911 15:20:11.766664 24506 layer_factory.hpp:76] Creating layer bn2b I0911 15:20:11.766682 24506 net.cpp:109] Creating Layer bn2b I0911 15:20:11.766697 24506 net.cpp:457] bn2b <- conv2b I0911 15:20:11.766708 24506 net.cpp:414] bn2b -> bn2b I0911 15:20:11.767351 24506 net.cpp:153] Setting up bn2b I0911 15:20:11.767371 24506 net.cpp:160] Top shape: 1 32 514 514 (8454272) I0911 15:20:11.767384 24506 net.cpp:168] Memory required for data: 1494671680 I0911 15:20:11.767403 24506 layer_factory.hpp:76] Creating layer relu2b I0911 15:20:11.767418 24506 net.cpp:109] Creating Layer relu2b I0911 15:20:11.767433 24506 net.cpp:457] relu2b <- bn2b I0911 15:20:11.767444 24506 net.cpp:414] relu2b -> relu2b I0911 15:20:11.767459 24506 net.cpp:153] Setting up relu2b I0911 15:20:11.767472 24506 net.cpp:160] Top shape: 1 32 514 514 (8454272) I0911 15:20:11.767482 24506 net.cpp:168] Memory required for data: 1528488768 I0911 15:20:11.767491 24506 layer_factory.hpp:76] Creating layer conv3a I0911 15:20:11.767511 24506 net.cpp:109] Creating Layer conv3a I0911 15:20:11.767523 24506 net.cpp:457] conv3a <- relu2b I0911 15:20:11.767536 24506 net.cpp:414] conv3a -> conv3a I0911 15:20:11.768887 24506 net.cpp:153] Setting up conv3a I0911 15:20:11.768910 24506 net.cpp:160] Top shape: 1 64 512 512 (16777216) I0911 15:20:11.768921 24506 net.cpp:168] Memory required for data: 1595597632 I0911 15:20:11.768935 24506 layer_factory.hpp:76] Creating layer bn3a I0911 15:20:11.768954 24506 net.cpp:109] Creating Layer bn3a I0911 15:20:11.768970 24506 net.cpp:457] bn3a <- conv3a I0911 15:20:11.768981 24506 net.cpp:414] bn3a -> bn3a I0911 15:20:11.769618 24506 net.cpp:153] Setting up bn3a I0911 15:20:11.769637 24506 net.cpp:160] Top shape: 1 64 512 512 (16777216) I0911 
15:20:11.769647 24506 net.cpp:168] Memory required for data: 1662706496 I0911 15:20:11.769675 24506 layer_factory.hpp:76] Creating layer relu3a I0911 15:20:11.769690 24506 net.cpp:109] Creating Layer relu3a I0911 15:20:11.769701 24506 net.cpp:457] relu3a <- bn3a I0911 15:20:11.769718 24506 net.cpp:414] relu3a -> relu3a I0911 15:20:11.769737 24506 net.cpp:153] Setting up relu3a I0911 15:20:11.769749 24506 net.cpp:160] Top shape: 1 64 512 512 (16777216) I0911 15:20:11.769758 24506 net.cpp:168] Memory required for data: 1729815360 I0911 15:20:11.769769 24506 layer_factory.hpp:76] Creating layer conv3b I0911 15:20:11.769786 24506 net.cpp:109] Creating Layer conv3b I0911 15:20:11.769800 24506 net.cpp:457] conv3b <- relu3a I0911 15:20:11.769817 24506 net.cpp:414] conv3b -> conv3b I0911 15:20:11.772348 24506 net.cpp:153] Setting up conv3b I0911 15:20:11.772372 24506 net.cpp:160] Top shape: 1 64 255 255 (4161600) I0911 15:20:11.772383 24506 net.cpp:168] Memory required for data: 1746461760 I0911 15:20:11.772397 24506 layer_factory.hpp:76] Creating layer bn3b I0911 15:20:11.772419 24506 net.cpp:109] Creating Layer bn3b I0911 15:20:11.772430 24506 net.cpp:457] bn3b <- conv3b I0911 15:20:11.772444 24506 net.cpp:414] bn3b -> bn3b I0911 15:20:11.772630 24506 net.cpp:153] Setting up bn3b I0911 15:20:11.772650 24506 net.cpp:160] Top shape: 1 64 255 255 (4161600) I0911 15:20:11.772662 24506 net.cpp:168] Memory required for data: 1763108160 I0911 15:20:11.772680 24506 layer_factory.hpp:76] Creating layer relu3b I0911 15:20:11.772693 24506 net.cpp:109] Creating Layer relu3b I0911 15:20:11.772703 24506 net.cpp:457] relu3b <- bn3b I0911 15:20:11.772718 24506 net.cpp:414] relu3b -> relu3b I0911 15:20:11.772734 24506 net.cpp:153] Setting up relu3b I0911 15:20:11.772747 24506 net.cpp:160] Top shape: 1 64 255 255 (4161600) I0911 15:20:11.772758 24506 net.cpp:168] Memory required for data: 1779754560 I0911 15:20:11.772768 24506 layer_factory.hpp:76] Creating layer fc8-conv I0911 15:20:11.772785 24506 net.cpp:109] Creating Layer fc8-conv I0911 15:20:11.772796 24506 net.cpp:457] fc8-conv <- relu3b I0911 15:20:11.772810 24506 net.cpp:414] fc8-conv -> fc8-conv I0911 15:20:11.773162 24506 net.cpp:153] Setting up fc8-conv I0911 15:20:11.773183 24506 net.cpp:160] Top shape: 1 2 251 251 (126002) I0911 15:20:11.773195 24506 net.cpp:168] Memory required for data: 1780258568 I0911 15:20:11.773208 24506 layer_factory.hpp:76] Creating layer fc8-conv_fc8-conv_0_split I0911 15:20:11.773229 24506 net.cpp:109] Creating Layer fc8-conv_fc8-conv_0_split I0911 15:20:11.773239 24506 net.cpp:457] fc8-conv_fc8-conv_0_split <- fc8-conv I0911 15:20:11.773257 24506 net.cpp:414] fc8-conv_fc8-conv_0_split -> fc8-conv_fc8-conv_0_split_0 I0911 15:20:11.773269 24506 net.cpp:414] fc8-conv_fc8-conv_0_split -> fc8-conv_fc8-conv_0_split_1 I0911 15:20:11.773288 24506 net.cpp:153] Setting up fc8-conv_fc8-conv_0_split I0911 15:20:11.773301 24506 net.cpp:160] Top shape: 1 2 251 251 (126002) I0911 15:20:11.773315 24506 net.cpp:160] Top shape: 1 2 251 251 (126002) I0911 15:20:11.773324 24506 net.cpp:168] Memory required for data: 1781266584 I0911 15:20:11.773334 24506 layer_factory.hpp:76] Creating layer softmax I0911 15:20:11.773347 24506 net.cpp:109] Creating Layer softmax I0911 15:20:11.773357 24506 net.cpp:457] softmax <- fc8-conv_fc8-conv_0_split_0 I0911 15:20:11.773370 24506 net.cpp:414] softmax -> softmax I0911 15:20:11.773391 24506 net.cpp:153] Setting up softmax I0911 15:20:11.773412 24506 net.cpp:160] Top shape: 1 2 251 251 (126002) I0911 
15:20:11.773422 24506 net.cpp:168] Memory required for data: 1781770592 I0911 15:20:11.773433 24506 layer_factory.hpp:76] Creating layer prob I0911 15:20:11.773444 24506 net.cpp:109] Creating Layer prob I0911 15:20:11.773459 24506 net.cpp:457] prob <- fc8-conv_fc8-conv_0_split_1 I0911 15:20:11.773474 24506 net.cpp:414] prob -> prob I0911 15:20:11.773491 24506 net.cpp:153] Setting up prob I0911 15:20:11.773504 24506 net.cpp:160] Top shape: 1 2 251 251 (126002) I0911 15:20:11.773514 24506 net.cpp:168] Memory required for data: 1782274600 I0911 15:20:11.773524 24506 net.cpp:231] prob does not need backward computation. I0911 15:20:11.773533 24506 net.cpp:231] softmax does not need backward computation. I0911 15:20:11.773542 24506 net.cpp:231] fc8-conv_fc8-conv_0_split does not need backward computation. I0911 15:20:11.773557 24506 net.cpp:231] fc8-conv does not need backward computation. I0911 15:20:11.773566 24506 net.cpp:231] relu3b does not need backward computation. I0911 15:20:11.773576 24506 net.cpp:231] bn3b does not need backward computation. I0911 15:20:11.773586 24506 net.cpp:231] conv3b does not need backward computation. I0911 15:20:11.773597 24506 net.cpp:231] relu3a does not need backward computation. I0911 15:20:11.773607 24506 net.cpp:231] bn3a does not need backward computation. I0911 15:20:11.773617 24506 net.cpp:231] conv3a does not need backward computation. I0911 15:20:11.773628 24506 net.cpp:231] relu2b does not need backward computation. I0911 15:20:11.773638 24506 net.cpp:231] bn2b does not need backward computation. I0911 15:20:11.773648 24506 net.cpp:231] conv2b does not need backward computation. I0911 15:20:11.773663 24506 net.cpp:231] relu2a does not need backward computation. I0911 15:20:11.773675 24506 net.cpp:231] bn2a does not need backward computation. I0911 15:20:11.773686 24506 net.cpp:231] conv2a does not need backward computation. I0911 15:20:11.773697 24506 net.cpp:231] relu1b does not need backward computation. I0911 15:20:11.773707 24506 net.cpp:231] bn1b does not need backward computation. I0911 15:20:11.773716 24506 net.cpp:231] conv1b does not need backward computation. I0911 15:20:11.773727 24506 net.cpp:231] relu1a does not need backward computation. I0911 15:20:11.773737 24506 net.cpp:231] bn1a does not need backward computation. I0911 15:20:11.773751 24506 net.cpp:231] conv1a does not need backward computation. I0911 15:20:11.773762 24506 net.cpp:273] This network produces output prob I0911 15:20:11.773772 24506 net.cpp:273] This network produces output softmax I0911 15:20:11.773802 24506 net.cpp:286] Network initialization done. net.copy_from('full_convolutional_net.caffemodel')

But when I copy the trained model to the GPU cluster, something strange happens: loading the weights fails with an error about a size mismatch between the net definition in the prototxt and the trained weights:

import caffe net = caffe.Net('deploy_full.prototxt',caffe.TEST) WARNING: Logging before InitGoogleLogging() is written to STDERR I0911 16:01:49.568043 11431 upgrade_proto.cpp:66] Attempting to upgrade input file specified using deprecated input fields: deploy_full.prototxt I0911 16:01:49.568075 11431 upgrade_proto.cpp:69] Successfully upgraded file specified using deprecated input fields. W0911 16:01:49.568079 11431 upgrade_proto.cpp:71] Note that future Caffe releases will only support input layers and not input fields. I0911 16:01:49.934638 11431 net.cpp:52] Initializing net from parameters: state { phase: TEST } layer { name: "input" type: "Input" top: "data" input_param { shape { dim: 1 dim: 3 dim: 2064 dim: 2064 } } } layer { name: "conv1a" type: "Convolution" bottom: "data" top: "conv1a" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 16 kernel_size: 3 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "bn1a" type: "BatchNorm" bottom: "conv1a" top: "bn1a" batch_norm_param { moving_average_fraction: 0.95 scale_filler { type: "constant" value: 1 } bias_filler { type: "constant" value: 0.001 } engine: CUDNN } } layer { name: "relu1a" type: "ReLU" bottom: "bn1a" top: "relu1a" } layer { name: "conv1b" type: "Convolution" bottom: "relu1a" top: "conv1b" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 16 kernel_size: 2 stride: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "bn1b" type: "BatchNorm" bottom: "conv1b" top: "bn1b" batch_norm_param { moving_average_fraction: 0.95 scale_filler { type: "constant" value: 1 } bias_filler { type: "constant" value: 0.001 } engine: CUDNN } } layer { name: "relu1b" type: "ReLU" bottom: "bn1b" top: "relu1b" } layer { name: "conv2a" type: "Convolution" bottom: "relu1b" top: "conv2a" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 32 kernel_size: 3 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "bn2a" type: "BatchNorm" bottom: "conv2a" top: "bn2a" batch_norm_param { moving_average_fraction: 0.95 scale_filler { type: "constant" value: 1 } bias_filler { type: "constant" value: 0.001 } engine: CUDNN } } layer { name: "relu2a" type: "ReLU" bottom: "bn2a" top: "relu2a" } layer { name: "conv2b" type: "Convolution" bottom: "relu2a" top: "conv2b" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 32 kernel_size: 3 stride: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "bn2b" type: "BatchNorm" bottom: "conv2b" top: "bn2b" batch_norm_param { moving_average_fraction: 0.95 scale_filler { type: "constant" value: 1 } bias_filler { type: "constant" value: 0.001 } engine: CUDNN } } layer { name: "relu2b" type: "ReLU" bottom: "bn2b" top: "relu2b" } layer { name: "conv3a" type: "Convolution" bottom: "relu2b" top: "conv3a" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 64 kernel_size: 3 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "bn3a" type: "BatchNorm" bottom: "conv3a" top: "bn3a" batch_norm_param { moving_average_fraction: 0.95 scale_filler { type: "constant" value: 1 } bias_filler { type: 
"constant" value: 0.001 } engine: CUDNN } } layer { name: "relu3a" type: "ReLU" bottom: "bn3a" top: "relu3a" } layer { name: "conv3b" type: "Convolution" bottom: "relu3a" top: "conv3b" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 64 kernel_size: 4 stride: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "bn3b" type: "BatchNorm" bottom: "conv3b" top: "bn3b" batch_norm_param { moving_average_fraction: 0.95 scale_filler { type: "constant" value: 1 } bias_filler { type: "constant" value: 0.001 } engine: CUDNN } } layer { name: "relu3b" type: "ReLU" bottom: "bn3b" top: "relu3b" } layer { name: "fc8-conv" type: "Convolution" bottom: "relu3b" top: "fc8-conv" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 2 kernel_size: 5 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "softmax" type: "Softmax" bottom: "fc8-conv" top: "softmax" } layer { name: "prob" type: "Softmax" bottom: "fc8-conv" top: "prob" } I0911 16:01:49.934902 11431 layer_factory.hpp:77] Creating layer input I0911 16:01:49.934918 11431 net.cpp:94] Creating Layer input I0911 16:01:49.934926 11431 net.cpp:409] input -> data I0911 16:01:49.934954 11431 net.cpp:144] Setting up input I0911 16:01:49.934964 11431 net.cpp:151] Top shape: 1 3 2064 2064 (12780288) I0911 16:01:49.934967 11431 net.cpp:159] Memory required for data: 51121152 I0911 16:01:49.934973 11431 layer_factory.hpp:77] Creating layer conv1a I0911 16:01:49.934984 11431 net.cpp:94] Creating Layer conv1a I0911 16:01:49.934990 11431 net.cpp:435] conv1a <- data I0911 16:01:49.934996 11431 net.cpp:409] conv1a -> conv1a I0911 16:01:49.944478 11431 net.cpp:144] Setting up conv1a I0911 16:01:49.944495 11431 net.cpp:151] Top shape: 1 16 2062 2062 (68029504) I0911 16:01:49.944499 11431 net.cpp:159] Memory required for data: 323239168 I0911 16:01:49.944512 11431 layer_factory.hpp:77] Creating layer bn1a I0911 16:01:49.944526 11431 net.cpp:94] Creating Layer bn1a I0911 16:01:49.944530 11431 net.cpp:435] bn1a <- conv1a I0911 16:01:49.944535 11431 net.cpp:409] bn1a -> bn1a I0911 16:01:49.952821 11431 net.cpp:144] Setting up bn1a I0911 16:01:49.952831 11431 net.cpp:151] Top shape: 1 16 2062 2062 (68029504) I0911 16:01:49.952834 11431 net.cpp:159] Memory required for data: 595357184 I0911 16:01:49.952848 11431 layer_factory.hpp:77] Creating layer relu1a I0911 16:01:49.952862 11431 net.cpp:94] Creating Layer relu1a I0911 16:01:49.952867 11431 net.cpp:435] relu1a <- bn1a I0911 16:01:49.952872 11431 net.cpp:409] relu1a -> relu1a I0911 16:01:49.952879 11431 net.cpp:144] Setting up relu1a I0911 16:01:49.952884 11431 net.cpp:151] Top shape: 1 16 2062 2062 (68029504) I0911 16:01:49.952888 11431 net.cpp:159] Memory required for data: 867475200 I0911 16:01:49.952890 11431 layer_factory.hpp:77] Creating layer conv1b I0911 16:01:49.952903 11431 net.cpp:94] Creating Layer conv1b I0911 16:01:49.952906 11431 net.cpp:435] conv1b <- relu1a I0911 16:01:49.952913 11431 net.cpp:409] conv1b -> conv1b I0911 16:01:49.955472 11431 net.cpp:144] Setting up conv1b I0911 16:01:49.955482 11431 net.cpp:151] Top shape: 1 16 1031 1031 (17007376) I0911 16:01:49.955485 11431 net.cpp:159] Memory required for data: 935504704 I0911 16:01:49.955493 11431 layer_factory.hpp:77] Creating layer bn1b I0911 16:01:49.955502 11431 net.cpp:94] Creating Layer bn1b I0911 16:01:49.955505 11431 net.cpp:435] bn1b <- conv1b 
I0911 16:01:49.955512 11431 net.cpp:409] bn1b -> bn1b I0911 16:01:49.957986 11431 net.cpp:144] Setting up bn1b I0911 16:01:49.957993 11431 net.cpp:151] Top shape: 1 16 1031 1031 (17007376) I0911 16:01:49.957996 11431 net.cpp:159] Memory required for data: 1003534208 I0911 16:01:49.958005 11431 layer_factory.hpp:77] Creating layer relu1b I0911 16:01:49.958012 11431 net.cpp:94] Creating Layer relu1b I0911 16:01:49.958015 11431 net.cpp:435] relu1b <- bn1b I0911 16:01:49.958020 11431 net.cpp:409] relu1b -> relu1b I0911 16:01:49.958027 11431 net.cpp:144] Setting up relu1b I0911 16:01:49.958031 11431 net.cpp:151] Top shape: 1 16 1031 1031 (17007376) I0911 16:01:49.958034 11431 net.cpp:159] Memory required for data: 1071563712 I0911 16:01:49.958036 11431 layer_factory.hpp:77] Creating layer conv2a I0911 16:01:49.958046 11431 net.cpp:94] Creating Layer conv2a I0911 16:01:49.958050 11431 net.cpp:435] conv2a <- relu1b I0911 16:01:49.958055 11431 net.cpp:409] conv2a -> conv2a I0911 16:01:49.960554 11431 net.cpp:144] Setting up conv2a I0911 16:01:49.960562 11431 net.cpp:151] Top shape: 1 32 1029 1029 (33882912) I0911 16:01:49.960566 11431 net.cpp:159] Memory required for data: 1207095360 I0911 16:01:49.960572 11431 layer_factory.hpp:77] Creating layer bn2a I0911 16:01:49.960582 11431 net.cpp:94] Creating Layer bn2a I0911 16:01:49.960584 11431 net.cpp:435] bn2a <- conv2a I0911 16:01:49.960590 11431 net.cpp:409] bn2a -> bn2a I0911 16:01:49.963047 11431 net.cpp:144] Setting up bn2a I0911 16:01:49.963054 11431 net.cpp:151] Top shape: 1 32 1029 1029 (33882912) I0911 16:01:49.963057 11431 net.cpp:159] Memory required for data: 1342627008 I0911 16:01:49.963068 11431 layer_factory.hpp:77] Creating layer relu2a I0911 16:01:49.963075 11431 net.cpp:94] Creating Layer relu2a I0911 16:01:49.963078 11431 net.cpp:435] relu2a <- bn2a I0911 16:01:49.963083 11431 net.cpp:409] relu2a -> relu2a I0911 16:01:49.963090 11431 net.cpp:144] Setting up relu2a I0911 16:01:49.963094 11431 net.cpp:151] Top shape: 1 32 1029 1029 (33882912) I0911 16:01:49.963098 11431 net.cpp:159] Memory required for data: 1478158656 I0911 16:01:49.963099 11431 layer_factory.hpp:77] Creating layer conv2b I0911 16:01:49.963109 11431 net.cpp:94] Creating Layer conv2b I0911 16:01:49.963114 11431 net.cpp:435] conv2b <- relu2a I0911 16:01:49.963119 11431 net.cpp:409] conv2b -> conv2b I0911 16:01:49.963800 11431 net.cpp:144] Setting up conv2b I0911 16:01:49.963809 11431 net.cpp:151] Top shape: 1 32 514 514 (8454272) I0911 16:01:49.963811 11431 net.cpp:159] Memory required for data: 1511975744 I0911 16:01:49.963817 11431 layer_factory.hpp:77] Creating layer bn2b I0911 16:01:49.963825 11431 net.cpp:94] Creating Layer bn2b I0911 16:01:49.963829 11431 net.cpp:435] bn2b <- conv2b I0911 16:01:49.963835 11431 net.cpp:409] bn2b -> bn2b I0911 16:01:49.964421 11431 net.cpp:144] Setting up bn2b I0911 16:01:49.964427 11431 net.cpp:151] Top shape: 1 32 514 514 (8454272) I0911 16:01:49.964431 11431 net.cpp:159] Memory required for data: 1545792832 I0911 16:01:49.964439 11431 layer_factory.hpp:77] Creating layer relu2b I0911 16:01:49.964445 11431 net.cpp:94] Creating Layer relu2b I0911 16:01:49.964449 11431 net.cpp:435] relu2b <- bn2b I0911 16:01:49.964453 11431 net.cpp:409] relu2b -> relu2b I0911 16:01:49.964460 11431 net.cpp:144] Setting up relu2b I0911 16:01:49.964464 11431 net.cpp:151] Top shape: 1 32 514 514 (8454272) I0911 16:01:49.964467 11431 net.cpp:159] Memory required for data: 1579609920 I0911 16:01:49.964469 11431 layer_factory.hpp:77] Creating layer conv3a 
I0911 16:01:49.964479 11431 net.cpp:94] Creating Layer conv3a I0911 16:01:49.964483 11431 net.cpp:435] conv3a <- relu2b I0911 16:01:49.964488 11431 net.cpp:409] conv3a -> conv3a I0911 16:01:49.965278 11431 net.cpp:144] Setting up conv3a I0911 16:01:49.965286 11431 net.cpp:151] Top shape: 1 64 512 512 (16777216) I0911 16:01:49.965289 11431 net.cpp:159] Memory required for data: 1646718784 I0911 16:01:49.965296 11431 layer_factory.hpp:77] Creating layer bn3a I0911 16:01:49.965303 11431 net.cpp:94] Creating Layer bn3a I0911 16:01:49.965306 11431 net.cpp:435] bn3a <- conv3a I0911 16:01:49.965312 11431 net.cpp:409] bn3a -> bn3a I0911 16:01:49.965862 11431 net.cpp:144] Setting up bn3a I0911 16:01:49.965868 11431 net.cpp:151] Top shape: 1 64 512 512 (16777216) I0911 16:01:49.965872 11431 net.cpp:159] Memory required for data: 1713827648 I0911 16:01:49.965884 11431 layer_factory.hpp:77] Creating layer relu3a I0911 16:01:49.965891 11431 net.cpp:94] Creating Layer relu3a I0911 16:01:49.965894 11431 net.cpp:435] relu3a <- bn3a I0911 16:01:49.965899 11431 net.cpp:409] relu3a -> relu3a I0911 16:01:49.965905 11431 net.cpp:144] Setting up relu3a I0911 16:01:49.965909 11431 net.cpp:151] Top shape: 1 64 512 512 (16777216) I0911 16:01:49.965912 11431 net.cpp:159] Memory required for data: 1780936512 I0911 16:01:49.965915 11431 layer_factory.hpp:77] Creating layer conv3b I0911 16:01:49.965925 11431 net.cpp:94] Creating Layer conv3b I0911 16:01:49.965929 11431 net.cpp:435] conv3b <- relu3a I0911 16:01:49.965934 11431 net.cpp:409] conv3b -> conv3b I0911 16:01:49.966846 11431 net.cpp:144] Setting up conv3b I0911 16:01:49.966856 11431 net.cpp:151] Top shape: 1 64 255 255 (4161600) I0911 16:01:49.966858 11431 net.cpp:159] Memory required for data: 1797582912 I0911 16:01:49.966864 11431 layer_factory.hpp:77] Creating layer bn3b I0911 16:01:49.966872 11431 net.cpp:94] Creating Layer bn3b I0911 16:01:49.966876 11431 net.cpp:435] bn3b <- conv3b I0911 16:01:49.966882 11431 net.cpp:409] bn3b -> bn3b I0911 16:01:49.967041 11431 net.cpp:144] Setting up bn3b I0911 16:01:49.967046 11431 net.cpp:151] Top shape: 1 64 255 255 (4161600) I0911 16:01:49.967049 11431 net.cpp:159] Memory required for data: 1814229312 I0911 16:01:49.967057 11431 layer_factory.hpp:77] Creating layer relu3b I0911 16:01:49.967063 11431 net.cpp:94] Creating Layer relu3b I0911 16:01:49.967067 11431 net.cpp:435] relu3b <- bn3b I0911 16:01:49.967070 11431 net.cpp:409] relu3b -> relu3b I0911 16:01:49.967077 11431 net.cpp:144] Setting up relu3b I0911 16:01:49.967080 11431 net.cpp:151] Top shape: 1 64 255 255 (4161600) I0911 16:01:49.967082 11431 net.cpp:159] Memory required for data: 1830875712 I0911 16:01:49.967085 11431 layer_factory.hpp:77] Creating layer fc8-conv I0911 16:01:49.967094 11431 net.cpp:94] Creating Layer fc8-conv I0911 16:01:49.967097 11431 net.cpp:435] fc8-conv <- relu3b I0911 16:01:49.967103 11431 net.cpp:409] fc8-conv -> fc8-conv I0911 16:01:49.967321 11431 net.cpp:144] Setting up fc8-conv I0911 16:01:49.967340 11431 net.cpp:151] Top shape: 1 2 251 251 (126002) I0911 16:01:49.967344 11431 net.cpp:159] Memory required for data: 1831379720 I0911 16:01:49.967350 11431 layer_factory.hpp:77] Creating layer fc8-conv_fc8-conv_0_split I0911 16:01:49.967367 11431 net.cpp:94] Creating Layer fc8-conv_fc8-conv_0_split I0911 16:01:49.967370 11431 net.cpp:435] fc8-conv_fc8-conv_0_split <- fc8-conv I0911 16:01:49.967375 11431 net.cpp:409] fc8-conv_fc8-conv_0_split -> fc8-conv_fc8-conv_0_split_0 I0911 16:01:49.967381 11431 net.cpp:409] 
fc8-conv_fc8-conv_0_split -> fc8-conv_fc8-conv_0_split_1 I0911 16:01:49.967391 11431 net.cpp:144] Setting up fc8-conv_fc8-conv_0_split I0911 16:01:49.967396 11431 net.cpp:151] Top shape: 1 2 251 251 (126002) I0911 16:01:49.967401 11431 net.cpp:151] Top shape: 1 2 251 251 (126002) I0911 16:01:49.967402 11431 net.cpp:159] Memory required for data: 1832387736 I0911 16:01:49.967406 11431 layer_factory.hpp:77] Creating layer softmax I0911 16:01:49.967411 11431 net.cpp:94] Creating Layer softmax I0911 16:01:49.967414 11431 net.cpp:435] softmax <- fc8-conv_fc8-conv_0_split_0 I0911 16:01:49.967419 11431 net.cpp:409] softmax -> softmax I0911 16:01:49.967430 11431 net.cpp:144] Setting up softmax I0911 16:01:49.967435 11431 net.cpp:151] Top shape: 1 2 251 251 (126002) I0911 16:01:49.967437 11431 net.cpp:159] Memory required for data: 1832891744 I0911 16:01:49.967442 11431 layer_factory.hpp:77] Creating layer prob I0911 16:01:49.967447 11431 net.cpp:94] Creating Layer prob I0911 16:01:49.967449 11431 net.cpp:435] prob <- fc8-conv_fc8-conv_0_split_1 I0911 16:01:49.967453 11431 net.cpp:409] prob -> prob I0911 16:01:49.967463 11431 net.cpp:144] Setting up prob I0911 16:01:49.967466 11431 net.cpp:151] Top shape: 1 2 251 251 (126002) I0911 16:01:49.967469 11431 net.cpp:159] Memory required for data: 1833395752 I0911 16:01:49.967473 11431 net.cpp:222] prob does not need backward computation. I0911 16:01:49.967475 11431 net.cpp:222] softmax does not need backward computation. I0911 16:01:49.967478 11431 net.cpp:222] fc8-conv_fc8-conv_0_split does not need backward computation. I0911 16:01:49.967483 11431 net.cpp:222] fc8-conv does not need backward computation. I0911 16:01:49.967485 11431 net.cpp:222] relu3b does not need backward computation. I0911 16:01:49.967489 11431 net.cpp:222] bn3b does not need backward computation. I0911 16:01:49.967491 11431 net.cpp:222] conv3b does not need backward computation. I0911 16:01:49.967494 11431 net.cpp:222] relu3a does not need backward computation. I0911 16:01:49.967497 11431 net.cpp:222] bn3a does not need backward computation. I0911 16:01:49.967499 11431 net.cpp:222] conv3a does not need backward computation. I0911 16:01:49.967502 11431 net.cpp:222] relu2b does not need backward computation. I0911 16:01:49.967505 11431 net.cpp:222] bn2b does not need backward computation. I0911 16:01:49.967509 11431 net.cpp:222] conv2b does not need backward computation. I0911 16:01:49.967511 11431 net.cpp:222] relu2a does not need backward computation. I0911 16:01:49.967515 11431 net.cpp:222] bn2a does not need backward computation. I0911 16:01:49.967519 11431 net.cpp:222] conv2a does not need backward computation. I0911 16:01:49.967521 11431 net.cpp:222] relu1b does not need backward computation. I0911 16:01:49.967525 11431 net.cpp:222] bn1b does not need backward computation. I0911 16:01:49.967527 11431 net.cpp:222] conv1b does not need backward computation. I0911 16:01:49.967530 11431 net.cpp:222] relu1a does not need backward computation. I0911 16:01:49.967535 11431 net.cpp:222] bn1a does not need backward computation. I0911 16:01:49.967537 11431 net.cpp:222] conv1a does not need backward computation. I0911 16:01:49.967540 11431 net.cpp:222] input does not need backward computation. I0911 16:01:49.967543 11431 net.cpp:264] This network produces output prob I0911 16:01:49.967550 11431 net.cpp:264] This network produces output softmax I0911 16:01:49.967572 11431 net.cpp:284] Network initialization done. 
net.copy_from('full_convolutional_net.caffemodel')
I0911 16:01:52.154938 11431 net.cpp:791] Ignoring source layer data
F0911 16:01:52.154958 11431 net.cpp:804] Cannot copy param 2 weights from layer 'bn1a'; shape mismatch. Source param shape is 1 1 1 1 (1); target param shape is 1 16 1 1 (16). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
*** Check failure stack trace: ***
Aborted (core dumped)

This problem should perhaps be reported to NVIDIA's caffe branch, but that repository is not as active as BVLC's. So maybe someone here can provide some insight.

Is there a method in the Python wrapper that can print details about the trained model, such as the shapes and sizes of the weights? That would give me a way to check where the sizes diverge between my two machines.
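For reference, a minimal sketch of two ways to do this with stock pycaffe (file names taken from the session above). The second route parses the .caffemodel with the protobuf classes bundled with pycaffe, so no net is constructed and the shape check that aborts above cannot fire:

import caffe
from caffe.proto import caffe_pb2

# Route 1: shapes as the running Caffe build lays them out. This needs the
# net to load, so it only works on the machine where copy_from succeeds.
net = caffe.Net('deploy_full.prototxt', caffe.TEST)
net.copy_from('full_convolutional_net.caffemodel')
for name, blobs in net.params.items():
    for i, blob in enumerate(blobs):
        print('%s param %d shape %s' % (name, i, blob.data.shape))

# Route 2: shapes exactly as stored in the snapshot file.
model = caffe_pb2.NetParameter()
with open('full_convolutional_net.caffemodel', 'rb') as f:
    model.ParseFromString(f.read())
for layer in model.layer:  # very old snapshots use model.layers instead
    for i, blob in enumerate(layer.blobs):
        dims = list(blob.shape.dim) or [blob.num, blob.channels, blob.height, blob.width]
        print('%s param %d shape %s' % (layer.name, i, dims))

Running route 2 against the same .caffemodel on both machines should show how many blobs each BatchNorm layer stores and their shapes, which is where the error above points.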

rtgoring commented 7 years ago

Greetings! Did you find a fix for your issue?

xxw345 commented 7 years ago

Yes and no: I never found out why, but I did fix it. The way I fixed it was simply copying all the training data from one machine to the other and redoing the identical training on the machine I want to use for high-throughput output. I don't know why, but it looks like the problem comes from how the NVIDIA branch transforms and stores the batch normalization layer's parameters. (A sketch of a possible alternative to retraining follows.)
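For completeness, a hedged sketch of that alternative (not what was actually done here): build the net with the target machine's Caffe, parse the snapshot with protobuf to sidestep the strict copy_from check, and copy only the blobs whose element counts agree, leaving mismatched blobs (the per-channel BatchNorm blob from the error above) at their filler initialization. Whether the resulting BatchNorm statistics are numerically compatible across the two branches would still need verifying:

import numpy as np
import caffe
from caffe.proto import caffe_pb2

# Build the net with the *target* machine's Caffe, so every blob has the
# layout this build expects.
net = caffe.Net('deploy_full.prototxt', caffe.TEST)

# Parse the snapshot without the shape check that copy_from enforces.
src = caffe_pb2.NetParameter()
with open('full_convolutional_net.caffemodel', 'rb') as f:
    src.ParseFromString(f.read())

for layer in src.layer:
    if layer.name not in net.params:
        continue
    for i, blob in enumerate(layer.blobs):
        if i >= len(net.params[layer.name]):
            break
        target = net.params[layer.name][i].data
        data = np.asarray(blob.data, dtype=np.float32)
        if data.size == target.size:
            target[...] = data.reshape(target.shape)
        else:
            print('skipping %s param %d (%d values vs %d expected)'
                  % (layer.name, i, data.size, target.size))

net.save('full_convolutional_net_ported.caffemodel')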

rtgoring commented 7 years ago

Thanks. I wanted to avoid doing that, but it looks like I don't have the option.

lin1000 commented 6 years ago

I am facing the same problem and trying to find a solution.