BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Loss diverges in a few iterations #1282

Closed: BestSonny closed this issue 9 years ago

BestSonny commented 9 years ago

Can anybody explain this? Am I doing something wrong with my net design, or is it something else? Later in training, the loss just goes to NaN. I think the divergence described in the title is what causes this result.

I1015 10:46:48.905999 11365 caffe.cpp:99] Use GPU with device ID 0
I1015 10:46:51.345105 11365 caffe.cpp:107] Starting Optimization
I1015 10:46:51.345240 11365 solver.cpp:32] Initializing solver from parameters: test_iter: 1000 test_interval: 1000 base_lr: 0.01 display: 200 max_iter: 450000 lr_policy: "step" gamma: 0.1 momentum: 0.9 weight_decay: 0.0005 stepsize: 10000 snapshot: 10000 snapshot_prefix: "icdar_train" solver_mode: GPU net: "train_val.prototxt"
I1015 10:46:51.345363 11365 solver.cpp:67] Creating training net from net file: train_val.prototxt
I1015 10:46:51.345728 11365 net.cpp:275] The NetState phase (0) differed from the phase (1) specified by a rule in layer data
I1015 10:46:51.345762 11365 net.cpp:275] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I1015 10:46:51.345878 11365 net.cpp:39] Initializing net from parameters:
name: "Net"
layers { top: "data" top: "label" name: "data" type: DATA data_param { source: "levelDB/icdar_train_lmdb" batch_size: 256 backend: LMDB } include { phase: TRAIN } }
layers { bottom: "data" top: "conv1" name: "conv1" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 48 kernel_size: 9 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layers { bottom: "conv1" top: "conv2" name: "conv2" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 64 kernel_size: 9 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layers { bottom: "conv2" top: "conv2" name: "drop2" type: DROPOUT dropout_param { dropout_ratio: 0.5 } }
layers { bottom: "conv2" top: "convCaseInsensive" name: "convCaseInsensive" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 128 kernel_size: 8 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layers { bottom: "convCaseInsensive" top: "convCaseInsensive" name: "drop3" type: DROPOUT dropout_param { dropout_ratio: 0.5 } }
layers { bottom: "convCaseInsensive" top: "convCaseInsensiveSecond" name: "convCaseInsensiveSecond" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 36 kernel_size: 1 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layers { bottom: "convCaseInsensiveSecond" bottom: "label" top: "loss" name: "loss" type: SOFTMAX_LOSS }
state { phase: TRAIN }
I1015 10:46:51.346560 11365 net.cpp:67] Creating Layer data
I1015 10:46:51.346585 11365 net.cpp:356] data -> data
I1015 10:46:51.346613 11365 net.cpp:356] data -> label
I1015 10:46:51.346633 11365 net.cpp:96] Setting up data
I1015 10:46:51.346757 11365 data_layer.cpp:68] Opening lmdb levelDB/icdar_train_lmdb
I1015 10:46:51.346807 11365 data_layer.cpp:128] output data size: 256,3,24,24
I1015 10:46:51.347970 11365 net.cpp:103] Top shape: 256 3 24 24 (442368)
I1015 10:46:51.347993 11365 net.cpp:103] Top shape: 256 1 1 1 (256)
I1015 10:46:51.348012 11365 net.cpp:67] Creating Layer conv1
I1015 10:46:51.348026 11365 net.cpp:394] conv1 <- data
I1015 10:46:51.348058 11365 net.cpp:356] conv1 -> conv1
I1015 10:46:51.348088 11365 net.cpp:96] Setting up conv1
I1015 10:46:51.349117 11365 net.cpp:103] Top shape: 256 48 16 16 (3145728)
I1015 10:46:51.349169 11365 net.cpp:67] Creating Layer conv2
I1015 10:46:51.349187 11365 net.cpp:394] conv2 <- conv1
I1015 10:46:51.349207 11365 net.cpp:356] conv2 -> conv2
I1015 10:46:51.349225 11365 net.cpp:96] Setting up conv2
I1015 10:46:51.360640 11365 net.cpp:103] Top shape: 256 64 8 8 (1048576)
I1015 10:46:51.360690 11365 net.cpp:67] Creating Layer drop2
I1015 10:46:51.360738 11365 net.cpp:394] drop2 <- conv2
I1015 10:46:51.360757 11365 net.cpp:345] drop2 -> conv2 (in-place)
I1015 10:46:51.360774 11365 net.cpp:96] Setting up drop2
I1015 10:46:51.360792 11365 net.cpp:103] Top shape: 256 64 8 8 (1048576)
I1015 10:46:51.360810 11365 net.cpp:67] Creating Layer convCaseInsensive
I1015 10:46:51.360823 11365 net.cpp:394] convCaseInsensive <- conv2
I1015 10:46:51.360875 11365 net.cpp:356] convCaseInsensive -> convCaseInsensive
I1015 10:46:51.360894 11365 net.cpp:96] Setting up convCaseInsensive
I1015 10:46:51.385164 11365 net.cpp:103] Top shape: 256 128 1 1 (32768)
I1015 10:46:51.385223 11365 net.cpp:67] Creating Layer drop3
I1015 10:46:51.385239 11365 net.cpp:394] drop3 <- convCaseInsensive
I1015 10:46:51.385260 11365 net.cpp:345] drop3 -> convCaseInsensive (in-place)
I1015 10:46:51.385279 11365 net.cpp:96] Setting up drop3
I1015 10:46:51.385294 11365 net.cpp:103] Top shape: 256 128 1 1 (32768)
I1015 10:46:51.385313 11365 net.cpp:67] Creating Layer convCaseInsensiveSecond
I1015 10:46:51.385325 11365 net.cpp:394] convCaseInsensiveSecond <- convCaseInsensive
I1015 10:46:51.385347 11365 net.cpp:356] convCaseInsensiveSecond -> convCaseInsensiveSecond
I1015 10:46:51.385365 11365 net.cpp:96] Setting up convCaseInsensiveSecond
I1015 10:46:51.385601 11365 net.cpp:103] Top shape: 256 36 1 1 (9216)
I1015 10:46:51.385629 11365 net.cpp:67] Creating Layer loss
I1015 10:46:51.385644 11365 net.cpp:394] loss <- convCaseInsensiveSecond
I1015 10:46:51.385658 11365 net.cpp:394] loss <- label
I1015 10:46:51.385673 11365 net.cpp:356] loss -> loss
I1015 10:46:51.385689 11365 net.cpp:96] Setting up loss
I1015 10:46:51.385710 11365 net.cpp:103] Top shape: 1 1 1 1 (1)
I1015 10:46:51.385725 11365 net.cpp:109] with loss weight 1
I1015 10:46:51.385774 11365 net.cpp:170] loss needs backward computation.
I1015 10:46:51.385792 11365 net.cpp:170] convCaseInsensiveSecond needs backward computation.
I1015 10:46:51.385807 11365 net.cpp:170] drop3 needs backward computation.
I1015 10:46:51.385819 11365 net.cpp:170] convCaseInsensive needs backward computation.
I1015 10:46:51.385838 11365 net.cpp:170] drop2 needs backward computation.
I1015 10:46:51.385851 11365 net.cpp:170] conv2 needs backward computation.
I1015 10:46:51.385866 11365 net.cpp:170] conv1 needs backward computation.
I1015 10:46:51.385884 11365 net.cpp:172] data does not need backward computation.
I1015 10:46:51.385900 11365 net.cpp:208] This network produces output loss
I1015 10:46:51.385922 11365 net.cpp:467] Collecting Learning Rate and Weight Decay.
I1015 10:46:51.385941 11365 net.cpp:219] Network initialization done.
I1015 10:46:51.385956 11365 net.cpp:220] Memory required for data: 23041028
I1015 10:46:51.386291 11365 solver.cpp:151] Creating test net (#0) specified by net file: train_val.prototxt
I1015 10:46:51.386332 11365 net.cpp:275] The NetState phase (1) differed from the phase (0) specified by a rule in layer data
I1015 10:46:51.386464 11365 net.cpp:39] Initializing net from parameters:
name: "Net"
layers { top: "data" top: "label" name: "data" type: DATA data_param { source: "levelDB/icdar_test_lmdb" batch_size: 50 backend: LMDB } include { phase: TEST } }
layers { bottom: "data" top: "conv1" name: "conv1" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 48 kernel_size: 9 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layers { bottom: "conv1" top: "conv2" name: "conv2" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 64 kernel_size: 9 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layers { bottom: "conv2" top: "conv2" name: "drop2" type: DROPOUT dropout_param { dropout_ratio: 0.5 } }
layers { bottom: "conv2" top: "convCaseInsensive" name: "convCaseInsensive" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 128 kernel_size: 8 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layers { bottom: "convCaseInsensive" top: "convCaseInsensive" name: "drop3" type: DROPOUT dropout_param { dropout_ratio: 0.5 } }
layers { bottom: "convCaseInsensive" top: "convCaseInsensiveSecond" name: "convCaseInsensiveSecond" type: CONVOLUTION blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 36 kernel_size: 1 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layers { bottom: "convCaseInsensiveSecond" bottom: "label" top: "accuracy" name: "accuracy" type: ACCURACY include { phase: TEST } }
layers { bottom: "convCaseInsensiveSecond" bottom: "label" top: "loss" name: "loss" type: SOFTMAX_LOSS }
state { phase: TEST }
I1015 10:46:51.387393 11365 net.cpp:67] Creating Layer data
I1015 10:46:51.387423 11365 net.cpp:356] data -> data
I1015 10:46:51.387452 11365 net.cpp:356] data -> label
I1015 10:46:51.387480 11365 net.cpp:96] Setting up data
I1015 10:46:51.387570 11365 data_layer.cpp:68] Opening lmdb levelDB/icdar_test_lmdb
I1015 10:46:51.387614 11365 data_layer.cpp:128] output data size: 50,3,24,24
I1015 10:46:51.387886 11365 net.cpp:103] Top shape: 50 3 24 24 (86400)
I1015 10:46:51.387913 11365 net.cpp:103] Top shape: 50 1 1 1 (50)
I1015 10:46:51.387943 11365 net.cpp:67] Creating Layer label_data_1_split
I1015 10:46:51.387969 11365 net.cpp:394] label_data_1_split <- label
I1015 10:46:51.387991 11365 net.cpp:356] label_data_1_split -> label_data_1_split_0
I1015 10:46:51.388023 11365 net.cpp:356] label_data_1_split -> label_data_1_split_1
I1015 10:46:51.388047 11365 net.cpp:96] Setting up label_data_1_split
I1015 10:46:51.388072 11365 net.cpp:103] Top shape: 50 1 1 1 (50)
I1015 10:46:51.388092 11365 net.cpp:103] Top shape: 50 1 1 1 (50)
I1015 10:46:51.388118 11365 net.cpp:67] Creating Layer conv1
I1015 10:46:51.388139 11365 net.cpp:394] conv1 <- data
I1015 10:46:51.388165 11365 net.cpp:356] conv1 -> conv1
I1015 10:46:51.388192 11365 net.cpp:96] Setting up conv1
I1015 10:46:51.389050 11365 net.cpp:103] Top shape: 50 48 16 16 (614400)
I1015 10:46:51.389094 11365 net.cpp:67] Creating Layer conv2
I1015 10:46:51.389120 11365 net.cpp:394] conv2 <- conv1
I1015 10:46:51.389147 11365 net.cpp:356] conv2 -> conv2
I1015 10:46:51.389174 11365 net.cpp:96] Setting up conv2
I1015 10:46:51.401437 11365 net.cpp:103] Top shape: 50 64 8 8 (204800)
I1015 10:46:51.401482 11365 net.cpp:67] Creating Layer drop2
I1015 10:46:51.401499 11365 net.cpp:394] drop2 <- conv2
I1015 10:46:51.401515 11365 net.cpp:345] drop2 -> conv2 (in-place)
I1015 10:46:51.401531 11365 net.cpp:96] Setting up drop2
I1015 10:46:51.401546 11365 net.cpp:103] Top shape: 50 64 8 8 (204800)
I1015 10:46:51.401568 11365 net.cpp:67] Creating Layer convCaseInsensive
I1015 10:46:51.401582 11365 net.cpp:394] convCaseInsensive <- conv2
I1015 10:46:51.401598 11365 net.cpp:356] convCaseInsensive -> convCaseInsensive
I1015 10:46:51.401615 11365 net.cpp:96] Setting up convCaseInsensive
I1015 10:46:51.425907 11365 net.cpp:103] Top shape: 50 128 1 1 (6400)
I1015 10:46:51.425969 11365 net.cpp:67] Creating Layer drop3
I1015 10:46:51.425987 11365 net.cpp:394] drop3 <- convCaseInsensive
I1015 10:46:51.426007 11365 net.cpp:345] drop3 -> convCaseInsensive (in-place)
I1015 10:46:51.426025 11365 net.cpp:96] Setting up drop3
I1015 10:46:51.426041 11365 net.cpp:103] Top shape: 50 128 1 1 (6400)
I1015 10:46:51.426059 11365 net.cpp:67] Creating Layer convCaseInsensiveSecond
I1015 10:46:51.426090 11365 net.cpp:394] convCaseInsensiveSecond <- convCaseInsensive
I1015 10:46:51.426108 11365 net.cpp:356] convCaseInsensiveSecond -> convCaseInsensiveSecond
I1015 10:46:51.426139 11365 net.cpp:96] Setting up convCaseInsensiveSecond
I1015 10:46:51.426408 11365 net.cpp:103] Top shape: 50 36 1 1 (1800)
I1015 10:46:51.426435 11365 net.cpp:67] Creating Layer convCaseInsensiveSecond_convCaseInsensiveSecond_0_split
I1015 10:46:51.426450 11365 net.cpp:394] convCaseInsensiveSecond_convCaseInsensiveSecond_0_split <- convCaseInsensiveSecond
I1015 10:46:51.426468 11365 net.cpp:356] convCaseInsensiveSecond_convCaseInsensiveSecond_0_split -> convCaseInsensiveSecond_convCaseInsensiveSecond_0_split_0
I1015 10:46:51.426491 11365 net.cpp:356] convCaseInsensiveSecond_convCaseInsensiveSecond_0_split -> convCaseInsensiveSecond_convCaseInsensiveSecond_0_split_1
I1015 10:46:51.426508 11365 net.cpp:96] Setting up convCaseInsensiveSecond_convCaseInsensiveSecond_0_split
I1015 10:46:51.426527 11365 net.cpp:103] Top shape: 50 36 1 1 (1800)
I1015 10:46:51.426540 11365 net.cpp:103] Top shape: 50 36 1 1 (1800)
I1015 10:46:51.426558 11365 net.cpp:67] Creating Layer accuracy
I1015 10:46:51.426573 11365 net.cpp:394] accuracy <- convCaseInsensiveSecond_convCaseInsensiveSecond_0_split_0
I1015 10:46:51.426591 11365 net.cpp:394] accuracy <- label_data_1_split_0
I1015 10:46:51.426607 11365 net.cpp:356] accuracy -> accuracy
I1015 10:46:51.426623 11365 net.cpp:96] Setting up accuracy
I1015 10:46:51.426638 11365 net.cpp:103] Top shape: 1 1 1 1 (1)
I1015 10:46:51.426659 11365 net.cpp:67] Creating Layer loss
I1015 10:46:51.426676 11365 net.cpp:394] loss <- convCaseInsensiveSecond_convCaseInsensiveSecond_0_split_1
I1015 10:46:51.426692 11365 net.cpp:394] loss <- label_data_1_split_1
I1015 10:46:51.426710 11365 net.cpp:356] loss -> loss
I1015 10:46:51.426726 11365 net.cpp:96] Setting up loss
I1015 10:46:51.426743 11365 net.cpp:103] Top shape: 1 1 1 1 (1)
I1015 10:46:51.426756 11365 net.cpp:109] with loss weight 1
I1015 10:46:51.426777 11365 net.cpp:170] loss needs backward computation.
I1015 10:46:51.426790 11365 net.cpp:172] accuracy does not need backward computation.
I1015 10:46:51.426802 11365 net.cpp:170] convCaseInsensiveSecond_convCaseInsensiveSecond_0_split needs backward computation.
I1015 10:46:51.426815 11365 net.cpp:170] convCaseInsensiveSecond needs backward computation.
I1015 10:46:51.426827 11365 net.cpp:170] drop3 needs backward computation.
I1015 10:46:51.426839 11365 net.cpp:170] convCaseInsensive needs backward computation.
I1015 10:46:51.426854 11365 net.cpp:170] drop2 needs backward computation.
I1015 10:46:51.426867 11365 net.cpp:170] conv2 needs backward computation.
I1015 10:46:51.426882 11365 net.cpp:170] conv1 needs backward computation.
I1015 10:46:51.426934 11365 net.cpp:172] label_data_1_split does not need backward computation.
I1015 10:46:51.426946 11365 net.cpp:172] data does not need backward computation.
I1015 10:46:51.426962 11365 net.cpp:208] This network produces output accuracy
I1015 10:46:51.426980 11365 net.cpp:208] This network produces output loss
I1015 10:46:51.427003 11365 net.cpp:467] Collecting Learning Rate and Weight Decay.
I1015 10:46:51.427023 11365 net.cpp:219] Network initialization done.
I1015 10:46:51.427041 11365 net.cpp:220] Memory required for data: 4515008
I1015 10:46:51.427086 11365 solver.cpp:41] Solver scaffolding done.
I1015 10:46:51.427126 11365 solver.cpp:160] Solving Net
I1015 10:46:51.427165 11365 solver.cpp:247] Iteration 0, Testing net (#0)
I1015 10:47:19.943495 11365 solver.cpp:298] Test net output #0: accuracy = 0.0273202
I1015 10:47:19.943835 11365 solver.cpp:298] Test net output #1: loss = 3.9109 (* 1 = 3.9109 loss)
I1015 10:47:20.212116 11365 solver.cpp:191] Iteration 0, loss = 5.27274
I1015 10:47:20.212206 11365 solver.cpp:206] Train net output #0: loss = 5.27274 (* 1 = 5.27274 loss)
I1015 10:47:20.212280 11365 solver.cpp:403] Iteration 0, lr = 0.01
I1015 10:48:19.794435 11365 solver.cpp:191] Iteration 200, loss = nan
I1015 10:48:19.796959 11365 solver.cpp:206] Train net output #0: loss = nan (* 1 = nan loss)
I1015 10:48:19.796989 11365 solver.cpp:403] Iteration 200, lr = 0.01

sguada commented 9 years ago

Just try a smaller base_lr
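For example, you could keep your solver exactly as logged above and only lower base_lr by 10x. This is just a sketch: 0.001 is a starting point, not a tuned value, and you may need to adjust it further either way. With base_lr 0.01 and momentum 0.9, the effective step can easily be large enough to blow up the weights and drive the loss to nan.

    # solver.prototxt -- identical to the logged solver except base_lr
    test_iter: 1000
    test_interval: 1000
    base_lr: 0.001        # was 0.01; untuned suggestion, lower further if it still diverges
    display: 200
    max_iter: 450000
    lr_policy: "step"
    gamma: 0.1
    momentum: 0.9
    weight_decay: 0.0005
    stepsize: 10000
    snapshot: 10000
    snapshot_prefix: "icdar_train"
    solver_mode: GPU
    net: "train_val.prototxt"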

Sergio

BestSonny commented 9 years ago

It works. Thanks a lot.

Also, I wonder whether there is any existing implementation of "Maxout", proposed by Ian J. Goodfellow, which is combined with dropout to improve classification performance.

If not, I will have to put together a solution myself; a rough idea is sketched below.
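One workaround I am considering, assuming I understand the SLICE and ELTWISE layers correctly: give a convolution 2*k outputs, slice its output into two k-channel pieces along the channel axis, and take their elementwise max, which is maxout with two pieces. An untested sketch in the same old-style prototxt syntax as my net (the layer and blob names here are made up):

    # conv_m is assumed to be a CONVOLUTION layer with num_output: 2*k
    layers {
      name: "slice_m"
      type: SLICE
      bottom: "conv_m"
      top: "piece1"
      top: "piece2"
      slice_param { slice_dim: 1 }     # split the 2*k channels into two k-channel blobs
    }
    layers {
      name: "maxout_m"
      type: ELTWISE
      bottom: "piece1"
      bottom: "piece2"
      top: "maxout_m"
      eltwise_param { operation: MAX } # elementwise max of the two pieces
    }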

I would also appreciate any pointers to documentation and reference material about Caffe.

shelhamer commented 9 years ago

Check any of the documentation on the project home page http://caffe.berkeleyvision.org/ and ask questions on the caffe-users mailing list.