daijifeng001 / MNC

Instance-aware Semantic Segmentation via Multi-task Network Cascades

VisibleDeprecationWarning and train failure #48

Open CassieMai opened 7 years ago

CassieMai commented 7 years ago

Hello, I have a problem when training MNC using ./experiments/scripts/mnc_5stage.sh. Can anyone help me? Thanks in advance.

I0320 15:46:29.860514  2121 net.cpp:270] This network produces output seg_cls_loss
I0320 15:46:29.860517  2121 net.cpp:270] This network produces output seg_cls_loss_ext
I0320 15:46:29.862728  2121 net.cpp:283] Network initialization done.
I0320 15:46:29.862998  2121 solver.cpp:60] Solver scaffolding done.
Loading pretrained model weights from data/imagenet_models/VGG16.mask.caffemodel
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 1024780411
I0320 15:46:30.187852  2121 net.cpp:810] Ignoring source layer rpn_conv/3x3
I0320 15:46:30.187872  2121 net.cpp:810] Ignoring source layer rpn_relu/3x3
I0320 15:46:30.187875  2121 net.cpp:810] Ignoring source layer rpn/output_rpn_relu/3x3_0_split
I0320 15:46:30.244598  2121 net.cpp:810] Ignoring source layer drop6
I0320 15:46:30.253931  2121 net.cpp:810] Ignoring source layer drop7
I0320 15:46:30.310539  2121 net.cpp:810] Ignoring source layer drop6_mask
I0320 15:46:30.319871  2121 net.cpp:810] Ignoring source layer drop7_mask
Solving...
/MNC/tools/../lib/pylayer/proposal_target_layer.py:152: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  cur_inds = npr.choice(cur_inds, size=cur_rois_this_image, replace=False)
/MNC/tools/../lib/transform/bbox_transform.py:201: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
/MNC/tools/../lib/transform/bbox_transform.py:202: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
/MNC/tools/../lib/pylayer/proposal_target_layer.py:190: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_box = scaled_gt_boxes[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:193: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_masks[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:194: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask_info = mask_info[gt_assignment[val]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:195: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  gt_mask = gt_mask[0:gt_mask_info[0], 0:gt_mask_info[1]]
/MNC/tools/../lib/pylayer/proposal_target_layer.py:201: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  top_mask_info[i, 0] = gt_assignment[val]
F0320 15:46:43.415727  2121 smooth_L1_loss_layer.cpp:54] Not Implemented Yet
*** Check failure stack trace: ***
./experiments/scripts/mnc_5stage.sh: line 35:  2121 Aborted                 (core dumped) ./tools/train_net.py --gpu ${GPU_ID} --solver models/${NET}/mnc_5stage/solver.prototxt --weights ${NET_INIT} --imdb ${DATASET_TRAIN} --iters ${ITERS} --cfg experiments/cfgs/${NET}/mnc_5stage.yml ${EXTRA_ARGS}
hgaiser commented 7 years ago

The warnings are not the issue, but it looks like you are running in CPU mode, and (as the error message says) the smooth L1 loss layer is not implemented for CPU.
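
For reference, those VisibleDeprecationWarnings come from NumPy being given floating-point values where it expects integers (a sample size and array indices). Below is a minimal, self-contained sketch of the pattern and the cast that silences it, using hypothetical stand-ins for the variables named in the traceback (cur_inds, cur_rois_this_image, gt_assignment):

    import numpy as np
    import numpy.random as npr

    # A float sample size triggers the warning in proposal_target_layer.py-style code;
    # casting to int avoids it (illustrative values, not the real training data).
    cur_inds = np.arange(10)
    cur_rois_this_image = 4.0                                   # float where an int is expected
    sampled = npr.choice(cur_inds, size=int(cur_rois_this_image), replace=False)

    # The same applies to float-valued indices such as gt_assignment[val]:
    # cast to int before using them to index another array.
    gt_assignment = np.array([0.0, 2.0, 1.0])
    val = 1
    scaled_gt_boxes = np.zeros((3, 4))
    gt_box = scaled_gt_boxes[int(gt_assignment[val])]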

CassieMai commented 7 years ago

Thanks. I am actually using GPU mode. Maybe something in my configuration went wrong while debugging; I will debug MNC again from the beginning.

CassieMai commented 7 years ago

@hgaiser I still can't solve this problem. I really am using GPU mode (in Makefile.config, the CPU_ONLY := 1 line is commented out). Do you have any idea?

hgaiser commented 7 years ago

You could try a basic Caffe tutorial and make sure it runs on the GPU. Does the command nvidia-smi give sensible output, or does it print an error?
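
If nvidia-smi looks fine, a quick way to confirm that pycaffe itself can use the GPU is a minimal check like the one below (a sketch assuming Caffe's Python bindings are on PYTHONPATH and GPU 0 is the device passed via --gpu):

    import caffe

    # Force GPU mode explicitly; if training silently falls back to CPU,
    # layers without a CPU implementation (such as the smooth L1 loss here) abort.
    caffe.set_device(0)          # same id as the --gpu argument to train_net.py
    caffe.set_mode_gpu()
    print('GPU mode set on device 0')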

CassieMai commented 7 years ago

@hgaiser I ran training with a freshly downloaded copy of MNC, and the problem changed to the following.


I0321 16:31:14.548213 21241 net.cpp:270] This network produces output rpn_loss_bbox
I0321 16:31:14.548214 21241 net.cpp:270] This network produces output seg_cls_loss
I0321 16:31:14.548216 21241 net.cpp:270] This network produces output seg_cls_loss_ext
I0321 16:31:14.631436 21241 net.cpp:283] Network initialization done.
I0321 16:31:14.631700 21241 solver.cpp:60] Solver scaffolding done.
Loading pretrained model weights from data/imagenet_models/VGG16.mask.caffemodel
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 1024780411
I0321 16:31:14.956207 21241 net.cpp:810] Ignoring source layer rpn_conv/3x3
I0321 16:31:14.956226 21241 net.cpp:810] Ignoring source layer rpn_relu/3x3
I0321 16:31:14.956228 21241 net.cpp:810] Ignoring source layer rpn/output_rpn_relu/3x3_0_split
I0321 16:31:15.013999 21241 net.cpp:810] Ignoring source layer drop6
I0321 16:31:15.023555 21241 net.cpp:810] Ignoring source layer drop7
I0321 16:31:15.081140 21241 net.cpp:810] Ignoring source layer drop6_mask
I0321 16:31:15.090597 21241 net.cpp:810] Ignoring source layer drop7_mask
Solving...
./experiments/scripts/mnc_5stage.sh: line 35: 21241 Segmentation fault      (core dumped) ./tools/train_net.py --gpu ${GPU_ID} --solver models/${NET}/mnc_5stage/solver.prototxt --weights ${NET_INIT} --imdb ${DATASET_TRAIN} --iters ${ITERS} --cfg experiments/cfgs/${NET}/mnc_5stage.yml ${EXTRA_ARGS}
CassieMai commented 7 years ago

@hgaiser Sorry, it seems that I didn't set up cuDNN correctly. I checked the CUDA path in ~/.bashrc, and now the problem is solved. Thank you for your help.
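
For anyone who hits the same thing, the ~/.bashrc entries for CUDA typically look like the following (assuming CUDA is installed under /usr/local/cuda; adjust the prefix to your installation):

    # ~/.bashrc -- typical CUDA environment setup (install prefix is an assumption)
    export PATH=/usr/local/cuda/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH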