facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

Train with different backbone #362

Open stanciuflorina opened 6 years ago

stanciuflorina commented 6 years ago

Hi, I'm trying to train the keypoint R-CNN model with a different backbone: a MobileNet, which I use together with FPN. The problem is that when I run the training script I get stuck on this error regarding "SpatialNarrowAs".

### Actual results

I0412 13:15:14.438279 7216 context_gpu.cu:321] GPU 0: 918 MB
I0412 13:15:14.438313 7216 context_gpu.cu:325] Total: 918 MB
I0412 13:15:14.724922 7225 context_gpu.cu:321] GPU 0: 1046 MB
I0412 13:15:14.724937 7225 context_gpu.cu:325] Total: 1046 MB
I0412 13:15:14.732327 7225 context_gpu.cu:321] GPU 0: 1246 MB
I0412 13:15:14.732340 7225 context_gpu.cu:325] Total: 1246 MB
I0412 13:15:14.735240 7225 context_gpu.cu:321] GPU 0: 1406 MB
I0412 13:15:14.735250 7225 context_gpu.cu:325] Total: 1406 MB
I0412 13:15:14.736987 7227 context_gpu.cu:321] GPU 0: 1746 MB
I0412 13:15:14.737004 7227 context_gpu.cu:325] Total: 1746 MB
I0412 13:15:14.745851 7225 context_gpu.cu:321] GPU 0: 1906 MB
I0412 13:15:14.745859 7225 context_gpu.cu:325] Total: 1906 MB
I0412 13:15:14.755209 7225 context_gpu.cu:321] GPU 0: 2066 MB
I0412 13:15:14.755223 7225 context_gpu.cu:325] Total: 2066 MB
I0412 13:15:14.757848 7228 context_gpu.cu:321] GPU 0: 2236 MB
I0412 13:15:14.757856 7228 context_gpu.cu:325] Total: 2236 MB
I0412 13:15:14.773665 7225 context_gpu.cu:321] GPU 0: 2376 MB
I0412 13:15:14.773679 7225 context_gpu.cu:325] Total: 2376 MB
I0412 13:15:14.782863 7225 context_gpu.cu:321] GPU 0: 2511 MB
I0412 13:15:14.782871 7225 context_gpu.cu:325] Total: 2511 MB
I0412 13:15:14.830847 7225 context_gpu.cu:321] GPU 0: 2641 MB
I0412 13:15:14.830859 7225 context_gpu.cu:325] Total: 2641 MB
I0412 13:15:14.886987 7225 context_gpu.cu:321] GPU 0: 2771 MB
I0412 13:15:14.887001 7225 context_gpu.cu:325] Total: 2771 MB
I0412 13:15:14.889497 7225 context_gpu.cu:321] GPU 0: 2934 MB
I0412 13:15:14.889508 7225 context_gpu.cu:325] Total: 2934 MB
I0412 13:15:14.892792 7225 context_gpu.cu:321] GPU 0: 3359 MB
I0412 13:15:14.892802 7225 context_gpu.cu:325] Total: 3359 MB
I0412 13:15:14.897338 7225 context_gpu.cu:321] GPU 0: 3699 MB
I0412 13:15:14.897353 7225 context_gpu.cu:325] Total: 3699 MB
I0412 13:15:14.901777 7225 context_gpu.cu:321] GPU 0: 4100 MB
I0412 13:15:14.901788 7225 context_gpu.cu:325] Total: 4100 MB
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
what(): [enforce fail at spatial_narrow_as_op.cu:87] A.dim32(3) >= B.dim32(3). 42 vs 64. Input 0 width must be >= input 1 width. Error from operator:
input: "gpu_0/rpn_labels_int32_wide_fpn5" input: "gpu_0/rpn_cls_logits_fpn5" output: "gpu_0/rpn_labels_int32_fpn5" name: "" type: "SpatialNarrowAs" device_option { device_type: 1 cuda_gpu_id: 0 }
Aborted at 1523528114 (unix time) try "date -d @1523528114" if you are using GNU date
PC: @ 0x7fec25400428 gsignal
SIGABRT (@0x3e800001bf0) received by PID 7152 (TID 0x7feaadbfe700) from PID 7152; stack trace:
@ 0x7fec254004b0 (unknown)
I0412 13:15:14.930836 7225 context_gpu.cu:321] GPU 0: 4501 MB
I0412 13:15:14.930851 7225 context_gpu.cu:325] Total: 4501 MB
@ 0x7fec25400428 gsignal
@ 0x7fec2540202a abort
@ 0x7fec1ef2184d __gnu_cxx::__verbose_terminate_handler()
terminate called recursively
@ 0x7fec1ef1f6b6 (unknown)
terminate called recursively
@ 0x7fec1ef1f701 std::terminate()
@ 0x7fec1ef4ad38 (unknown)
@ 0x7fec2579c6ba start_thread
@ 0x7fec254d241d clone
@ 0x0 (unknown)
Aborted (core dumped)
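
For reference, here is an illustration of the shape relationship the failing enforce expresses (not the Caffe2 operator implementation): `SpatialNarrowAs` crops input 0, the "wide" ground-truth blob, down to the spatial size of input 1, so input 0 must be at least as tall and wide as input 1. The 42 vs 64 in the message are the widths of `rpn_labels_int32_wide_fpn5` and `rpn_cls_logits_fpn5`.

```python
# Illustration only, not the Caffe2 operator implementation.
def spatial_narrow_as_ok(labels_width, logits_width):
    # Mirrors the failing check at spatial_narrow_as_op.cu:87:
    # A.dim32(3) >= B.dim32(3), i.e. input 0 width >= input 1 width.
    return labels_width >= logits_width

# gpu_0/rpn_labels_int32_wide_fpn5 is 42 wide while
# gpu_0/rpn_cls_logits_fpn5 is 64 wide, so the enforce fails and training aborts.
print(spatial_narrow_as_ok(42, 64))  # False -> caffe2::EnforceNotMet
```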

### The command that I ran



### System information

* Operating system: Ubuntu 16.04
* CUDA version: Cuda compilation tools, release 8.0, V8.0.61
* cuDNN version: 6.0.21
* NVIDIA driver version: 390.48
* GPU models (for all devices if they are not all the same):   GeForce GTX 1080i
* `PYTHONPATH` environment variable: ?
* `python --version` output: 2.7

Well, I'm not really sure where the problem is, given that "rpn_labels_int32_wide_fpn" is an external input:

This is from the generated net.pbtxt:

external_input: "gpu_0/rpn_labels_int32_wide_fpn2"
external_input: "gpu_0/rpn_bbox_targets_wide_fpn2"
external_input: "gpu_0/rpn_bbox_inside_weights_wide_fpn2"
external_input: "gpu_0/rpn_bbox_outside_weights_wide_fpn2"
external_input: "gpu_0/rpn_labels_int32_wide_fpn3"
external_input: "gpu_0/rpn_bbox_targets_wide_fpn3"
external_input: "gpu_0/rpn_bbox_inside_weights_wide_fpn3"
external_input: "gpu_0/rpn_bbox_outside_weights_wide_fpn3"
external_input: "gpu_0/rpn_labels_int32_wide_fpn4"
external_input: "gpu_0/rpn_bbox_targets_wide_fpn4"
external_input: "gpu_0/rpn_bbox_inside_weights_wide_fpn4"
external_input: "gpu_0/rpn_bbox_outside_weights_wide_fpn4"
external_input: "gpu_0/rpn_labels_int32_wide_fpn5"
external_input: "gpu_0/rpn_bbox_targets_wide_fpn5"
external_input: "gpu_0/rpn_bbox_inside_weights_wide_fpn5"
external_input: "gpu_0/rpn_bbox_outside_weights_wide_fpn5"
rbgirshick commented 6 years ago

Random guess: check that all of the convolutions use 'same' padding so that the input and output feature maps have the same spatial size (when stride is 1).
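To make that concrete, here is a minimal sketch of the convolution output arithmetic (plain Python with hypothetical helper names, not Detectron code): with stride 1 and an odd kernel, pad = (k - 1) // 2 keeps the spatial size unchanged, whereas pad = 0 shrinks the map at every layer, which can eventually surface as a width mismatch like the one in the error.

```python
# Minimal sketch of 'same' padding arithmetic (hypothetical helper names).
def same_pad(kernel_size):
    # For stride 1 and an odd kernel, this padding preserves spatial size.
    return (kernel_size - 1) // 2

def conv_output_size(in_size, kernel_size, stride, pad):
    # Standard convolution arithmetic used by Caffe2-style conv ops.
    return (in_size + 2 * pad - kernel_size) // stride + 1

# A 3x3, stride-1 conv on a 64-wide feature map:
assert conv_output_size(64, 3, 1, same_pad(3)) == 64  # size preserved
assert conv_output_size(64, 3, 1, 0) == 62            # shrinks without padding
```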

liuliu66 commented 6 years ago

@stanciuflorina Hi, have you solved this problem? I ran into a similar issue when using FPN with Inception and DenseNet backbones. When I reduce the number of FPN levels the error goes away, but the model performance is very poor.
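
For anyone wondering what "reduce the number of FPN levels" refers to: in Detectron the RPN level range comes from the `FPN.RPN_MIN_LEVEL` / `FPN.RPN_MAX_LEVEL` config options (stock defaults 2 and 6), and level k is assumed to have stride 2**k. A rough sketch of the tradeoff, under those assumptions:

```python
# Sketch, assuming the stock Detectron convention that FPN level k has
# stride 2**k and that the RPN level range is FPN.RPN_MIN_LEVEL..RPN_MAX_LEVEL.
rpn_min_level, rpn_max_level = 2, 6  # defaults; lowering max to 5 drops fpn6

for lvl in range(rpn_min_level, rpn_max_level + 1):
    # The coarsest levels have the smallest feature maps, so a backbone whose
    # effective stride differs from 2**lvl is caught there first; dropping
    # them hides the mismatch but also discards the coarse-scale proposals.
    print("fpn%d: assumed stride %d" % (lvl, 2 ** lvl))
```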

mamunir commented 5 years ago

Can anybody help with this? I have been stuck on this problem for three days and cannot locate the cause.

Ezereal commented 5 years ago

> I ran into a similar issue when using FPN

Hi, how did you change the backbone? Did you add the related code and find the backbone models yourself? And where did you find the models? Thank you for answering my question!

DeepAndy commented 4 years ago

Has anybody got a good solution for this?