liuliu66 opened this issue 6 years ago
With Detectron, we build the computation graph first, then run the net.
The error you encountered is likely a runtime error. The connections of the network are right, but there are some minor mistakes in your blobs' shapes.
In add_fpn() in FPN.py, there are these lines of code:
if not cfg.FPN.EXTRA_CONV_LEVELS and max_level == HIGHEST_BACKBONE_LVL + 1:
    # Original FPN P6 level implementation from our CVPR'17 FPN paper
    P6_blob_in = blobs_fpn[0]
    P6_name = P6_blob_in + '_subsampled_2x'
    # Use max pooling to simulate stride 2 subsampling
    P6_blob = model.MaxPool(P6_blob_in, P6_name, kernel=1, pad=0, stride=2)
    blobs_fpn.insert(0, P6_blob)
    spatial_scales.insert(0, spatial_scales[0] * 0.5)
This performs a MaxPool on your conv5_4, outputting conv5_4_subsampled_2x, and the RPN uses it as fpn6. That can lead to a runtime spatial-size mismatch in the SpatialNarrowAs operator.
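To see what that pooling does to the spatial size, here is a minimal sketch using the standard pooling output arithmetic (the 72 x 48 input below is only an example value, not taken from your run):

def maxpool_out_size(size, kernel=1, pad=0, stride=2):
    # standard pooling arithmetic; with kernel=1, pad=0, stride=2 this is ceil(size / 2)
    return (size + 2 * pad - kernel) // stride + 1

# e.g. if conv5_4 (the fpn5 blob) is 72 x 48, the subsampled fpn6 blob becomes 36 x 24
print(maxpool_out_size(72), maxpool_out_size(48))  # -> 36 24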
How about building 2 FPN levels on VGG16, running your net, and analysing the net structure, especially the shape of the blob rpn_bbox_pred_fpn6 output by Detectron (every time you successfully run a net, the net structure is displayed in the terminal)?
@suica Hi, thanks for your help! I have carefully analyzed the net structure log when using 1 or 2 FPN levels without the extra P6. I found that the problem is that the shape of blob 'rpn_bbox_targets_wide_fpn2' doesn't match that of blob 'rpn_bbox_pred_fpn2', like this:
rpn_bbox_targets_wide_fpn2 : (2, 12, 456, 456) => rpn_bbox_targets_fpn2 : (2, 12, 172, 228)
rpn_bbox_pred_fpn2 : (2, 12, 172, 228) => rpn_bbox_targets_fpn2 : (2, 12, 172, 228)
In my view, the blob rpn_bbox_targets_wide_fpn2 comes from the raw data, while the blob rpn_bbox_pred_fpn2 is computed by my net, so its shape is correct. Do you know where the code that computes rpn_bbox_targets_wide_fpn2 is? I want to read it to track down the problem. Thank you!
Hi @liuliu66, I did a quick search and found out that rpn_bbox_targets_wide_fpn* is returned by _get_rpn_blobs() in lib/roi_data/rpn.py.
...
for foa in foas:
    H = foa.field_size
    W = foa.field_size
    A = foa.num_cell_anchors
    end_idx = start_idx + H * W * A
    _labels = labels[start_idx:end_idx]
    _bbox_targets = bbox_targets[start_idx:end_idx, :]
    _bbox_inside_weights = bbox_inside_weights[start_idx:end_idx, :]
    _bbox_outside_weights = bbox_outside_weights[start_idx:end_idx, :]
    start_idx = end_idx

    # labels output with shape (1, A, height, width)
    _labels = _labels.reshape((1, H, W, A)).transpose(0, 3, 1, 2)
    # bbox_targets output with shape (1, 4 * A, height, width)
    _bbox_targets = _bbox_targets.reshape(
        (1, H, W, A * 4)).transpose(0, 3, 1, 2)
    # bbox_inside_weights output with shape (1, 4 * A, height, width)
    _bbox_inside_weights = _bbox_inside_weights.reshape(
        (1, H, W, A * 4)).transpose(0, 3, 1, 2)
    # bbox_outside_weights output with shape (1, 4 * A, height, width)
    _bbox_outside_weights = _bbox_outside_weights.reshape(
        (1, H, W, A * 4)).transpose(0, 3, 1, 2)

    blobs_out.append(
        dict(
            rpn_labels_int32_wide=_labels,
            rpn_bbox_targets_wide=_bbox_targets,
            rpn_bbox_inside_weights_wide=_bbox_inside_weights,
            rpn_bbox_outside_weights_wide=_bbox_outside_weights
        )
    )
return blobs_out[0] if len(blobs_out) == 1 else blobs_out
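So the '_wide' blobs are laid out on a square foa.field_size x foa.field_size grid; downstream, a SpatialNarrowAs op crops them to the spatial size of the matching prediction blob, which is why that op requires input 0 to be at least as large as input 1. Roughly, in numpy terms (shapes copied from your log; I have not reproduced your run):

import numpy as np

# rpn_bbox_targets_wide_fpn2 from the data loader (square field_size x field_size grid)
wide = np.zeros((2, 12, 456, 456), dtype=np.float32)
# rpn_bbox_pred_fpn2 produced by the net
pred = np.zeros((2, 12, 172, 228), dtype=np.float32)

# SpatialNarrowAs keeps the top-left corner of `wide` at pred's height/width;
# this only works when wide is at least as large in both spatial dimensions
h, w = pred.shape[2:]
narrow = wide[:, :, :h, :w]
print(narrow.shape)  # (2, 12, 172, 228), matching rpn_bbox_targets_fpn2 in the log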
I recommend using an IDE that provides global keyword search to browse Detectron's code; it's very convenient. I use PyCharm, and it's free if you register with your university email.
@suica Hi, thanks for your help and advice! I am going to use PyCharm to browse the code~
@liuliu66 Hi, were you able to solve this problem? I am also trying to build an FPN onto another backbone.
And how did you define the convolutional layers in such a way that the dimensions match? I added some convolutional layers like this:
...
[rest of model definition here]
...
model.Relu('fire9-expand3x3', 'fire9-relu_expand3x3')
model.Concat(['fire9-relu_expand1x1', 'fire9-relu_expand3x3'], 'fire9-concat')
#output dimension of fire9-concat is 512
model.Conv('fire9-concat', 'conv1', 512, 256, pad=0, kernel=1)
model.Relu('conv1', 'conv1-relu')
model.Conv('conv1-relu', 'conv2', 256, 128, pad=0, kernel=1)
model.Relu('conv2', 'conv2-relu')
model.Conv('conv2-relu', 'conv3', 128, 64, pad=0, kernel=1)
model.Relu('conv3', 'conv3-relu')
Then I defined the FpnLevelInfo:
fpn_level_info = FPN.FpnLevelInfo(
    blobs=('fire9-concat', 'conv1-relu', 'conv2-relu', 'conv3-relu'),
    dims=(512, 256, 128, 64),
    spatial_scales=(1. / 16., 1. / 8., 1. / 4., 1. / 2.)
)
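In case it helps: the dims entries should equal the channel counts of the listed blobs (coarsest level first), and the spatial_scales should be the real strides of those blobs relative to the input image. Since a 1x1 conv with the default stride of 1 does not change the resolution, a plain-Python sanity check like the sketch below (values copied from the snippet above; this is not Detectron API) may be worth going through:

blobs = ('fire9-concat', 'conv1-relu', 'conv2-relu', 'conv3-relu')
dims = (512, 256, 128, 64)
spatial_scales = (1. / 16., 1. / 8., 1. / 4., 1. / 2.)

# dims[i] must match the output channel count of blobs[i]
channels = {'fire9-concat': 512, 'conv1-relu': 256, 'conv2-relu': 128, 'conv3-relu': 64}
assert tuple(channels[b] for b in blobs) == dims

# spatial_scales[i] must be the true stride of blobs[i] w.r.t. the input image.
# conv1/conv2/conv3 are 1x1, stride-1 convs on top of fire9-concat, so their real
# scale is still that of fire9-concat (e.g. 1/16), not 1/8, 1/4 or 1/2.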
The error I get is:
File "tools/train_net_device_1.py", line 133, in <module>
main()
File "tools/train_net_device_1.py", line 115, in main
checkpoints = detectron.utils.train.train_model()
File "/home/thiemi/Detectron/detectron/detectron/utils/train.py", line 53, in train_model
model, weights_file, start_iter, checkpoints, output_dir = create_model()
File "/home/thiemi/Detectron/detectron/detectron/utils/train.py", line 134, in create_model
model = model_builder.create(cfg.MODEL.TYPE, train=True)
File "/home/thiemi/Detectron/detectron/detectron/modeling/model_builder.py", line 124, in create
return get_func(model_type_func)(model)
File "/home/thiemi/Detectron/detectron/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY
File "/home/thiemi/Detectron/detectron/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
optim.build_data_parallel_model(model, _single_gpu_build_func)
File "/home/thiemi/Detectron/detectron/detectron/modeling/optimizer.py", line 40, in build_data_parallel_model
all_loss_gradients = _build_forward_graph(model, single_gpu_build_func)
File "/home/thiemi/Detectron/detectron/detectron/modeling/optimizer.py", line 63, in _build_forward_graph
all_loss_gradients.update(single_gpu_build_func(model))
File "/home/thiemi/Detectron/detectron/detectron/modeling/model_builder.py", line 169, in _single_gpu_build_func
blob_conv, dim_conv, spatial_scale_conv = add_conv_body_func(model)
File "/home/thiemi/Detectron/detectron/detectron/modeling/squeezenet_FPN.py", line 136, in add_squeezenet_conv5_body
model.Conv('fire9-expand3x3', 'conv1', 64, 256, pad=0, kernel=1)
File "/home/thiemi/anaconda2/lib/python2.7/site-packages/caffe2/python/cnn.py", line 97, in Conv
**kwargs
File "/home/thiemi/anaconda2/lib/python2.7/site-packages/caffe2/python/brew.py", line 107, in scope_wrapper
return func(*args, **new_kwargs)
File "/home/thiemi/anaconda2/lib/python2.7/site-packages/caffe2/python/helpers/conv.py", line 186, in conv
group, transform_inputs, **kwargs)
File "/home/thiemi/anaconda2/lib/python2.7/site-packages/caffe2/python/helpers/conv.py", line 88, in _ConvBase
tags=ParameterTags.WEIGHT
File "/home/thiemi/anaconda2/lib/python2.7/site-packages/caffe2/python/model_helper.py", line 208, in create_param
assert self._parameters_info[param_name].shape == shape
AssertionError
I get this error at the beginning of training so the model cannot even be created. I am absolutely not sure whether the kernel and padding parameters of the convolutional layers are right at all. Do you happen to have any hints for me?
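For what it's worth, the assert shown at the bottom of the traceback (model_helper.create_param) fires when a parameter blob with the same name is registered a second time with a different shape. A minimal, hypothetical caffe2 sketch of that pattern (the blob names below are made up, not taken from the model above):

from caffe2.python import brew, model_helper

model = model_helper.ModelHelper(name='sketch')
# first conv registers a weight blob 'conv1_w' with shape (256, 512, 1, 1)
brew.conv(model, 'blob_a', 'conv1', dim_in=512, dim_out=256, kernel=1)
# reusing the output name 'conv1' with a different dim_in asks create_param for
# 'conv1_w' with shape (256, 64, 1, 1) and trips the shape assertion
brew.conv(model, 'blob_b', 'conv1', dim_in=64, dim_out=256, kernel=1)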
Hello, I also ran into a similar error when using VGG16. Could you tell me how you solved it?
RuntimeError: [enforce fail at conv_op_cudnn.cc:555] filter.dim32(1) == C / group. 215 vs 512. Error from operator:
input: "gpu_0/conv4_3" input: "gpu_0/fpn_inner_conv4_3_lateral_w" input: "gpu_0/fpn_inner_conv4_3_lateral_b" output: "gpu_0/fpn_inner_conv4_3_lateral" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
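For anyone hitting the same enforce: it checks that the lateral 1x1 conv's weight has as many input channels as the incoming blob. A rough numpy illustration, with made-up spatial sizes and an assumed 256-channel FPN dimension:

import numpy as np

# gpu_0/conv4_3 from the backbone: N x C x H x W with C = 512 (H, W made up here)
conv4_3 = np.zeros((1, 512, 38, 50), dtype=np.float32)
# gpu_0/fpn_inner_conv4_3_lateral_w laid out as (out_channels, in_channels / group, kH, kW)
lateral_w = np.zeros((256, 215, 1, 1), dtype=np.float32)

# conv_op_cudnn.cc enforces filter.dim32(1) == C / group, i.e. 215 == 512 here
print(lateral_w.shape[1] == conv4_3.shape[1])  # False: 215 vs 512, as the error reports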
Can you tell me how to download the other pretrained models, e.g. Inception and others?
@shenghsiaowong I stopped working on this some time ago since I finished my bachelor's thesis. Unfortunately I cannot really tell what the error is. It looks like a dimension mismatch. This can happen if the parameters of the configuration file are not set properly, especially if you exchange the base net.
@liuliu66 Can you please explain the solution in detail? Much appreciated.
@liuliu66 Have you resolved this problem? I encountered a similar problem when I trained Faster R-CNN using VGG16 and FPN. Can you tell me how you solved it? Many thanks~~
Hi,
Thanks for the Detectron framework, which works well on my object detection dataset. But I met an issue when I tried to build an FPN structure on other CNN backbones like VGG16. I built the VGG model and added the FPN level information in the FPN.py file:
def fpn_level_info_VGG19_conv5():
    return FpnLevelInfo(
        blobs=('conv5_4', 'conv4_3', 'conv3_3', 'conv2_2'),
        dims=(512, 512, 256, 128),
        spatial_scales=(1. / 16., 1. / 8., 1. / 4., 1. / 2.)
    )
Then my problem is this:
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
Aborted at 1529391102 (unix time) try "date -d @1529391102" if you are using GNU date
what(): [enforce fail at spatial_narrow_as_op.cu:85] A.dim32(2) >= B.dim32(2). 24 vs 36. Input 0 height must be >= input 1 height. Error from operator:
input: "gpu_0/rpn_bbox_targets_wide_fpn6" input: "gpu_0/rpn_bbox_pred_fpn6" output: "gpu_0/rpn_bbox_targets_fpn6" name: "" type: "SpatialNarrowAs" device_option { device_type: 1 cuda_gpu_id: 0 }
I think the model itself is right because it can be created. And if I build only 2 FPN levels, or 1 (that means no FPN), it runs well. Could anybody help me solve this?