liuliu66 opened this issue 6 years ago
With Detectron, we build the computation graph first, then run the net.
The error you encountered is likely a runtime error. The connections of the network are right, but there are some minor mistakes in your blobs' shapes.
In add_fpn() in FPN.py, there are these lines of code:
if not cfg.FPN.EXTRA_CONV_LEVELS and max_level == HIGHEST_BACKBONE_LVL + 1:
    # Original FPN P6 level implementation from our CVPR'17 FPN paper
    P6_blob_in = blobs_fpn[0]
    P6_name = P6_blob_in + '_subsampled_2x'
    # Use max pooling to simulate stride 2 subsampling
    P6_blob = model.MaxPool(P6_blob_in, P6_name, kernel=1, pad=0, stride=2)
    blobs_fpn.insert(0, P6_blob)
    spatial_scales.insert(0, spatial_scales[0] * 0.5)
This performs a MaxPool on your conv5_4, outputting conv5_4_subsampled_2x, and the RPN uses it as fpn6. That can lead to a runtime spatial-size mismatch in the SpatialNarrowAs operator.
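To see what that pooling does to the spatial size, here is a minimal sketch using the standard pooling output arithmetic (the 72 x 48 input below is only an example value, not taken from your run):

def maxpool_out_size(size, kernel=1, pad=0, stride=2):
    # standard pooling arithmetic; with kernel=1, pad=0, stride=2 this is ceil(size / 2)
    return (size + 2 * pad - kernel) // stride + 1

# e.g. if conv5_4 (the fpn5 blob) is 72 x 48, the subsampled fpn6 blob becomes 36 x 24
print(maxpool_out_size(72), maxpool_out_size(48))  # -> 36 24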
How about building 2 FPN levels on VGG16, running your net, and analysing the net structure, especially the shape of the blob rpn_bbox_pred_fpn6 output by Detectron (every time you successfully run a net, the net structure is displayed in the terminal)?
@suica Hi, thanks for your help! I have carefully analyzed the net structure log when using 1 or 2 FPN levels without the extra P6. I found that the problem is that the shape of blob 'rpn_bbox_targets_wide_fpn2' doesn't match that of blob 'rpn_bbox_pred_fpn2', like this:
rpn_bbox_targets_wide_fpn2 : (2, 12, 456, 456) => rpn_bbox_targets_fpn2 : (2, 12, 172, 228)
rpn_bbox_pred_fpn2 : (2, 12, 172, 228) => rpn_bbox_targets_fpn2 : (2, 12, 172, 228)
In my view, the blob rpn_bbox_targets_wide_fpn2 comes from the raw data, while the blob rpn_bbox_pred_fpn2 is computed by my net, so its shape is correct. Do you know where the code that computes rpn_bbox_targets_wide_fpn2 is? I want to read it to track down the problem. Thank you!
Hi @liuliu66, I did a quick search and found out that rpn_bbox_targets_wide_fpn* is returned by _get_rpn_blobs() in lib/roi_data/rpn.py.
...
for foa in foas:
    H = foa.field_size
    W = foa.field_size
    A = foa.num_cell_anchors
    end_idx = start_idx + H * W * A
    _labels = labels[start_idx:end_idx]
    _bbox_targets = bbox_targets[start_idx:end_idx, :]
    _bbox_inside_weights = bbox_inside_weights[start_idx:end_idx, :]
    _bbox_outside_weights = bbox_outside_weights[start_idx:end_idx, :]
    start_idx = end_idx

    # labels output with shape (1, A, height, width)
    _labels = _labels.reshape((1, H, W, A)).transpose(0, 3, 1, 2)
    # bbox_targets output with shape (1, 4 * A, height, width)
    _bbox_targets = _bbox_targets.reshape(
        (1, H, W, A * 4)).transpose(0, 3, 1, 2)
    # bbox_inside_weights output with shape (1, 4 * A, height, width)
    _bbox_inside_weights = _bbox_inside_weights.reshape(
        (1, H, W, A * 4)).transpose(0, 3, 1, 2)
    # bbox_outside_weights output with shape (1, 4 * A, height, width)
    _bbox_outside_weights = _bbox_outside_weights.reshape(
        (1, H, W, A * 4)).transpose(0, 3, 1, 2)

    blobs_out.append(
        dict(
            rpn_labels_int32_wide=_labels,
            rpn_bbox_targets_wide=_bbox_targets,
            rpn_bbox_inside_weights_wide=_bbox_inside_weights,
            rpn_bbox_outside_weights_wide=_bbox_outside_weights
        )
    )
return blobs_out[0] if len(blobs_out) == 1 else blobs_out
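So the '_wide' blobs are laid out on a square foa.field_size x foa.field_size grid; downstream, a SpatialNarrowAs op crops them to the spatial size of the matching prediction blob, which is why that op requires input 0 to be at least as large as input 1. Roughly, in numpy terms (shapes copied from your log; I have not reproduced your run):

import numpy as np

# rpn_bbox_targets_wide_fpn2 from the data loader (square field_size x field_size grid)
wide = np.zeros((2, 12, 456, 456), dtype=np.float32)
# rpn_bbox_pred_fpn2 produced by the net
pred = np.zeros((2, 12, 172, 228), dtype=np.float32)

# SpatialNarrowAs keeps the top-left corner of `wide` at pred's height/width;
# this only works when wide is at least as large in both spatial dimensions
h, w = pred.shape[2:]
narrow = wide[:, :, :h, :w]
print(narrow.shape)  # (2, 12, 172, 228), matching rpn_bbox_targets_fpn2 in the log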
I recommend using an IDE that provides global keyword search to browse Detectron's code; it's very convenient. I use PyCharm, and it's free if you register with your university email.
@suica Hi, thanks for your help and advice! I am going to use PyCharm to browse the code~
@liuliu66 Hi, were you able to solve this problem? I am also trying to build an FPN onto another backbone.
And how did you define the convolutional layers in such a way that the dimensions match? I added some convolutional layers like this:
...
[rest of model definition here]
...
model.Relu('fire9-expand3x3', 'fire9-relu_expand3x3')
model.Concat(['fire9-relu_expand1x1', 'fire9-relu_expand3x3'], 'fire9-concat')
#output dimension of fire9-concat is 512
model.Conv('fire9-concat', 'conv1', 512, 256, pad=0, kernel=1)
model.Relu('conv1', 'conv1-relu')
model.Conv('conv1-relu', 'conv2', 256, 128, pad=0, kernel=1)
model.Relu('conv2', 'conv2-relu')
model.Conv('conv2-relu', 'conv3', 128, 64, pad=0, kernel=1)
model.Relu('conv3', 'conv3-relu')
Then I defined the FpnLevelInfo:
fpn_level_info = FPN.FpnLevelInfo(
    blobs=('fire9-concat', 'conv1-relu', 'conv2-relu', 'conv3-relu'),
    dims=(512, 256, 128, 64),
    spatial_scales=(1. / 16., 1. / 8., 1. / 4., 1. / 2.)
)
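In case it helps: the dims entries should equal the channel counts of the listed blobs (coarsest level first), and the spatial_scales should be the real strides of those blobs relative to the input image. Since a 1x1 conv with the default stride of 1 does not change the resolution, a plain-Python sanity check like the sketch below (values copied from the snippet above; this is not Detectron API) may be worth going through:

blobs = ('fire9-concat', 'conv1-relu', 'conv2-relu', 'conv3-relu')
dims = (512, 256, 128, 64)
spatial_scales = (1. / 16., 1. / 8., 1. / 4., 1. / 2.)

# dims[i] must match the output channel count of blobs[i]
channels = {'fire9-concat': 512, 'conv1-relu': 256, 'conv2-relu': 128, 'conv3-relu': 64}
assert tuple(channels[b] for b in blobs) == dims

# spatial_scales[i] must be the true stride of blobs[i] w.r.t. the input image.
# conv1/conv2/conv3 are 1x1, stride-1 convs on top of fire9-concat, so their real
# scale is still that of fire9-concat (e.g. 1/16), not 1/8, 1/4 or 1/2.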
The error I get is:
File "tools/train_net_device_1.py", line 133, in <module>
main()
File "tools/train_net_device_1.py", line 115, in main
checkpoints = detectron.utils.train.train_model()
File "/home/thiemi/Detectron/detectron/detectron/utils/train.py", line 53, in train_model
model, weights_file, start_iter, checkpoints, output_dir = create_model()
File "/home/thiemi/Detectron/detectron/detectron/utils/train.py", line 134, in create_model
model = model_builder.create(cfg.MODEL.TYPE, train=True)
File "/home/thiemi/Detectron/detectron/detectron/modeling/model_builder.py", line 124, in create
return get_func(model_type_func)(model)
File "/home/thiemi/Detectron/detectron/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
freeze_conv_body=cfg.TRAIN.FREEZE_CONV_BODY
File "/home/thiemi/Detectron/detectron/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
optim.build_data_parallel_model(model, _single_gpu_build_func)
File "/home/thiemi/Detectron/detectron/detectron/modeling/optimizer.py", line 40, in build_data_parallel_model
all_loss_gradients = _build_forward_graph(model, single_gpu_build_func)
File "/home/thiemi/Detectron/detectron/detectron/modeling/optimizer.py", line 63, in _build_forward_graph
all_loss_gradients.update(single_gpu_build_func(model))
File "/home/thiemi/Detectron/detectron/detectron/modeling/model_builder.py", line 169, in _single_gpu_build_func
blob_conv, dim_conv, spatial_scale_conv = add_conv_body_func(model)
File "/home/thiemi/Detectron/detectron/detectron/modeling/squeezenet_FPN.py", line 136, in add_squeezenet_conv5_body
model.Conv('fire9-expand3x3', 'conv1', 64, 256, pad=0, kernel=1)
File "/home/thiemi/anaconda2/lib/python2.7/site-packages/caffe2/python/cnn.py", line 97, in Conv
**kwargs
File "/home/thiemi/anaconda2/lib/python2.7/site-packages/caffe2/python/brew.py", line 107, in scope_wrapper
return func(*args, **new_kwargs)
File "/home/thiemi/anaconda2/lib/python2.7/site-packages/caffe2/python/helpers/conv.py", line 186, in conv
group, transform_inputs, **kwargs)
File "/home/thiemi/anaconda2/lib/python2.7/site-packages/caffe2/python/helpers/conv.py", line 88, in _ConvBase
tags=ParameterTags.WEIGHT
File "/home/thiemi/anaconda2/lib/python2.7/site-packages/caffe2/python/model_helper.py", line 208, in create_param
assert self._parameters_info[param_name].shape == shape
AssertionError
I get this error at the beginning of training so the model cannot even be created. I am absolutely not sure whether the kernel and padding parameters of the convolutional layers are right at all. Do you happen to have any hints for me?
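For what it's worth, the assert shown at the bottom of the traceback (model_helper.create_param) fires when a parameter blob with the same name is registered a second time with a different shape. A minimal, hypothetical caffe2 sketch of that pattern (the blob names below are made up, not taken from the model above):

from caffe2.python import brew, model_helper

model = model_helper.ModelHelper(name='sketch')
# first conv registers a weight blob 'conv1_w' with shape (256, 512, 1, 1)
brew.conv(model, 'blob_a', 'conv1', dim_in=512, dim_out=256, kernel=1)
# reusing the output name 'conv1' with a different dim_in asks create_param for
# 'conv1_w' with shape (256, 64, 1, 1) and trips the shape assertion
brew.conv(model, 'blob_b', 'conv1', dim_in=64, dim_out=256, kernel=1)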
Hello, I also ran into a similar error when using VGG16. Could you tell me how you solved it?
RuntimeError: [enforce fail at conv_op_cudnn.cc:555] filter.dim32(1) == C / group. 215 vs 512. Error from operator:
input: "gpu_0/conv4_3" input: "gpu_0/fpn_inner_conv4_3_lateral_w" input: "gpu_0/fpn_inner_conv4_3_lateral_b" output: "gpu_0/fpn_inner_conv4_3_lateral" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
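For anyone hitting the same enforce: it checks that the lateral 1x1 conv's weight has as many input channels as the incoming blob. A rough numpy illustration, with made-up spatial sizes and an assumed 256-channel FPN dimension:

import numpy as np

# gpu_0/conv4_3 from the backbone: N x C x H x W with C = 512 (H, W made up here)
conv4_3 = np.zeros((1, 512, 38, 50), dtype=np.float32)
# gpu_0/fpn_inner_conv4_3_lateral_w laid out as (out_channels, in_channels / group, kH, kW)
lateral_w = np.zeros((256, 215, 1, 1), dtype=np.float32)

# conv_op_cudnn.cc enforces filter.dim32(1) == C / group, i.e. 215 == 512 here
print(lateral_w.shape[1] == conv4_3.shape[1])  # False: 215 vs 512, as the error reports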
Can you tell me how to download the other pretrained models, e.g. Inception and others?
@shenghsiaowong I stopped working on this some time ago since I finished my bachelor's thesis. Unfortunately I cannot really tell what the error is. It looks like a dimension mismatch. This can happen if the parameters of the configuration file are not set properly, especially if you exchange the base net.
@liuliu66 Can you please explain the solution in detail? Much appreciated.
@liuliu66 Have you resolved this problem? I encountered a similar problem when I trained Faster R-CNN using VGG16 and FPN. Can you tell me how you solved it? Many thanks~~
Hi,
Thanks for the Detectron framework, which works well on my object detection dataset. But I met an issue when I tried to build an FPN structure on other CNN backbones like VGG16. I built the VGG model and added the FPN level information in the FPN.py file:
def fpn_level_info_VGG19_conv5():
    return FpnLevelInfo(
        blobs=('conv5_4', 'conv4_3', 'conv3_3', 'conv2_2'),
        dims=(512, 512, 256, 128),
        spatial_scales=(1. / 16., 1. / 8., 1. / 4., 1. / 2.)
    )
Then my problem is this:
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
Aborted at 1529391102 (unix time) try "date -d @1529391102" if you are using GNU date
what(): [enforce fail at spatial_narrow_as_op.cu:85] A.dim32(2) >= B.dim32(2). 24 vs 36. Input 0 height must be >= input 1 height. Error from operator:
input: "gpu_0/rpn_bbox_targets_wide_fpn6" input: "gpu_0/rpn_bbox_pred_fpn6" output: "gpu_0/rpn_bbox_targets_fpn6" name: "" type: "SpatialNarrowAs" device_option { device_type: 1 cuda_gpu_id: 0 }
I think the model itself is right because it can be created. And if I build only 2 FPN levels, or 1 (that means no FPN), it runs well. Could anybody help me solve this?