facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

Check failed: output->dims() == Input(i).dims() (detectron model -> caffe2 model -> onnx model) #752

Open lilichu opened 5 years ago

lilichu commented 5 years ago

hi! I ran ./tools/convert_pkl_to_pb.py to convert the Detectron model to a Caffe2 model successfully. Then I ran the following code to convert the Caffe2 model to an ONNX model:

import onnx
import caffe2.python.onnx.frontend
from caffe2.proto import caffe2_pb2

# We need to provide type and shape of the model inputs,
# see above Note section for explanation
data_type = onnx.TensorProto.FLOAT
data_shape = (1, 3, 720, 1280)
value_info = {
    'data': (data_type, data_shape)
}

predict_net = caffe2_pb2.NetDef()
with open('model.pb', 'rb') as f:
# with open('predict_net.pb', 'rb') as f:
    predict_net.ParseFromString(f.read())

init_net = caffe2_pb2.NetDef()
with open('model_init.pb', 'rb') as f:
# with open('init_net.pb', 'rb') as f:
    init_net.ParseFromString(f.read())

onnx_model = caffe2.python.onnx.frontend.caffe2_net_to_onnx_model(
    predict_net,
    init_net,
    value_info,
)

onnx.checker.check_model(onnx_model)

I encounter this error:

RuntimeError: [enforce fail at operator.cc:213] op. Cannot create operator of type 'BatchPermutation' on the device 'CPU'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing.
Operator def: input: "roi_feat_shuffled" input: "rois_idx_restore_int32" output: "roi_feat" name: "" type: "BatchPermutation" device_option { } engine:

Is this because BatchPermutation is a Detectron-specific op? I tried adding this code:

from caffe2.python import dyndep
detectron_ops_lib = '/home/user/pytorch/build/lib/libcaffe2_detectron_ops_gpu.so'
dyndep.InitOpsLibrary(detectron_ops_lib)
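For anyone on a CPU-only build where the GPU library is missing, a small hypothetical helper can pick whichever Detectron ops library actually exists on disk; the paths below are just the examples from this thread, not canonical locations:

```python
import os

# Return the first Detectron ops library that exists on disk, or None.
# The `exists` parameter is injectable purely so the logic is testable.
def pick_ops_lib(candidates, exists=os.path.exists):
    for lib in candidates:
        if exists(lib):
            return lib
    return None

lib = pick_ops_lib([
    '/home/user/pytorch/build/lib/libcaffe2_detectron_ops_gpu.so',
    '/home/user/pytorch/build/lib/libcaffe2_detectron_ops.so',
])
# With Caffe2 installed, registration would then be:
# if lib is not None:
#     from caffe2.python import dyndep
#     dyndep.InitOpsLibrary(lib)
```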

That error goes away, but I hit a new one:

RuntimeError: [enforce fail at utility_ops.h:275] . Check failed: output->dims() == Input(i).dims().Description: Input #1, input dimension:[1, 256, 46, 80] should match output dimension: [1, 256, 45, 80]
Error from operator: 
input: "fpn_inner_res4_5_sum_lateral" input: "fpn_inner_res4_5_sum_topdown" output: "fpn_inner_res4_5_sum" name: "" type: "Sum" device_option { } engine: ""frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x76 (0x7ff40d388b86 in /home/user/.local/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)

Why are the input and output dimensions different? Could you please help me? Thanks!

jeremyolsen commented 5 years ago

I also get this same error following the same steps listed above. However, I'm able to get past this input/output check error by changing data_shape to (1, 3, 800, 800). I don't think these are the correct inputs, as I'm using the R-50-FPN tutorial ('getting started') model. I think it should be (1, 3, 833, 833)? I'm not really sure, though.
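For what it's worth, Detectron picks the test-time size by scaling the shorter image side up to TEST.SCALE while capping the longer side at TEST.MAX_SIZE. A rough sketch of that computation (800 and 1333 are the defaults in the FPN configs; treat them, and this simplified function, as assumptions rather than Detectron's exact code):

```python
# Approximate Detectron test-time resize: scale the short side to `target`,
# but shrink the scale if the long side would exceed `max_size`.
def detectron_test_size(h, w, target=800, max_size=1333):
    scale = float(target) / min(h, w)
    if round(scale * max(h, w)) > max_size:
        scale = float(max_size) / max(h, w)
    return int(round(h * scale)), int(round(w * scale))

# e.g. a 720x1280 input is long-side-capped rather than scaled to 800 exactly
print(detectron_test_size(720, 1280))
```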

So after 'fixing' the input/output error, I immediately hit another issue:

RuntimeError: [enforce fail at generate_proposals_op.cc:243] im_info_tensor.sizes() == (vector<int64_t>{num_images, 3}). [0] vs 1 3
Error from operator: input: "rpn_cls_probs_fpn2_cpu" input: "rpn_bbox_pred_fpn2_cpu" input: "im_info" input: "anchor2_cpu" output: "rpn_rois_fpn2" output: "rpn_roi_probs_fpn2" name: "" type: "GenerateProposals" arg {

Isn't this the issue of GenerateProposals not being implemented for CUDA/GPU that I've seen reported elsewhere?

lilichu commented 5 years ago

hi! @jeremyolsen I changed the input data to (1, 3, 800, 800) as you suggested, and that works. But then I hit the same issue:

RuntimeError: [enforce fail at generate_proposals_op.cc:243] im_info_tensor.sizes() == (vector<int64_t>{num_images, 3}). [0] vs 1 3
Error from operator: 
input: "rpn_cls_probs_fpn2" input: "rpn_bbox_pred_fpn2" input: "im_info" input: "anchor2" output: "rpn_rois_fpn2" output: "rpn_roi_probs_fpn2" name: "" type: "GenerateProposals" arg { name: "nms_thresh" f: 0.7 } arg { name: "min_size" f: 0 } arg { name: "spatial_scale" f: 0.25 } arg { name: "correct_transform_coords" i: 1 } arg { name: "post_nms_topN" i: 1000 } arg { name: "pre_nms_topN" i: 1000 } device_option { } engine: ""
jeremyolsen commented 5 years ago

@lilichu - This new issue looks to be a duplicate of pytorch/pytorch#10849. The required ops are not in the ONNX spec yet. Looks like they have been in-progress for quite a while. Check onnx/onnx#1010 for more details.

pkuxwguan commented 5 years ago

@lilichu @jeremyolsen I met the same problem as you described. Have you figured it out?

lilichu commented 5 years ago

@pkuxwguan Which issue did you encounter? I changed the input data to (1, 3, 800, 800) and the first issue is fixed. As for the last issue, as @jeremyolsen says, the required ops are not in the ONNX spec yet. Check https://github.com/onnx/onnx/pull/1010 for more details.

pkuxwguan commented 5 years ago

@lilichu Hi, I met the same two issues when using the .pb files for inference, not when converting the pb model to an ONNX model. I fixed them independently by resizing the image to a multiple of 32 and adding the 'im_info' information as a blob. Thank you for your advice.
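A minimal sketch of those two fixes (this is my reading of the comment above, not Detectron's exact preprocessing): pad the height/width up to a multiple of 32 (the coarsest FPN stride) and build a one-row-per-image im_info blob of (height, width, scale):

```python
import math

# Round a dimension up to the next multiple of `stride` (32 for FPN models).
def pad_to_multiple(h, w, stride=32):
    return (int(math.ceil(h / float(stride)) * stride),
            int(math.ceil(w / float(stride)) * stride))

h, w = pad_to_multiple(720, 1280)
# im_info has one (height, width, scale) row per image in the batch.
im_info = [[float(h), float(w), 1.0]]

# With Caffe2 available, the blobs would then be fed roughly like this:
# workspace.FeedBlob('data', np.zeros((1, 3, h, w), dtype=np.float32))
# workspace.FeedBlob('im_info', np.array(im_info, dtype=np.float32))
```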

mosjoker commented 5 years ago

Has anyone figured this out? I'm hitting the same problem.

gangulypritha commented 5 years ago

@pkuxwguan Hi, what do you mean by adding "im_info" information to a blob? The image size I used was (1,3,800,800), and the caffe2.python.onnx.frontend.caffe2_net_to_onnx_model(...) call raises a RuntimeError:

RuntimeError: [enforce fail at generate_proposals_op.cc:243] im_info_tensor.sizes() == (vector<int64_t>{num_images, 3}). [0] vs 1 3
Error from operator: input: "rpn_cls_probs_cpu" input: "rpn_bbox_pred_cpu" input: "im_info" input: "anchor" output: "rpn_rois" output: "rpn_roi_probs" name: "" type: "GenerateProposals" arg { name: "nms_thresh" f: 0.7 } arg { name: "min_size" f: 0 } arg { name: "spatial_scale" f: 0.0625 } arg { name: "correct_transform_coords" i: 1 } arg { name: "post_nms_topN" i: 1000 } arg { name: "pre_nms_topN" i: 6000 }
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x68 (0x7f786ea37d78 in /usr/local/lib/libc10.so)

civilpat commented 5 years ago

@gangulypritha @jeremyolsen @lilichu @pkuxwguan Hi guys, I have the same problem. Have you found a way out? I changed the data size to a 'multiple of 32', trying both (1,3,800,800) and (1,3,704,832) (my original image size is (1,3,720,1080) and the limit is 833 in Detectron). The same error keeps showing:

RuntimeError: [enforce fail at generate_proposals_op.cc:281] im_info_tensor.sizes() == (vector<int64_t>{num_images, 3}). [0] vs 1 3
Error from operator: input: "rpn_cls_probs_fpn2_cpu" input: "rpn_bbox_pred_fpn2_cpu" input: "im_info" input: "anchor2_cpu" output: "rpn_rois_fpn2" output: "rpn_roi_probs_fpn2" name: "" type: "GenerateProposals" arg { name: "nms_thresh" f: 0.7 } arg { name: "min_size" f: 0 } arg { name: "spatial_scale" f: 0.25 } arg { name: "correct_transform_coords" i: 1 } arg { name: "post_nms_topN" i: 1000 } arg { name: "pre_nms_topN" i: 1000 }
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f2743d18441 in /usr/local/lib/python2.7/dist-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x49 (0x7f2743d18259 in /usr/local/lib/python2.7/dist-packages/caffe2/python/../../torch/lib/libc10.so)
frame #2: <unknown function> + 0x1cbd5de (0x7f277a9f75de in /usr/local/lib/python2.7/dist-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #3: <unknown function> + 0x1829345 (0x7f277a563345 in /usr/local/lib/python2.7/dist-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #4: caffe2::SimpleNet::Run() + 0x161 (0x7f277a637101 in /usr/local/lib/python2.7/dist-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #5: caffe2::Workspace::RunNetOnce(caffe2::NetDef const&) + 0x2b (0x7f277a66da0b in /usr/local/lib/python2.7/dist-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #6: <unknown function> + 0x5859f (0x7f278336d59f in /usr/local/lib/python2.7/dist-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #7: <unknown function> + 0x9321e (0x7f27833a821e in /usr/local/lib/python2.7/dist-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #8: PyEval_EvalFrameEx + 0x6f3e (0x4c2e1e in python2)
frame #9: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #10: PyEval_EvalFrameEx + 0x6076 (0x4c1f56 in python2)
frame #11: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #12: python2() [0x4d57a3]
frame #13: PyObject_Call + 0x3e (0x4a587e in python2)
frame #14: PyEval_EvalFrameEx + 0x263e (0x4be51e in python2)
frame #15: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #16: PyEval_EvalFrameEx + 0x6076 (0x4c1f56 in python2)
frame #17: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #18: PyEval_EvalFrameEx + 0x58e6 (0x4c17c6 in python2)
frame #19: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #20: python2() [0x4d57a3]
frame #21: PyObject_Call + 0x3e (0x4a587e in python2)
frame #22: PyEval_EvalFrameEx + 0x263e (0x4be51e in python2)
frame #23: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #24: PyEval_EvalFrameEx + 0x6076 (0x4c1f56 in python2)
frame #25: PyEval_EvalCodeEx + 0x306 (0x4b9b66 in python2)
frame #26: python2() [0x4eb69f]
frame #27: PyRun_FileExFlags + 0x82 (0x4e58f2 in python2)
frame #28: PyRun_SimpleFileExFlags + 0x186 (0x4e41a6 in python2)
frame #29: Py_Main + 0x54e (0x4938ce in python2)
frame #30: __libc_start_main + 0xf0 (0x7f279b5ca830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #31: _start + 0x29 (0x493299 in python2)

satyajithj commented 5 years ago

@gangulypritha @civilpat The GenerateProposals op does not seem to be currently available in the ONNX opset.

satyajithj commented 5 years ago

@gangulypritha @civilpat Were you able to run predictions using the caffe2 model (obtained from the Detectron model) on your computer with --device CPU?

satyajithj commented 5 years ago

@lilichu

I changed the input data to (1, 3, 800, 800) as you suggested, and that works. But then I hit the same issue:

RuntimeError: [enforce fail at generate_proposals_op.cc:243] im_info_tensor.sizes() == (vector<int64_t>{num_images, 3}). [0] vs 1 3
Error from operator: 
input: "rpn_cls_probs_fpn2" input: "rpn_bbox_pred_fpn2" input: "im_info" input: "anchor2" output: "rpn_rois_fpn2" output: "rpn_roi_probs_fpn2" name: "" type: "GenerateProposals" arg { name: "nms_thresh" f: 0.7 } arg { name: "min_size" f: 0 } arg { name: "spatial_scale" f: 0.25 } arg { name: "correct_transform_coords" i: 1 } arg { name: "post_nms_topN" i: 1000 } arg { name: "pre_nms_topN" i: 1000 } device_option { } engine: ""

I get that same error with ONNX and also when I try to run the caffe2 model on my Intel CPU.