facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

Community effort to bring CPU and pure Caffe2 / C++ inference support #432

Closed: gadcam closed this issue 5 years ago

gadcam commented 6 years ago

It looks like many people are asking for CPU inference and it seems a lot of work is needed to make it happen. What I propose is that we use this issue to publicly state what work is needed, so that people eager to have this feature can easily help implement it.

@daquexian, @orionr, @rbgirshick do you have time to share a list of features / ops needed to convert all the models with convert_pkl_to_pb.py?

| Feature/Operator | Where do we need it? | State | Difficulty |
| --- | --- | --- | --- |
| CollectAndDistributeFpnRpnProposals | FPN | 🕔 PR #372 submitted, review needed | ? |
| ... | ... | ... | ... |

I would like to contribute to this effort but I do not know where to begin. If you are willing to implement a feature, do not hesitate to say so in this issue.

P.S.: To avoid any confusion, I am only a random user of Detectron and my initiative was not solicited by the maintainers.

daquexian commented 6 years ago

Based on https://github.com/facebookresearch/Detectron/pull/372, models containing FPN can be correctly converted to caffe2's .pb files. (I will rebase the PR on master soon.) However, only the detection net is converted, even for Mask R-CNN and Keypoint R-CNN, which also have a mask net or a keypoint net.

gadcam commented 6 years ago

@daquexian I am really sorry but I think I failed to understand properly what you mean, as I do not have a deep understanding of how the Detectron repo works.

Do you mean that, once #372 is merged, if we try to convert for example e2e_keypoint_rcnn_R-50-FPN_1x, only the proposal part would be converted and so we could not use it on CPU? If the answer to this question is yes, can you help us understand what steps we need to take to achieve a complete conversion?

daquexian commented 6 years ago

@gadcam If we try to convert e2e_keypoint_rcnn_R-50-FPN_1x, we will only get bounding boxes but not keypoints, because here only model.net is used, while the mask and keypoint heads live in model.mask_net and model.keypoint_net, like this. The solution seems straightforward because there are only normal layers in these nets. But if you want to infer masks or keypoints after getting bounding boxes (in order to save inference time), it seems better to save these nets in separate .pb files.
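For anyone wondering what saving those nets separately could look like, here is a minimal, hypothetical sketch (not part of PR #372): it assumes a built Detectron model object with MASK_ON / KEYPOINTS_ON, and it leaves generating the matching init nets to the conversion tool, the same way convert_pkl_to_pb.py does for model.net.

def save_net_proto(net, path):
    # net is a caffe2 core.Net; Proto() returns its NetDef protobuf message
    with open(path, 'wb') as f:
        f.write(net.Proto().SerializeToString())

# one predict net per branch, so each can be loaded and run on its own
save_net_proto(model.net, 'detection_net.pb')
if hasattr(model, 'mask_net'):
    save_net_proto(model.mask_net, 'mask_net.pb')
if hasattr(model, 'keypoint_net'):
    save_net_proto(model.keypoint_net, 'keypoint_net.pb')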

HappyKerry commented 6 years ago

@daquexian would you like to write a detailed guide on how to convert .pkl to .pb? Thanks

daquexian commented 6 years ago

@HappyKerry Just fetch and check out my branch

git remote add daquexian https://github.com/daquexian/Detectron
git fetch daquexian
git checkout daquexian/add-export-support-fpn

and run convert_pkl_to_pb.py with your configuration file and weights, for example as below.
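For reference, an example invocation could look like the commands used later in this thread (the config path, output directory, test image and weights file below are placeholders for your own files):

python tools/convert_pkl_to_pb.py --cfg configs/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_1x.yaml --out_dir ./converted --test_img demo.jpg --device cpu TEST.WEIGHTS model_final.pkl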

gadcam commented 6 years ago

@daquexian I ran convert_pkl_to_pb.py (with your patch) successfully on e2e_keypoint_rcnn_R-50-FPN_s1x and on MSRA's original ResNet-50 model.

For e2e_keypoint_rcnn_R-50-FPN_s1x I get no warnings. For MSRA's original ResNet-50 model I get the following output:

Blob fpn_inner_res5_2_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res5_2_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res4_5_sum_lateral_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res4_5_sum_lateral_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res3_3_sum_lateral_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res3_3_sum_lateral_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res2_2_sum_lateral_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res2_2_sum_lateral_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res5_2_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res5_2_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res4_5_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res4_5_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res3_3_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res3_3_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res2_2_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res2_2_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob conv_rpn_fpn2_w with type <class 'str'> is not supported in generating init net, skipped.
Blob conv_rpn_fpn2_b with type <class 'str'> is not supported in generating init net, skipped.
Blob rpn_cls_logits_fpn2_w with type <class 'str'> is not supported in generating init net, skipped.
Blob rpn_cls_logits_fpn2_b with type <class 'str'> is not supported in generating init net, skipped.
Blob rpn_bbox_pred_fpn2_w with type <class 'str'> is not supported in generating init net, skipped.
Blob rpn_bbox_pred_fpn2_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fc6_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fc6_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fc7_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fc7_b with type <class 'str'> is not supported in generating init net, skipped.
Blob cls_score_w with type <class 'str'> is not supported in generating init net, skipped.
Blob cls_score_b with type <class 'str'> is not supported in generating init net, skipped.
Blob bbox_pred_w with type <class 'str'> is not supported in generating init net, skipped.
Blob bbox_pred_b with type <class 'str'> is not supported in generating init net, skipped.

If I try to convert model.keypoint_net from e2e_keypoint_rcnn_R-50-FPN_s1x I get

Blob fpn_res2_2_sum with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_fpn2 with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res3_3_sum with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_fpn3 with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res4_5_sum with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_fpn4 with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res5_2_sum with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_fpn5 with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_idx_restore_int32 with type <class 'str'> is not supported in generating init net, skipped.

So I have a few questions

daquexian commented 6 years ago

@gadcam You should use the models in the Model Zoo.

gadcam commented 6 years ago

@daquexian Then that is perfect: I did use the models from the Model Zoo. To be accurate, what I call

Why did you suspect I tried to convert something else? Because I get some Blob ____ is not supported messages when I should not?

daquexian commented 6 years ago

@gadcam Yes. It is reasonable that Blob ____ is not supported appears when you use an ImageNet pretrained model, because fpn, rpn and some other layers are not in ImageNet pretrained models.

Could you please tell me what ops not supported output means?

gadcam commented 6 years ago

@daquexian

Could you please tell me what ops not supported output means?

I meant Blob ____ is not supported; I am sorry for my inaccuracy. (I corrected it.)

It is reasonable that Blob ____ is not supported appears when you use an ImageNet pretrained model, because fpn, rpn and some other layers are not in ImageNet pretrained models.

I am not sure I got this part: do you mean that when we see Blob ____ is not supported, it means the Blob needs some code from Detectron to be fully defined?

So I think we are getting to the point of my issue: what should we implement to avoid it? Or can you point me to where I should dive in to find out what we need to implement?

If we take an example (but we could say the same thing for keypoint_rois_idx_restore_int32)

Blob keypoint_rois_fpn2 with type <class 'str'> is not supported in generating init net, skipped.

The only mention of keypoint_rois_fpn I found in the code is here: https://github.com/facebookresearch/Detectron/blob/b3c93df2cecca1139f73d005b9dfcd83ef55c16d/detectron/roi_data/fast_rcnn.py#L103 So I do not really know where to investigate to avoid this Blob ____ is not supported error.

As a side question, should we implement something like https://github.com/facebookresearch/Detectron/blob/e5bb3a8ff0b9caf59c76037726f49465d6b9678b/detectron/ops/generate_proposal_labels.py#L30 in the Caffe2/PyTorch repo and then add some conversion code here to get full CPU support?

daquexian commented 6 years ago

@gadcam Blob ____ is not supported here just indicates that the blob doesn't have any value (I don't know why its type is 'str' when it doesn't have any value; caffe2 is strange). There are no more layers that need to be implemented. You can add the names of these blobs to empty_blobs, like here:

https://github.com/facebookresearch/Detectron/blob/b3c93df2cecca1139f73d005b9dfcd83ef55c16d/tools/convert_pkl_to_pb.py#L558

('data' and 'im_info' are the inputs of model.net; 'fpn_res2_2_sum', 'keypoint_rois_fpn2' and so on are the inputs of model.keypoint_net.)

The converted model will crash when you try to verify it, because its inputs are not legal. Maybe giving it some proper inputs ('fpn_res2_2_sum' and so on produced by the bbox branch, and also "keypoint_rois_fpnX" as below) will make it run.

https://github.com/facebookresearch/Detectron/blob/b3c93df2cecca1139f73d005b9dfcd83ef55c16d/detectron/core/test.py#L540-L566
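To make that concrete, here is a rough sketch (assumed code, not the exact contents of convert_pkl_to_pb.py) of treating the keypoint-net inputs as empty blobs and feeding placeholder values so that workspace.CreateNet can at least instantiate the net; the blob names come from the warnings above.

import numpy as np
from caffe2.python import workspace

# blobs that only need an empty placeholder in the init net
empty_blobs = [
    'data', 'im_info',                    # inputs of model.net
    'fpn_res2_2_sum', 'fpn_res3_3_sum',   # FPN features consumed by
    'fpn_res4_5_sum', 'fpn_res5_2_sum',   # model.keypoint_net
    'keypoint_rois_fpn2', 'keypoint_rois_fpn3',
    'keypoint_rois_fpn4', 'keypoint_rois_fpn5',
    'keypoint_rois_idx_restore_int32',
]

# placeholder values only let the net be created; real values must come from
# the bbox branch at inference time, as discussed below
for name in empty_blobs:
    dtype = np.int32 if name.endswith('_int32') else np.float32
    workspace.FeedBlob(name, np.zeros((1,), dtype=dtype))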

gadcam commented 6 years ago

@daquexian Thank you for your hints; with a bit of work I was able to run e2e_keypoint_rcnn_R-50-FPN_s1x on CPU! I will tidy up my code before sharing it. If I am able to write something clean enough I will open a PR to enable conversion of keypoint and mask models, with tests to check the correctness of the conversion (and therefore an example of how to run it). For the moment the main problem is that I could not pick the input blobs programmatically.

daquexian commented 6 years ago

@gadcam Great! Looking forward to your PR

dongmingsun commented 6 years ago

@gadcam Hi, are we able to convert the Mask R-CNN model from .pkl to .pb now?

gadcam commented 6 years ago

@dongmingsun With @daquexian's #372 + my (future) PR you will be able to convert the models from the Zoo from .pkl to two .pb files, one for the bbox branch and one for the mask or keypoints, and you would need to use some helper function to run them. What I achieved is to run it without the need for a GPU, not to obtain a pure Caffe2 model. I think someone more experienced than me would be able to merge these two .pb files at least. I will look into this option quickly.

dongmingsun commented 6 years ago

@gadcam Thank you very much, so I still have to figure out how to feed a Detectron model to pure Caffe2 C++.

kundalee commented 6 years ago

@gadcam Hi, did you encounter this problem when you ran convert_pkl_to_pb.py from @daquexian's branch?

Cannot find operator schema for CollectAndDistributeFpnRpnProposals. Will skip schema checking.
Traceback for operator 164 in network origin_model
Traceback (most recent call last):
  File "tools/convert_pkl_to_pb.py", line 637, in <module>
    main()
  File "tools/convert_pkl_to_pb.py", line 631, in main
    verify_model(args, [net, init_net], args.test_img)
  File "tools/convert_pkl_to_pb.py", line 569, in verify_model
    _run_cfg_func, _run_pb_func, test_img, check_blobs)
  File "/alpha/Rddd/projects/detectron0518/Detectron/detectron/utils/model_convert_utils.py", line 367, in compare_model
    res2 = model2_func(test_image, check_blobs)
  File "tools/convert_pkl_to_pb.py", line 565, in _run_pb_func
    return run_model_pb(args, model_pb[0], model_pb[1], im, check_blobs)
  File "tools/convert_pkl_to_pb.py", line 505, in run_model_pb
    workspace.CreateNet(net)
  File "/home/Rddd/data/projects/caffe2/build-cpu/caffe2/python/workspace.py", line 163, in CreateNet
    StringifyProto(net), overwrite,
  File "/home/Rddd/data/projects/caffe2/build-cpu/caffe2/python/workspace.py", line 189, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at operator.cc:191] op. Cannot create operator of type 'CollectAndDistributeFpnRpnProposals' on the device 'CPU'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing.

daquexian commented 6 years ago

Hi @kundalee , it seems that your caffe2 version is not the latest. You might want to pull the latest code from https://github.com/pytorch/pytorch and recompile it.


HappyKerry commented 6 years ago

@gadcam @daquexian @dongmingsun I have converted the .pkl model to a .pb model, but how do I use the .pb model in Caffe2 C++? Thanks

daquexian commented 6 years ago

@HappyKerry you can search for the caffe2 Android demo or third-party tutorials

dongmingsun commented 6 years ago

@HappyKerry caffe2_cpp_tutorial might help.

kundalee commented 6 years ago

Hi @daquexian Thank you very much. Thanks to your comments, the CollectAndDistributeFpnRpnProposals problem is solved and I have converted .pkl to .pb successfully.

But when I try to load the .pb files for testing on CPU, I get the problem below. Everything is fine until I call workspace.CreateNet(net).

workspace.CreateNet(net)
  File "/home/Rddd/data/projects/pytorch/build/caffe2/python/workspace.py", line 152, in CreateNet
    StringifyProto(net), overwrite,
  File "/home/Rddd/data/projects/pytorch/build/caffe2/python/workspace.py", line 178, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at operator.cc:185] op. Cannot create operator of type 'BatchPermutation' on the device 'CPU'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing.
Operator def: input: "roi_feat_shuffled" input: "rois_idx_restore_int32" output: "roi_feat" name: "" type: "BatchPermutation" device_option { } engine: ""

I have noticed the verify_model function that runs after converting; it works well and no error occurs. Can someone tell me how to use the .pb model in Caffe2 Python? Thanks

daquexian commented 6 years ago

@kundalee BatchPermutation is in a caffe2 module. You need to load the module in your code, as in https://github.com/facebookresearch/Detectron/blob/e5bb3a8ff0b9caf59c76037726f49465d6b9678b/detectron/utils/c2.py#L42 or in this tutorial.
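In Python, that module loading could look like the following sketch (it mirrors detectron/utils/c2.py and assumes caffe2 was built with the Detectron module so the ops library is present on disk):

from caffe2.python import dyndep
from detectron.utils.env import get_detectron_ops_lib

# register the Detectron ops (BatchPermutation, etc.) with the caffe2 runtime
dyndep.InitOpsLibrary(get_detectron_ops_lib())
# after this, workspace.CreateNet(net) should be able to instantiate the op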

And I haven't found out how to load a module in C++. No one responds to my issue (that's normal :D), so I compiled the detectron ops into the caffe2 main library as a workaround.

HappyKerry commented 6 years ago

@daquexian I met the same "BatchPermutation" problem as @kundalee. So how do I compile the detectron ops into the caffe2 main library?

daquexian commented 6 years ago

@HappyKerry Just copy detectron ops into the main caffe2 ops directory and recompile.

gadcam commented 6 years ago

@dongmingsun @daquexian

I still have to figure out how to feed a Detectron model to pure Caffe2 C++.

I think someone more experienced than me would be able to merge these two .pb files at least. I will look into this option quickly.

Assuming that #372 & #449 are correct and merged, the main problem I see with doing either of these two things is that we could put all the ops in the same net, but we would need to write something like this just before inference:

def run_model_pb(args, models_pb, im, check_blobs):
    workspace.ResetWorkspace()
    net, init_net = models_pb
    workspace.RunNetOnce(init_net)
    mutils.create_input_blobs_for_net(net.Proto())
    workspace.CreateNet(net)

    input_blobs = _prepare_blobs(
        im,
        cfg.PIXEL_MEANS,
        cfg.TEST.SCALE, cfg.TEST.MAX_SIZE
    )
    boxes = ????  # <-- not available before inference, see below
    if cfg.MODEL.MASK_ON:
        im_scale = input_blobs['im_info'][0][2]
        mask_rois = {'mask_rois': test._get_rois_blob(boxes, im_scale)}

        # Add multi-level rois for FPN
        if cfg.FPN.MULTILEVEL_ROIS:
            test._add_multilevel_rois_for_test(mask_rois, 'mask_rois')
        input_blobs.update(mask_rois)

    if cfg.MODEL.KEYPOINTS_ON:
        im_scale = input_blobs['im_info'][0][2]
        keypoint_rois = {'keypoint_rois': test._get_rois_blob(boxes, im_scale)}

        # Add multi-level rois for FPN
        if cfg.FPN.MULTILEVEL_ROIS:
            test._add_multilevel_rois_for_test(keypoint_rois, 'keypoint_rois')
        input_blobs.update(keypoint_rois)

But we cannot know boxes before inference... So do we have to run this in two steps if we want to keep the exact same architecture, or am I missing something? So @dongmingsun I think you have to do as in my PR: run the first stage, "Add multi-level rois for FPN", run the second stage and process the results (see the sketch below).
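Purely as an illustration of that two-step flow, a rough, untested sketch; the (predict net, init net) pairs, the use of the private helpers from detectron.core.test and the output blob name are assumptions:

from caffe2.python import workspace
from detectron.core.config import cfg
import detectron.core.test as test

def run_two_stages(det_pb, kps_pb, input_blobs):
    # det_pb / kps_pb: (predict NetDef, init NetDef) pairs for the bbox and
    # keypoint nets; input_blobs: dict with 'data' and 'im_info' already built
    det_net, det_init_net = det_pb
    kps_net, kps_init_net = kps_pb

    # stage 1: run the detection net on the image
    workspace.RunNetOnce(det_init_net)
    for name, value in input_blobs.items():
        workspace.FeedBlob(name, value)
    workspace.CreateNet(det_net)
    workspace.RunNet(det_net.name)
    # recover the final boxes from the detection outputs via the usual
    # Detectron post-processing (bbox transform + NMS); elided here
    boxes = ...

    # stage 2: build keypoint_rois from the stage-1 boxes, distribute them over
    # the FPN levels, and run the keypoint net; the FPN feature blobs it needs
    # are still in the workspace from stage 1
    im_scale = input_blobs['im_info'][0][2]
    kps_blobs = {'keypoint_rois': test._get_rois_blob(boxes, im_scale)}
    if cfg.FPN.MULTILEVEL_ROIS:
        test._add_multilevel_rois_for_test(kps_blobs, 'keypoint_rois')
    for name, value in kps_blobs.items():
        workspace.FeedBlob(name, value)
    workspace.RunNetOnce(kps_init_net)
    workspace.CreateNet(kps_net)
    workspace.RunNet(kps_net.name)
    return workspace.FetchBlob('kps_score')  # assumed keypoint output blob name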

As a side note, why do we keep cfg.FPN.MULTILEVEL_ROIS if it is set to True in all the config files?

AwwNaiCha commented 6 years ago

@daquexian Hello. I am new to caffe2 and Detectron. I trained a Detectron model and want to test it in caffe2. Since the current branch of Detectron does not support FPN conversion, I searched around and found your branch. I tried to use your code to convert my .pkl model to .pb files. The model is based on Detectron's tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml and trained on my own dataset.

I tried both GPU and CPU mode and got the following errors. This one is from CPU mode:

WARNING workspace.py: 185: Original python traceback for operator '121' in network 'detectron' in exception above (most recent call last):
Running pb model failed.
[enforce fail at upsample_nearest_op.h:39] . Not Implemented. Error from operator: 
input: "fpn_inner_res5_2_sum" output: "fpn_inner_res4_5_sum_topdown" name: "" type: "UpsampleNearest" arg { name: "scale" i: 2 } device_option { } engine: ""
Checking result_boxes -> result_boxes...
Traceback (most recent call last):
  File "/detectron/tools/convert_pkl_to_pb.py", line 637, in <module>
    main()
  File "/detectron/tools/convert_pkl_to_pb.py", line 631, in main
    verify_model(args, [net, init_net], args.test_img)
  File "/detectron/tools/convert_pkl_to_pb.py", line 569, in verify_model
    _run_cfg_func, _run_pb_func, test_img, check_blobs)
  File "/detectron/detectron/utils/model_convert_utils.py", line 379, in compare_model
    n1, n2, r1.shape, r2.shape)
AssertionError: Blob result_boxes and result_boxes shape mismatched: (9, 5) vs (0, 5)

Process finished with exit code 1

This one is from GPU mode:

WARNING cnn.py:  25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
INFO net.py:  59: Loading weights from: result50/model_iter19999.pkl
I0626 12:01:25.666318 29776 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 0.000106857 secs
I0626 12:01:25.666505 29776 net_dag.cc:46] Number of parallel execution chains 63 Number of operators = 232
I0626 12:01:25.675417 29776 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 8.2534e-05 secs
I0626 12:01:25.675545 29776 net_dag.cc:46] Number of parallel execution chains 30 Number of operators = 188
Running the second model...
Checking result_boxes -> result_boxes...
Traceback (most recent call last):
  File "/detectron/tools/convert_pkl_to_pb.py", line 637, in <module>
    main()
  File "/detectron/tools/convert_pkl_to_pb.py", line 631, in main
    verify_model(args, [net, init_net], args.test_img)
  File "/detectron/tools/convert_pkl_to_pb.py", line 569, in verify_model
    _run_cfg_func, _run_pb_func, test_img, check_blobs)
  File "/detectron/detectron/utils/model_convert_utils.py", line 384, in compare_model
    n1, n2, np.amax(np.absolute(r1 - r2))))
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/nose_tools/utils.py", line 963, in assert_array_almost_equal
    precision=decimal)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/nose_tools/utils.py", line 779, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 3 decimals
result_boxes and result_boxes not matched. Max diff: 4.39031982422
(mismatch 11.1111111111%)
 x: array([[7.503e+02, 3.873e+02, 8.095e+02, 4.501e+02, 9.987e-01],
       [1.055e+03, 3.291e+02, 1.113e+03, 3.970e+02, 9.965e-01],
       [8.385e+02, 3.726e+02, 8.958e+02, 4.344e+02, 9.940e-01],...
 y: array([[7.503e+02, 3.873e+02, 8.095e+02, 4.501e+02, 9.987e-01],
       [1.055e+03, 3.291e+02, 1.113e+03, 3.970e+02, 9.965e-01],
       [8.385e+02, 3.726e+02, 8.958e+02, 4.344e+02, 9.940e-01],...

Process finished with exit code 1

Do you have any idea how I can fix these? Thanks.

AwwNaiCha commented 6 years ago

@daquexian I changed the input image dimensions and everything works well now! Thank you anyway.

Kongsea commented 6 years ago

When I run python tools/convert_pkl_to_pb.py --cfg mm/noaug_2gpu_e2e_faster_rcnn_R-101-FPN.yaml --out_dir ttt --test_img 01.jpg --fuse_af 0 --device cpu, it raised the following error:

AssertionError: Blob result_boxes and result_boxes shape mismatched: (195, 5) vs (117, 5)

I trained the Faster R-CNN model using the pretrained ImageNet model, R-101.pkl. Could anybody give me some advice? Thanks.

pascschoSSL commented 6 years ago

@gadcam Thank you for the effort.

@dongmingsun With @daquexian's #372 + my (future) PR you will be able to convert the models from the Zoo from .pkl to two .pb files, one for the bbox branch and one for the mask or keypoints, and you would need to use some helper function to run them. What I achieved is to run it without the need for a GPU, not to obtain a pure Caffe2 model. I think someone more experienced than me would be able to merge these two .pb files at least. I will look into this option quickly.

How do I use the .pb files in Python? Is there a tutorial/example somewhere? (I couldn't find anything useful yet.)
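For reference, the call sequence mirrors run_model_pb() in tools/convert_pkl_to_pb.py. Below is a minimal sketch, assuming the converted pair is named model_init.pb / model.pb, with dummy inputs just to show the sequence; real inputs should be prepared the way _prepare_blobs does, and if the net contains Detectron ops such as BatchPermutation the ops module has to be loaded first (see the comments above):

import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import workspace

init_net = caffe2_pb2.NetDef()
with open('model_init.pb', 'rb') as f:
    init_net.ParseFromString(f.read())
predict_net = caffe2_pb2.NetDef()
with open('model.pb', 'rb') as f:
    predict_net.ParseFromString(f.read())

workspace.ResetWorkspace()
workspace.RunNetOnce(init_net)  # loads the weights into the workspace
# dummy image and im_info blobs just to illustrate the expected inputs
workspace.FeedBlob('data', np.zeros((1, 3, 800, 800), dtype=np.float32))
workspace.FeedBlob('im_info', np.array([[800., 800., 1.]], dtype=np.float32))
workspace.CreateNet(predict_net)
workspace.RunNet(predict_net.name)
outputs = {name: workspace.FetchBlob(name) for name in predict_net.external_output}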

lilichu commented 6 years ago

Hi! @daquexian @HappyKerry @kundalee I use convert_pkl_to_pb.py to convert the detectron model to caffe2 model successfully. Then I want to use ONNX to convert the caffe2 model to ONNX model. I encounter the same issue as above:

WARNING:caffe2.python.workspace:Original python traceback for operator `170` in network `detectron` in exception above (most recent call last):
Traceback (most recent call last):
  File "/home/user/pycharm-2018.1.3/helpers/pydev/pydevd.py", line 1664, in <module>
    main()
  File "/home/user/pycharm-2018.1.3/helpers/pydev/pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/user/pycharm-2018.1.3/helpers/pydev/pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/user/backup/lichu/onnx_convert/caffe2_onnx.py", line 24, in <module>
    value_info,
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/onnx/frontend.py", line 332, in caffe2_net_to_onnx_model
    model = make_model(cls.caffe2_net_to_onnx_graph(*args, **kwargs),
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/onnx/frontend.py", line 221, in caffe2_net_to_onnx_graph
    inputs)
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/onnx/helper.py", line 62, in c2_native_run_net
    ws.RunNetOnce(predict_net)
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/onnx/workspace.py", line 63, in f
    return getattr(workspace, attr)(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 199, in RunNetOnce
    StringifyProto(net),
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 178, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at operator.cc:185] op. Cannot create operator of type 'BatchPermutation' on the device 'CPU'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing. Operator def: input: "roi_feat_shuffled_1" input: "rois_idx_restore_int32_1" output: "roi_feat_1" name: "" type: "BatchPermutation" device_option { } engine: "" 

Does it mean that BatchPermutation can't be found in caffe2? What should I do? Thanks!

tomas-wood commented 5 years ago

Hey I'm running

python detectron/tools/convert_pkl_to_pb.py --out_dir /app/out --cfg /app/detectron/configs/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_1x.yaml --device cpu  TEST.WEIGHTS model_final.pkl

and getting the following error.

Traceback (most recent call last):
  File "detectron/tools/convert_pkl_to_pb.py", line 654, in <module>
    main()
  File "detectron/tools/convert_pkl_to_pb.py", line 612, in main
    model, blobs = load_model(args)
  File "detectron/tools/convert_pkl_to_pb.py", line 420, in load_model
    model = test_engine.initialize_model_from_cfg(cfg.TEST.WEIGHTS)
  File "/app/detectron/detectron/core/test_engine.py", line 330, in initialize_model_from_cfg
    model, weights_file, gpu_id=gpu_id,
  File "/app/detectron/detectron/utils/net.py", line 112, in initialize_gpu_from_weights_file
    src_blobs[src_name].astype(np.float32, copy=False))
  File "/app/pytorch/build/caffe2/python/workspace.py", line 317, in FeedBlob
    return _Workspace_feed_blob(ws, name, arr, device_option)
  File "/app/pytorch/build/caffe2/python/workspace.py", line 654, in _Workspace_feed_blob
    return ws.create_blob(name).feed(arr, device_option)
  File "/app/pytorch/build/caffe2/python/workspace.py", line 676, in _Blob_feed
    return blob._feed(arg, device_option)
RuntimeError: [enforce fail at pybind_state.cc:348] feeder. Unknown device type encountered in FeedBlob.

I've built caffe2 with CPU-only support. Is this going to be a deal breaker? Should I fire up the GPU version and convert to .pb with that? It looks like one needs GPU support to convert from .pkl to .pb. Is this assumption correct?

wytcsuch commented 5 years ago

Hi, I have converted a .pkl model to a .pb model under Ubuntu 16.04, and I want to use the .pb in C++ on Windows. Do I need to install caffe2 on Windows according to the tutorial https://caffe2.ai/docs/get-start.html?Platform=windows&configuration=compile? @HappyKerry @daquexian @dongmingsun

satyajithj commented 5 years ago

@lilichu

Does it mean that BatchPermutation can't be found in caffe2? What should I do? Thanks!

Refer to this comment.

satyajithj commented 5 years ago

Is this finished?

gadcam commented 5 years ago

Hello @fuzzyBatman,

To be honest, I do not know the current state of Detectron. I closed this issue because I felt it was not useful any more, as it did not get enough attention in the last months.

songwellxie commented 5 years ago

Hey I'm running

python detectron/tools/convert_pkl_to_pb.py --out_dir /app/out --cfg /app/detectron/configs/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_1x.yaml --device cpu  TEST.WEIGHTS model_final.pkl

and getting the following error.

RuntimeError: [enforce fail at pybind_state.cc:348] feeder. Unknown device type encountered in FeedBlob.

Hello, did anyone come across this error? I did when I tried to run CPU-only C3D extraction.