facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

[torchscript Deployment] PointRend deploy failed in tracing #2776

Closed wadesunyang closed 3 years ago

wadesunyang commented 3 years ago

If you do not know the root cause of the problem, please post according to this template:

Instructions To Reproduce the Issue:

  1. Full runnable code or full changes you made:

https://github.com/facebookresearch/detectron2/blob/master/tools/deploy/export_model.py Since I use torch 1.7.1, I modified the function export_tracing(torch_model, inputs), changing

assert TORCH_VERSION >= (1, 8) to assert TORCH_VERSION >= (1, 7)

  2. What exact command you run:
    ./export_model.py --config-file ../../projects/PointRend/configs/InstanceSegmentation/pointrend_rcnn_R_50_FPN_1x_coco.yaml --output ./output --export-method tracing --format torchscript MODEL.WEIGHTS detectron2://PointRend/InstanceSegmentation/pointrend_rcnn_R_50_FPN_1x_coco/164254221/model_final_736f5a.pkl MODEL.DEVICE cpu

    The conversion succeeds (with some warnings), but when I run the following command:

    ./build/torchscript_traced_mask_rcnn output/model.ts 1.png tracing

    I get the following runtime error.

  3. Full logs or other relevant observations:
    
    terminate called after throwing an instance of 'std::runtime_error'
    what():  The following operation failed in the TorchScript interpreter.
    Traceback of TorchScript, serialized code (most recent call last):
    File "code/__torch__/detectron2/export/flatten.py", line 30, in forward
    _14, _15, _16, _17, _18, _19, _20, _21, _22, _23, = (_2).forward(x0, )
    _24 = (_1).forward(_14, _15, _16, _17, _18, _19, _20, _21, _22, image_size, )
    _25 = (_0).forward(_14, _24, _15, _16, _23, image_size, )
           ~~~~~~~~~~~ <--- HERE
    _26, _27, _28, _29, = _25
    return (_26, _27, _28, _29, image_size)
    File "code/__torch__/detectron2/modeling/roi_heads/roi_heads.py", line 128, in forward
    _63 = torch.slice(filter_inds0, 0, 0, 9223372036854775807, 1)
    classes = torch.select(_63, 1, 1)
    _64 = (_1).forward(tensor2, argument_1, classes, )
           ~~~~~~~~~~~ <--- HERE
    _65, _66, = _64
    return (tensor2, _65, _66, _61)
    File "code/__torch__/detectron2/projects/point_rend/mask_head.py", line 38, in forward
    _18 = torch.slice(point_coords_wrt_image, 0, 0, 9223372036854775807, 1)
    _19 = torch.slice(_18, 1, 0, 9223372036854775807, 1)
    _20 = torch.copy_(torch.select(_19, 2, 0), torch.view(_17, [100, 196]), False)
                                               ~~~~~~~~~~ <--- HERE
    _21 = torch.slice(point_coords_wrt_image, 0, 0, 9223372036854775807, 1)
    _22 = torch.slice(_21, 1, 0, 9223372036854775807, 1)

    Traceback of TorchScript, original code (most recent call last):
    /home/project/detectron2/projects/PointRend/point_rend/point_features.py(218): get_point_coords_wrt_image
    /home/project/detectron2/projects/PointRend/point_rend/point_features.py(177): point_sample_fine_grained_features
    /home/project/detectron2/projects/PointRend/point_rend/mask_head.py(202): _forward_mask_coarse
    /home/project/detectron2/projects/PointRend/point_rend/mask_head.py(187): forward
    /opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py(709): _slow_forward
    /opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py(725): _call_impl
    /home/project/detectron2/detectron2/modeling/roi_heads/roi_heads.py(839): _forward_mask
    /home/project/detectron2/detectron2/modeling/roi_heads/roi_heads.py(769): forward_with_given_boxes
    /home/project/detectron2/detectron2/modeling/roi_heads/roi_heads.py(743): forward
    /opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py(709): _slow_forward
    /opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py(725): _call_impl
    /home/project/detectron2/detectron2/modeling/meta_arch/rcnn.py(212): inference
    ./export_model.py(91): inference
    /home/project/detectron2/detectron2/export/flatten.py(257): forward
    /opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py(709): _slow_forward
    /opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py(725): _call_impl
    /opt/conda/lib/python3.6/site-packages/torch/jit/_trace.py(940): trace_module
    /opt/conda/lib/python3.6/site-packages/torch/jit/_trace.py(742): trace
    ./export_model.py(100): export_tracing
    ./export_model.py(176):
    RuntimeError: shape '[100, 196]' is invalid for input of size 1176

    Aborted (core dumped)
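
One way to read the final RuntimeError is as plain arithmetic: the serialized graph contains the view shape [100, 196] as a constant, which needs 100 × 196 = 19600 elements, while the tensor at run time holds 1176 = 6 × 196 elements. That is consistent with the trace having been recorded with 100 rows and the test image producing only 6, although the log alone does not confirm this. The sketch below only reproduces that check in pure Python; the helper names `view_is_valid` and `implied_leading_dim` are hypothetical and are not part of PyTorch or detectron2.

```python
from math import prod
from typing import Sequence


def view_is_valid(numel: int, shape: Sequence[int]) -> bool:
    """Return True if a tensor with `numel` elements can be viewed as `shape`."""
    return prod(shape) == numel


def implied_leading_dim(numel: int, trailing: int) -> int:
    """Return N such that an (N, trailing) view would hold exactly `numel` elements."""
    if numel % trailing != 0:
        raise ValueError(f"{numel} elements cannot be split into rows of {trailing}")
    return numel // trailing


# Values copied from the error message in the log above.
traced_shape = [100, 196]  # constant recorded in the serialized TorchScript code
runtime_numel = 1176       # "input of size 1176" reported at run time

print(view_is_valid(runtime_numel, traced_shape))            # the view cannot succeed
print(implied_leading_dim(runtime_numel, traced_shape[1]))   # rows actually present
```

If this reading is right, the failure comes from a shape that was fixed at trace time rather than from the input image itself.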


## Expected behavior:

I was expecting to run torchscript_traced_mask_rcnn successfully.

## Environment:

Paste the output of the following command:

sys.platform             linux
Python                   3.6.9 Anaconda, Inc. (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
numpy                    1.17.2
detectron2               0.4 @/home/project/detectron2/detectron2
Compiler                 GCC 5.4
CUDA compiler            CUDA 10.1
detectron2 arch flags    7.5
DETECTRON2_ENV_MODULE
PyTorch                  1.7.1 @/opt/conda/lib/python3.6/site-packages/torch
PyTorch debug build      False
GPU available            True
GPU 0,1                  GeForce RTX 2080 Ti (arch=7.5)
CUDA_HOME                /usr/local/cuda
Pillow                   8.0.1
torchvision              0.8.2 @/opt/conda/lib/python3.6/site-packages/torchvision
torchvision arch flags   3.5, 5.0, 6.0, 7.0, 7.5
fvcore                   0.1.3.post20210317
cv2                      4.2.0

PyTorch built with:

Can you please help fix this issue? (By the way, the mask_rcnn example deploys successfully, but PointRend does not.) Thanks in advance.

ppwwyyxx commented 3 years ago

As the script says, it does not support PyTorch 1.7...
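
For illustration, a version guard of the kind quoted in the issue (`assert TORCH_VERSION >= (1, 8)`) can be sketched as below. This is a minimal stand-alone sketch, not the code from export_model.py: the names `parse_major_minor`, `MIN_TRACING_VERSION` and `check_tracing_supported` are hypothetical, and the version is passed in as a string so the example runs without PyTorch installed.

```python
import re
from typing import Tuple

# Minimum version taken from the assert quoted in the issue: (1, 8).
MIN_TRACING_VERSION = (1, 8)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) as integers from a string such as '1.7.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_tracing_supported(version: str) -> bool:
    """Return True if `version` meets the minimum required for the tracing export."""
    return parse_major_minor(version) >= MIN_TRACING_VERSION


print(check_tracing_supported("1.7.1"))   # the version used in this issue
print(check_tracing_supported("1.8.0"))
print(check_tracing_supported("1.10.0"))  # integer tuples compare correctly here
```

Comparing integer tuples rather than raw strings keeps "1.10" ordered after "1.8". Lowering the threshold in such a guard only silences the check; it does not make the older version behave like the supported one.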