Nice project, but there are some errors, can you help me?

Byronnar commented 3 years ago

When I export the swin_tiny model (downloaded from GitHub) to ONNX, I get the error below:

Traceback (most recent call last):
  File "export2onnx.py", line 30, in <module>
    opset_version=args.opset, enable_onnx_checker=True)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/__init__.py", line 230, in export
    custom_opsets, enable_onnx_checker, use_external_data_format)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 91, in export
    use_external_data_format=use_external_data_format)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 639, in _export
    dynamic_axes=dynamic_axes)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 421, in _model_to_graph
    dynamic_axes=dynamic_axes, input_names=input_names)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 203, in _optimize_graph
    graph = torch._C._jit_pass_onnx(graph, operator_export_type)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/__init__.py", line 263, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 930, in _run_symbolic_function
    symbolic_fn = _find_symbolic_in_registry(domain, op_name, opset_version, operator_export_type)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 888, in _find_symbolic_in_registry
    return sym_registry.get_registered_op(op_name, domain, opset_version)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/symbolic_registry.py", line 111, in get_registered_op
    raise RuntimeError(msg)
RuntimeError: Exporting the operator roll to ONNX opset version 12 is not supported. Please open a bug to request ONNX export support for the missing operator.

Could you please give me some advice? Thank you!

feiyuhuahuo commented 3 years ago

The roll operator used in Swin Transformer is not yet supported by PyTorch's ONNX export, so you will have to wait for an update. Alternatively, you can clone the master branch and get the resnet101 weights, which can be exported to ONNX successfully.
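Until the exporter supports roll, one commonly used workaround is to express the same shift with slicing and concatenation, which export as ordinary ONNX ops. The sketch below is an illustration only, not code from this repository: `roll_via_cat` is a hypothetical helper name, it handles a single dimension per call, and it assumes static input shapes during export.

```python
import torch


def roll_via_cat(x: torch.Tensor, shift: int, dim: int) -> torch.Tensor:
    """Hypothetical stand-in for torch.roll(x, shift, dim) on one dimension.

    Built only from narrow (slicing) and cat, so no roll operator is traced.
    """
    size = x.size(dim)
    shift = shift % size  # map negative or oversized shifts into [0, size)
    if shift == 0:
        return x
    # The last `shift` elements wrap around to the front.
    tail = x.narrow(dim, size - shift, shift)
    head = x.narrow(dim, 0, size - shift)
    return torch.cat((tail, head), dim=dim)


# A shift over two dimensions is two single-dimension calls in sequence.
x = torch.arange(24).reshape(2, 3, 4)
shifted = roll_via_cat(roll_via_cat(x, -1, 1), -1, 2)
print(torch.equal(shifted, torch.roll(x, shifts=(-1, -1), dims=(1, 2))))
```

Swapping this in for the `torch.roll` calls in the backbone would be a local change to the model code; whether the rest of the Swin graph then exports cleanly at opset 12 is not something this thread confirms.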

Byronnar commented 3 years ago

> The roll operator used in Swin Transformer is not yet supported by PyTorch's ONNX export, so you will have to wait for an update. Alternatively, you can clone the master branch and get the resnet101 weights, which can be exported to ONNX successfully.

Thank you! I tried to convert the resnet101 model, but there are also some errors, shown below:

------------------------------res101_coco------------------------------
mode: detect
cuda: True
gpu_id: 0
img_size: 544
class_names: ('person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush')
num_classes: 81
scales: [24, 48, 96, 192, 384]
aspect_ratios: [1, 0.5, 2]
weight: weights/best_30.5_res101_coco_392000.pth
traditional_nms: False
nms_score_thre: 0.05
nms_iou_thre: 0.5
top_k: 200
max_detections: 100
opset: 12

Traceback (most recent call last):
  File "export2onnx.py", line 25, in <module>
    net.load_weights(cfg.weight, cfg.cuda)
  File "/home/nvidia/byronnar/Yolact_transformer_practice/modules/yolact.py", line 129, in load_weights
    self.load_state_dict(state_dict, strict=True)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Yolact:
        Missing key(s) in state_dict: "fpn.pred_layers.0.0.weight", "fpn.pred_layers.0.0.bias", "fpn.pred_layers.1.0.weight", "fpn.pred_layers.1.0.bias", "fpn.pred_layers.2.0.weight", "fpn.pred_layers.2.0.bias", "fpn.downsample_layers.0.0.weight", "fpn.downsample_layers.0.0.bias", "fpn.downsample_layers.1.0.weight", "fpn.downsample_layers.1.0.bias", "proto_net.proto1.0.weight", "proto_net.proto1.0.bias", "proto_net.proto1.2.weight", "proto_net.proto1.2.bias", "proto_net.proto1.4.weight", "proto_net.proto1.4.bias", "proto_net.proto2.0.weight", "proto_net.proto2.0.bias", "proto_net.proto2.2.weight", "proto_net.proto2.2.bias", "prediction_layers.upfeature.0.weight", "prediction_layers.upfeature.0.bias", "prediction_layers.bbox_layer.weight", "prediction_layers.bbox_layer.bias", "prediction_layers.conf_layer.weight", "prediction_layers.conf_layer.bias", "prediction_layers.coef_layer.0.weight", "prediction_layers.coef_layer.0.bias".
        Unexpected key(s) in state_dict: "fpn.pred_layers.0.weight", "fpn.pred_layers.0.bias", "fpn.pred_layers.1.weight", "fpn.pred_layers.1.bias", "fpn.pred_layers.2.weight", "fpn.pred_layers.2.bias", "fpn.downsample_layers.0.weight", "fpn.downsample_layers.0.bias", "fpn.downsample_layers.1.weight", "fpn.downsample_layers.1.bias", "proto_net.0.weight", "proto_net.0.bias", "proto_net.2.weight", "proto_net.2.bias", "proto_net.4.weight", "proto_net.4.bias", "proto_net.8.weight", "proto_net.8.bias", "proto_net.10.weight", "proto_net.10.bias", "prediction_layers.0.upfeature.0.weight", "prediction_layers.0.upfeature.0.bias", "prediction_layers.0.bbox_layer.weight", "prediction_layers.0.bbox_layer.bias", "prediction_layers.0.conf_layer.weight", "prediction_layers.0.conf_layer.bias", "prediction_layers.0.mask_layer.weight", "prediction_layers.0.mask_layer.bias".
        size mismatch for fpn.lat_layers.0.weight: copying a param with shape torch.Size([256, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).    
        size mismatch for fpn.lat_layers.2.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 2048, 1, 1]).

What should I modify?
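A missing/unexpected key pattern like the one above (for example `proto_net.proto1.0.weight` expected but `proto_net.0.weight` found) usually means the checkpoint was saved by a different version of the model code than the one loading it. As a rough sketch for checking this before calling `load_state_dict`, the two key sets can be compared directly. `diff_state_dict_keys` is a hypothetical helper, and the sample keys are copied from the traceback above.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring load_state_dict's report.

    missing    -- keys the model defines but the checkpoint lacks
    unexpected -- keys the checkpoint has but the model does not define
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# In real use the inputs would come from PyTorch, e.g.
#   model_keys = net.state_dict().keys()
#   checkpoint_keys = torch.load(path, map_location="cpu").keys()
# Here a few keys from the traceback are used so the sketch runs on its own.
model_keys = [
    "fpn.pred_layers.0.0.weight",
    "proto_net.proto1.0.weight",
    "fpn.lat_layers.0.weight",
]
checkpoint_keys = [
    "fpn.pred_layers.0.weight",
    "proto_net.0.weight",
    "fpn.lat_layers.0.weight",
]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

If both lists are non-empty and the names differ only in nesting, the likely fix is to use the code and the weights from the same branch rather than to rename keys by hand.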

> The roll operator used in Swin Transformer is not yet supported by PyTorch's ONNX export, so you will have to wait for an update. Alternatively, you can clone the master branch and get the resnet101 weights, which can be exported to ONNX successfully.

I used the master branch code and exported the ONNX and TRT files successfully, but when I ran detect_with_trt.py, this error occurred:

------------------------------res101_coco------------------------------
mode: detect
cuda: True
gpu_id: 0
img_size: 550
class_names: ('person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush')
num_classes: 81
scales: [24, 48, 96, 192, 384]
aspect_ratios: [1, 0.5, 2]
weight: trt_files/res101_coco.trt
traditional_nms: False
nms_score_thre: 0.05
nms_iou_thre: 0.5
top_k: 200
max_detections: 100
image: ../test_images/
video: None
hide_mask: False
hide_bbox: False
hide_score: False
cutout: False
save_lincomb: False
no_crop: False
real_time: False
visual_thre: 0.3

Traceback (most recent call last):
  File "detect_with_trt.py", line 129, in <module>
    proto_p = results[2].reshape(1, int(cfg.img_size / 4), int(cfg.img_size / 4), 32)
ValueError: cannot reshape array of size 609408 into shape (1,137,137,32)

Thank you!

feiyuhuahuo commented 3 years ago

That's strange. Could you check the output shape of your ONNX file with Netron?

And what's your PyTorch version? Note that you should use the master branch code in its entirety, because the code in the two branches is incompatible.

Byronnar commented 3 years ago

Thank you. I have used the complete master branch code and changed this line: `proto_p = results[2].reshape(1, int(cfg.img_size / 4), int(cfg.img_size / 4), 32)` to

```
proto_p = results[2].reshape(1, int(cfg.img_size / 4 + 1), int(cfg.img_size / 4 + 1), 32)
```

and this problem has been solved!
But the memory usage is too high! resnet50_coco needs 3.5 GB. How can I reduce the memory usage?
Looking forward to your reply!
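The numbers in the traceback explain the off-by-one: 609408 / 32 = 19044 = 138 × 138, while int(550 / 4) gives 137. Adding 1 works for img_size 550 but would overshoot for sizes divisible by 4, so a less brittle option is to derive the side length from the array size itself. The sketch below is an assumption-laden illustration: `proto_shape` is a hypothetical helper, and it assumes a square prototype map with 32 channels and batch size 1, as in the reshape call above.

```python
import math


def proto_shape(flat_size, channels=32, batch=1):
    """Infer (batch, side, side, channels) from the flat element count.

    Assumes the prototype map is square; raises if the size does not fit.
    """
    per_image = flat_size // (batch * channels)
    side = int(round(math.sqrt(per_image)))
    if batch * side * side * channels != flat_size:
        raise ValueError(
            "size %d is not batch * side^2 * channels for a square map" % flat_size
        )
    return (batch, side, side, channels)


# Element count taken from the ValueError in the traceback.
print(proto_shape(609408))
```

With a NumPy output array, this would be used as `proto_p = results[2].reshape(proto_shape(results[2].size))`.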

feiyuhuahuo commented 3 years ago

I made a mistake. I thought you were doing ONNX detection. I have no idea how to reduce the memory usage; I'm also new to TensorRT.

Byronnar commented 3 years ago

> I made a mistake. I thought you were doing ONNX detection. I have no idea how to reduce the memory usage; I'm also new to TensorRT.

Thank you, I will try.

feiyuhuahuo / Yolact_minimal

Nice project, but there are some errors, can you help me? #31