TrojanXu / yolov5-tensorrt

A tensorrt implementation of yolov5: https://github.com/ultralytics/yolov5
Apache License 2.0

yolov5m and yolov5l ? #6

Closed batrlatom closed 4 years ago

batrlatom commented 4 years ago

Hi, what are the correct yaml files for the medium and large models? Exporting the small model works fine.

I have tried the medium model, but I am getting an error. My yaml for the medium model is:

```yaml
nc: 80  # number of classes
depth_multiple: 0.67  # model depth multiple
width_multiple: 0.75  # layer channel multiple

anchors:

backbone:
  [[-1, 1, Focus, [64, 3]],        # 1-P1/2
   [-1, 1, Conv, [128, 3, 2]],     # 2-P2/4
   [-1, 3, Bottleneck, [128]],
   [-1, 1, Conv, [256, 3, 2]],     # 4-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],     # 6-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],    # 8-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 6, BottleneckCSP, [1024]], # 10
  ]

head:
  [[-1, 3, BottleneckCSP, [1024, False]],          # 11
   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]],   # 12 (P5/32-large)

   [-2, 1, Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],                      # cat backbone P4
   [-1, 1, Conv, [512, 1, 1]],
   [-1, 3, BottleneckCSP, [512, False]],
   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]],   # 17 (P4/16-medium)

   [-2, 1, Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],                      # cat backbone P3
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 3, BottleneckCSP, [256, False]],
   [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]],   # 22 (P3/8-small)

   [[], 1, Detect, [nc, anchors]],                 # Detect(P3, P4, P5)
  ]
```
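For reference, the `na * (nc + 5)` output width of each detection conv is the number of anchors per scale times the classes plus 5 box/objectness terms. A quick sanity check of the arithmetic, assuming the usual 3 anchors per detection layer (the anchors list is elided in the yaml above):

```python
# Output channels of each nn.Conv2d detection head: na * (nc + 5),
# where na = anchors per scale (assumed 3, the yolov5 default) and
# the 5 covers x, y, w, h, objectness.
na = 3
nc = 80  # number of classes, as in the yaml above
out_channels = na * (nc + 5)
print(out_channels)  # 255
```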

Error I am getting is:

  Traceback (most recent call last):
    File "export.py", line 59, in <module>
      simplify_onnx(onnx_path)
    File "export.py", line 44, in simplify_onnx
      model_simp, check = simplify(model)
    File "/home/tom/miniconda3/envs/yolact/lib/python3.7/site-packages/onnxsim/onnx_simplifier.py", line 311, in simplify
      res = forward_for_node_outputs(model, const_nodes, input_shapes=input_shapes)
    File "/home/tom/miniconda3/envs/yolact/lib/python3.7/site-packages/onnxsim/onnx_simplifier.py", line 160, in forward_for_node_outputs
      res = forward(model, input_shapes=input_shapes)
    File "/home/tom/miniconda3/envs/yolact/lib/python3.7/site-packages/onnxsim/onnx_simplifier.py", line 144, in forward
      sess = rt.InferenceSession(model.SerializeToString(), sess_options=sess_options, providers=['CPUExecutionProvider'])
    File "/home/tom/miniconda3/envs/yolact/lib/python3.7/site-packages/onnxruntime/capi/session.py", line 158, in __init__
      self._load_model(providers)
    File "/home/tom/miniconda3/envs/yolact/lib/python3.7/site-packages/onnxruntime/capi/session.py", line 177, in _load_model
      self._sess.load_model(providers)
  onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Type Error: Shape of initializer 1099 does not match. {} != {96}
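(The `{} != {96}` in the error is suggestive: 96 is exactly 128 × 0.75, i.e. a channel count scaled by yolov5m's `width_multiple`. A minimal sketch of how yolov5 scales layer widths, assuming a `make_divisible`-style helper as in the upstream repo; the names here are illustrative, not the repo's exact code:)

```python
import math

def make_divisible(x, divisor=8):
    # Round a scaled channel count up to the nearest multiple of `divisor`,
    # mirroring how yolov5 applies width_multiple to the yaml channel args.
    return math.ceil(x / divisor) * divisor

width_multiple = 0.75  # yolov5m, from the yaml above
for c in (64, 128, 256, 512, 1024):
    print(c, "->", make_divisible(c * width_multiple))
# 128 -> 96, matching the initializer shape in the error message
```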

TrojanXu commented 4 years ago

From my test, you only need to replace the yaml and pt files used in main.py and replace nn.Upsample with Upsample in the corresponding yaml file. It should work for your case as well.
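For instance, the head entries would change like this (a sketch of the yaml edit being described; only the module name differs):

```yaml
# before (upstream yolov5 yaml)
[-2, 1, nn.Upsample, [None, 2, 'nearest']],
# after (this repo's yaml)
[-2, 1, Upsample, [None, 2, 'nearest']],
```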

batrlatom commented 4 years ago

This really does not work for any model other than yolov5s. Would you be so kind as to try it with another model? The yolov5 repo is changing very rapidly, which could be the source of the problem, but I think making it compatible at least with version 1.0 would be great. Again, thanks for your work!

TrojanXu commented 4 years ago

You can give the latest commit a try.

batrlatom commented 4 years ago

I have tried it, but unfortunately I am getting a Segmentation fault (core dumped) error at the onnx simplifier step (pytorch 1.4.0, torchvision 0.5.0, onnx 1.6.0, onnx-simplifier 0.2.9, trt 7.0.0.11).

The last few lines dumped by the app:

  %960 : Tensor = onnx::Unsqueeze[axes=[0]](%855)
  %961 : Tensor = onnx::Unsqueeze[axes=[0]](%958)
  %962 : Tensor = onnx::Unsqueeze[axes=[0]](%959)
  %963 : Tensor = onnx::Concat[axis=0](%960, %961, %962)
  %964 : Float(1, 15360, 2) = onnx::Reshape(%957, %963) # /Development/yolov5-tensorrt/yolo.py:41:0
  %965 : Float(1, 15360, 6) = onnx::Concat[axis=-1](%951, %964) # /Development/yolov5-tensorrt/yolo.py:43:0
  %prediction : Float(1, 20160, 6) = onnx::Concat[axis=1](%739, %852, %965) # /Development/yolov5-tensorrt/yolo.py:45:0
  return (%prediction)

When I try the same with newer libs (pytorch 1.5.1, torchvision 0.6.1, onnx-simplifier 0.2.10, trt 7.0.0.11, onnx 1.7.0), the export goes through but then throws an error:

  %959 : Float(1, 3, 80, 80, 2) = onnx::Slice(%916, %956, %957, %955, %958) # /Development/yolov5-tensorrt/yolo.py:41:0
  %960 : Float(1, 3, 80, 80, 2) = onnx::Cast[to=1](%959) # /Development/yolov5-tensorrt/yolo.py:41:0
  %963 : Tensor = onnx::Unsqueeze[axes=[0]](%857)
  %966 : Tensor = onnx::Concat[axis=0](%963, %998, %999)
  %967 : Float(1, 19200, 2) = onnx::Reshape(%960, %966) # /Development/yolov5-tensorrt/yolo.py:41:0
  %968 : Float(1, 19200, 6) = onnx::Concat[axis=-1](%954, %967) # /Development/yolov5-tensorrt/yolo.py:43:0
  %prediction : Float(1, 25200, 6) = onnx::Concat[axis=1](%740, %854, %968) # /Development/yolov5-tensorrt/yolo.py:45:0
  return (%prediction)

model loaded
Traceback (most recent call last):
  File "main.py", line 370, in <module>
    simplify_onnx(onnx_path)
  File "main.py", line 155, in simplify_onnx
    model_simp, check = simplify(model)
  File "/usr/local/lib/python3.6/dist-packages/onnxsim/onnx_simplifier.py", line 311, in simplify
    res = forward_for_node_outputs(model, const_nodes, input_shapes=input_shapes)
  File "/usr/local/lib/python3.6/dist-packages/onnxsim/onnx_simplifier.py", line 160, in forward_for_node_outputs
    res = forward(model, input_shapes=input_shapes)
  File "/usr/local/lib/python3.6/dist-packages/onnxsim/onnx_simplifier.py", line 144, in forward
    sess = rt.InferenceSession(model.SerializeToString(), sess_options=sess_options, providers=['CPUExecutionProvider'])
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/session.py", line 158, in __init__
    self._load_model(providers)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/session.py", line 177, in _load_model
    self._sess.load_model(providers)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Type Error: Data in initializer '992' has element type tensor(float) but usage of initializer in graph expects tensor(int64)

Do you know where the problem could be? What about providing a docker container so everything would be reproducible (the yolov5 repo has this now)?

TrojanXu commented 4 years ago

It's a good idea to provide a docker container; I will add one later. For now, you can either run onnx-sim from the CLI directly rather than inside main.py to see whether it works, or run main.py in the pytorch:20.01-py3 docker container from ngc.nvidia.com.
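The CLI route would look roughly like this (a sketch; the output filename is a placeholder, and onnx-simplifier must be pip-installed):

```shell
# Simplify the exported model outside main.py using
# onnx-simplifier's module entry point.
python3 -m onnxsim yolov5_1.onnx yolov5_1_sim.onnx
```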

batrlatom commented 4 years ago

I have tried the CLI onnx-simplifier and also tried simplifying in a separate script. The script itself is:

```python
import onnx
from onnxsim import simplify

model = onnx.load('yolov5_1.onnx')
model_simp, check = simplify(model, skip_fuse_bn=False, input_shapes={'data': [1, 3, 640, 640]})
```

I think the problem is in BN fusing. When `skip_fuse_bn=True`, the code goes through. When `skip_fuse_bn=False`, I get an error even though the input shape is defined: `onnx.onnx_cpp2py_export.checker.ValidationError: Graph must be in single static assignment (SSA) form, however '1388' has been used as output names multiple times.`

Nevertheless, when I simplify the onnx model without fuse_bn, your code works, but I do not see any speed improvement, even though your first commit gave speedups of around 3-4x.

Is the whole speedup due only to fuse_bn?
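(For context on what `fuse_bn` does: batchnorm folding rewrites conv + BN into a single conv by rescaling the weights and shifting the bias, so the BN op disappears at inference time. A minimal per-channel sketch in plain Python, not the onnx-simplifier implementation:)

```python
import math

def fold_bn(w, b, gamma, beta, mean, var, eps=0.0):
    # Fold BatchNorm stats into the preceding conv's per-channel
    # weight scale and bias:
    #   y = gamma * (w*x + b - mean) / sqrt(var + eps) + beta
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta

# Scalar example for one channel: conv (w=2, b=1) followed by BN.
w2, b2 = fold_bn(2.0, 1.0, gamma=0.5, beta=0.1, mean=1.0, var=4.0)
x = 3.0
fused = w2 * x + b2                                     # one fused conv
direct = 0.5 * ((2.0 * x + 1.0) - 1.0) / math.sqrt(4.0) + 0.1  # conv then BN
print(fused, direct)  # both 1.6
```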

batrlatom commented 4 years ago

Ok, I have tried it on a V100 with fuse_bn skipped and it works quite well. The speedup is significant.

tienhoang1094 commented 4 years ago

Hi, can you share the TensorRT inference code? @batrlatom

TrojanXu commented 4 years ago

Closed according to comment https://github.com/TrojanXu/yolov5-tensorrt/issues/6#issuecomment-654086631