OAID / Tengine

Tengine is a lightweight, high-performance, modular inference engine for embedded devices.
Apache License 2.0

yolov5 preprocessing SpaceToDepth #640

Open stevenwudi opened 3 years ago

stevenwudi commented 3 years ago

Currently, the yolov5 examples move the Focus part of the model (SpaceToDepth) into the preprocessing step, which actually takes quite a lot of CPU time (20 ms+ for a 640x640 input).
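For reference, the Focus preprocessing being discussed can be sketched in NumPy as follows (a minimal sketch of the YOLOv5-style Focus slice in NCHW layout; the 640x640 input size is taken from the comment above):

```python
import numpy as np

def focus_slice(x):
    """YOLOv5 Focus preprocessing: split each 2x2 spatial block into
    4 channel groups, halving H/W and quadrupling C (NCHW layout)."""
    return np.concatenate(
        [x[..., ::2, ::2],     # top-left pixel of each 2x2 block
         x[..., 1::2, ::2],    # bottom-left
         x[..., ::2, 1::2],    # top-right
         x[..., 1::2, 1::2]],  # bottom-right
        axis=1)

img = np.random.rand(1, 3, 640, 640).astype(np.float32)
out = focus_slice(img)
print(out.shape)  # (1, 12, 320, 320)
```

Because this runs per frame on the host CPU, moving it into the model graph lets the inference backend execute it instead.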

ONNX (since Opset 13, https://github.com/onnx/onnx/blob/master/docs/Operators.md#spacetodepth), Tengine and TIMVX all have SpaceToDepth support.

Does Tengine have any plan to incorporate the SpaceToDepth op into the yolov5 model itself, to further accelerate the detection pipeline?

Looking forward to super-fast implementation of yolov5 models :)

qinhj commented 3 years ago

Hello~ Actually, in my understanding, the SpaceToDepth op can be achieved entirely with Reshape and Transpose ops. Below is a quick test: (screenshot)

qinhj commented 3 years ago

I've updated the yolov5s optimization python script here

To keep the Slice ops as they are, one can simply try:

$ python3 yolov5s-opt.v2.py --input yolov5s.v5.onnx --output yolov5s.v5.opt.onnx --out_tensor 397,458,519

To replace the Slice ops with Reshape and Transpose ops, one can try:

$ python3 yolov5s-opt.v2.py --input yolov5s.v5.onnx --output yolov5s.v5.opt.onnx --in_tensor 167 --out_tensor 397,458,519

A quick glance at the converted tmfile models: (screenshot) The output results are exactly the same in my test~

However, on my Ubuntu PC I didn't see much performance difference between these tmfiles. Maybe you can give it a try on some other devices (I'm always short of hardware resources ...) and share your test results with me~
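For a rough feel of the host-side cost difference, the two preprocessing formulations can be microbenchmarked in NumPy (this only measures the preprocessing math on the host, not tmfile inference; absolute numbers are machine-dependent):

```python
import timeit
import numpy as np

x = np.random.rand(1, 3, 640, 640).astype(np.float32)

def by_slices():
    # YOLOv5 Focus-style strided slicing + concat
    return np.concatenate([x[..., ::2, ::2], x[..., 1::2, ::2],
                           x[..., ::2, 1::2], x[..., 1::2, 1::2]], axis=1)

def by_reshape_transpose():
    # SpaceToDepth as Reshape -> Transpose -> Reshape
    # (channel order differs from by_slices by a fixed permutation)
    t = x.reshape(1, 3, 320, 2, 320, 2).transpose(0, 3, 5, 1, 2, 4)
    return t.reshape(1, 12, 320, 320)

print("slices           :", timeit.timeit(by_slices, number=50))
print("reshape/transpose:", timeit.timeit(by_reshape_transpose, number=50))
```

On a desktop CPU both variants are memory-bound copies, so similar timings there would be consistent with qinhj's observation; embedded targets may behave differently.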

qinhj commented 3 years ago

@stevenwudi @BUG1989