OAID / Tengine

Tengine is a lightweight, high-performance, modular inference engine for embedded devices
Apache License 2.0

Failed to reproduce the officially downloaded yolov5s.onnx model when using yolov5s.pt from scratch #602

Closed · August0424 closed this issue 3 years ago

August0424 commented 3 years ago

Background

I have been trying to export a quantized tmfile yolov5s model from a pretrained yolov5s.pt and deploy it on an edge device with an A311D.

The officially released v4 yolov5s.onnx downloaded from https://github.com/ultralytics/yolov5/releases/download/v4.0/yolov5s.onnx works correctly on the A311D when following the given demo of tmfile conversion and quantization.

However, when manually exporting an ONNX model from the pretrained yolov5s.pt and processing it in the same manner, the quantized tmfile fails to detect any object on the A311D, even though the code runs normally and reports no errors.
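
For reference, a minimal sketch of the manual export path described above, assuming an ultralytics/yolov5 v4.0 checkout (attempt_load, the 640x640 input, and opset 12 follow that repo's models/export.py; the commented export flag is from memory of that script and controls whether the raw per-head tensors or the grid-concatenated output end up in the graph):

```python
# Minimal manual-export sketch (run from a yolov5 v4.0 checkout).
import torch
from models.experimental import attempt_load  # yolov5 repo helper

model = attempt_load('yolov5s.pt', map_location='cpu')  # load FP32 weights
model.eval()

# In the v4.0 export script this flag (toggled by --grid) decides whether
# Detect() returns raw 1x3xHxWx85 heads or the concatenated grid output:
# model.model[-1].export = True

dummy = torch.zeros(1, 3, 640, 640)  # same input size as the demo
torch.onnx.export(model, dummy, 'yolov5s.onnx',
                  opset_version=12,
                  input_names=['images'],
                  output_names=['output'])
```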

Mismatch between the two ONNX models

The ONNX model exported from the pretrained yolov5s.pt appears to be slightly different from the one downloaded from https://github.com/ultralytics/yolov5/releases/download/v4.0/yolov5s.onnx. In the image below, the left side corresponds to the officially released ONNX model, and the right side to the one manually exported from yolov5s.pt. Their sizes also differ: the downloaded file is 29.7 MB, whereas the manually exported one is 29.1 MB.

[image: Netron comparison of the two ONNX models]

After processing both models via the yolov5s-opt.py script, the comparison between the two resulting models looks as follows:

[image: Netron comparison of the two models after yolov5s-opt.py]

The topological structures of the two networks are approximately identical. The main difference seems to lie in the shape of the output tensors: after the opt step, the officially released ONNX model outputs 3 tensors of 25200x80, while the other one outputs 1x3x80x80x85. The two tensor shapes cannot even be matched against each other, i.e., 1x3x80x80x85 != 25200x80.
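
For what it's worth, the two shapes do describe the same set of predictions at different stages of the Detect() post-processing; a quick sanity check, assuming the standard 640x640 input and head strides 8/16/32 (values taken from the yolov5 architecture, not from this thread):

```python
# 1x3x80x80x85 is the raw output of ONE detection head: 3 anchors per
# grid cell, an 80x80 grid, and 85 = 4 box coords + 1 objectness + 80 classes.
# 25200 is what remains after all three heads are flattened and concatenated.
strides = [8, 16, 32]                          # P3/P4/P5 heads for a 640 input
cells = sum((640 // s) ** 2 for s in strides)  # 6400 + 1600 + 400 = 8400
print(3 * cells)                               # 3 anchors * 8400 cells = 25200
```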

How can I reproduce the officially released ONNX model from the pretrained yolov5s.pt model?

Since the officially released ONNX model works well and yields the expected detection results, the problem would likely be solved if an identical ONNX model could be exported from the plain pretrained .pt file. But what is the solution?

The last question: will Tengine add official support for converting a finetuned yolov5s.pt model to a quantized tmfile?

The code based on the existing available ONNX model from https://github.com/ultralytics/yolov5/releases/download/v4.0/yolov5s.onnx may not be flexible enough for user-finetuned yolov5 cases; I failed when starting from yolov5s.pt from scratch. Has the Tengine team ever considered developing a code path designed for the specific case where we start from a yolov5s.pt model?

Plus: my exported float32 tmfile from the pretrained yolov5s.pt model predicts correctly when run with the compiled executable tm_yolov5s.

Could anybody give me a couple of useful tips for coping with this problem?

stevenwudi commented 3 years ago

@August0424 Hey, have you checked the following line? You need to use Netron or a similar tool to inspect the output nodes. I guess your output nodes may not be the ones you want.

parser.add_argument('--out_cut_names', help='output cut node names', default='397,706,1015', type=str)
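
If Netron is inconvenient, the tensor names can also be listed programmatically; a rough sketch with the onnx Python package (illustrative only, this is not how yolov5s-opt.py itself resolves the names):

```python
# List every node's output tensor names so values like 397,706,1015
# passed to --out_cut_names can be verified against the actual graph.
import onnx

model = onnx.load('yolov5s.onnx')
for node in model.graph.node:
    print(node.op_type, list(node.output))  # numeric names such as '397' are tensor names
```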

I can help you with the model if you could send a link to download your onnx model.