Closed — pigking0126 closed this issue 1 year ago
@pigking0126
How do you calculate code runtime?
The RTX 3060 graphics card performs better than the mobile RTX 2070m, so in theory the former should be faster. The following is the measured performance of the RTX 2070m:
@shancw96 @pigking0126 Could you give him some advice?
@pigking0126 One known reason: it uses multiple threads to copy data to the GPU, while this repo is single-threaded. https://github.com/FeiYull/TensorRT-Alpha/blob/ffa90c0218005703bb52333fd24bb81c768cf28f/utils/yolo.cpp#L151 You would need to modify it yourself; it should be straightforward.
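The idea above — overlapping the host-to-GPU copy with inference instead of doing both on one thread — can be sketched in Python with a producer thread and a small queue. This is a minimal illustration only: `copy` and `infer` are hypothetical stand-ins for the real cudaMemcpy and TensorRT execute calls, not the repo's API.

```python
import queue
import threading
import time

def pipeline(frames, copy, infer):
    """Overlap 'copy to GPU' with inference using a producer thread.

    `copy` and `infer` are hypothetical placeholders for the real
    memcpy and TensorRT execution; only the threading pattern matters.
    """
    q = queue.Queue(maxsize=2)  # small buffer so copy stays one step ahead

    def producer():
        for f in frames:
            q.put(copy(f))   # copy thread fills the queue
        q.put(None)          # sentinel: no more frames

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while True:
        item = q.get()
        if item is None:
            break
        results.append(infer(item))
    return results

# Simulated 10 ms stages: pipelined, copy of frame N+1 overlaps infer of frame N.
copy = lambda f: (time.sleep(0.01), f)[1]
infer = lambda f: (time.sleep(0.01), f * 2)[1]
print(pipeline([1, 2, 3], copy, infer))  # [2, 4, 6]
```

With real CUDA code the same effect is usually achieved with pinned host memory and `cudaMemcpyAsync` on a separate stream, but the queue-and-thread shape is the same.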
The image batch is cast to 1 in the demo C++ file; maybe that caused the problem? @FeiYull I'm not a pro in this field, so I don't know what happens when working with the GPU either. 😅
@pigking0126 How do you calculate code runtime?
I didn't change the timing method in either ultralytics' code or this repo's. The images in my code are screenshots captured via win32api and inferred one by one as numpy.ndarray (screenshot --> inference --> screenshot --> inference), so maybe it's not because of threading or batch size? (I'm not sure.) Still working on it.
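For a fair one-by-one comparison like the loop described above, it helps to time both backends the same way: discard a few warm-up calls (CUDA context creation and lazy allocation make the first iterations slow) and average over many iterations. A minimal sketch, where `infer` is a placeholder for whichever backend is being measured (ultralytics' predict or the dll call), not a real API:

```python
import time

def benchmark(infer, frame, warmup=10, iters=100):
    """Average per-frame latency in ms for an inference callable.

    `infer` is a hypothetical stand-in for the backend under test;
    warm-up calls are discarded so one-time setup cost is excluded.
    """
    for _ in range(warmup):
        infer(frame)
    t0 = time.perf_counter()
    for _ in range(iters):
        infer(frame)
    return (time.perf_counter() - t0) / iters * 1000.0

ms = benchmark(lambda f: sum(f), list(range(1000)))
print(f"{ms:.4f} ms/frame")
```

Note that if the backend launches GPU work asynchronously, the timing must include whatever call synchronizes with the device (e.g. copying the result back), otherwise the numbers only measure kernel launch overhead.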
The demo calculation can be found here; emmm... it's just copied from the yolov8 C++ version. The image batch is always 1, so I removed it.
It turns out it was so slow because I built with Debug x64... With Release x64 everything works well. lol Sorry about the time I wasted...
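For anyone hitting the same thing: Debug builds of the C++ host code can be several times slower than Release, so make sure the optimized configuration is selected. With a CMake-based setup (paths here are illustrative, not the repo's exact build commands) that looks like:

```shell
# Single-config generators (Makefiles, Ninja): set the build type at configure time.
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build

# Multi-config generators (Visual Studio): pick the config at build time.
cmake --build build --config Release
```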
Using an RTX 3060; the input image is a 640*640 numpy.ndarray.
This one uses the official end-to-end pipeline, including conversion to an engine file and inference. This one is compiled into a dll and called from Python; both the model conversion and the inference use TensorRT-Alpha.
Both models use fp16. As you can see, this version is slower than ultralytics' TensorRT, but importing ultralytics is really too bloated, and using yours is much more comfortable. May I ask: if I only need the object coordinates, is there anywhere I could optimize, or might I have done something wrong on my end?
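If only the coordinates are needed, one easy saving is to skip all drawing/visualization work and just slice the box columns out of the post-NMS detections. A sketch under an assumed layout — the `(N, 6)` array of `[x1, y1, x2, y2, conf, cls]` is hypothetical; the actual layout depends on how the dll exposes its results:

```python
import numpy as np

def boxes_only(dets, conf_thres=0.25):
    """Return only xyxy coordinates above a confidence threshold.

    Assumes (hypothetically) post-NMS detections shaped (N, 6) as
    [x1, y1, x2, y2, conf, cls]; adapt the indices to the real output.
    """
    dets = np.asarray(dets, dtype=np.float32).reshape(-1, 6)
    keep = dets[:, 4] >= conf_thres
    return dets[keep, :4]

print(boxes_only([[10, 20, 50, 60, 0.9, 0],
                  [0, 0, 5, 5, 0.1, 1]]))
```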