FeiYull / TensorRT-Alpha

🔥🔥🔥TensorRT for YOLOv8, YOLOv8-Pose, YOLOv8-Seg, YOLOv8-Cls, YOLOv7, YOLOv6, YOLOv5, YOLONAS......🚀🚀🚀CUDA IS ALL YOU NEED.🍎🍎🍎
GNU General Public License v2.0
1.28k stars 198 forks

ultralytics is actually slightly faster; is something not being handled properly? #29

Closed pigking0126 closed 1 year ago

pigking0126 commented 1 year ago

I'm using a 3060 graphics card, and the input image is a 640*640 numpy.ndarray.

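[Screenshot (16)] This one uses the official end-to-end pipeline, including conversion to an engine file and inference. [Screenshot (15)] This one is compiled into a DLL and called from Python; both model conversion and inference use TensorRT-Alpha.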

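Both models use FP16. As you can see, this version is slower than ultralytics' TensorRT path. But pulling in ultralytics is really too bloated, and using the author's repo is much more comfortable. If I only need the object coordinates, is there anything that could be optimized, or is there something I did not set up properly in the first place?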

FeiYull commented 1 year ago

@pigking0126
How do you measure the code's runtime? The RTX3060 performs better than the mobile RTX2070m, so in theory the former should be faster. The following is the measured performance on the RTX2070m:

[Image: RTX2070m, yolov8n-b8-1080p-to-640]

FeiYull commented 1 year ago

@shancw96 @pigking0126 Could you give him some advice?

FeiYull commented 1 year ago

@pigking0126 One known reason: ultralytics uses multiple threads to copy data to the GPU, while this repo is single-threaded. See https://github.com/FeiYull/TensorRT-Alpha/blob/ffa90c0218005703bb52333fd24bb81c768cf28f/utils/yolo.cpp#L151 You would need to modify it yourself; it should be easy.
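The idea of overlapping the copy with inference can be sketched as a small producer/consumer pipeline. This is only an illustration of the general pattern, not the code of this repo or of ultralytics: `run_pipelined`, `upload`, and `infer` are hypothetical names, and `upload` merely stands in for whatever prepares a frame and copies it to the GPU.

```python
import queue
import threading


def run_pipelined(frames, upload, infer, depth=2):
    """Overlap upload(frame) with infer(buffer) using one background thread.

    `upload` and `infer` are hypothetical callables supplied by the caller;
    results are returned in the same order as `frames`.
    """
    # Bounded queue: the uploader can run at most `depth` frames ahead.
    ready = queue.Queue(maxsize=depth)
    done = object()  # sentinel that marks the end of the stream

    def uploader():
        try:
            for frame in frames:
                ready.put(upload(frame))
        finally:
            # Always signal completion so the consumer loop can exit.
            ready.put(done)

    worker = threading.Thread(target=uploader, daemon=True)
    worker.start()

    results = []
    while True:
        item = ready.get()
        if item is done:
            break
        results.append(infer(item))
    worker.join()
    return results
```

In Python this only gives real overlap when the two stages release the GIL, for example calls made through `ctypes.CDLL`. In the C++ code the same structure would need a second thread on the copy side, which is the modification suggested above.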

shancw96 commented 1 year ago

@shancw96 @pigking0126 Could you give him some advice?

The image batch size is fixed to 1 in the demo C file; maybe that caused the problem? @FeiYull I'm not an expert in this field, so I don't know what happens on the GPU side either.😅

pigking0126 commented 1 year ago

@pigking0126 How do you calculate code runtime?

I didn't change the timing method in either ultralytics' code or this repo's. The images in my code are screenshots captured with win32api and passed to inference as numpy.ndarray, one at a time (screenshot --> inference --> screenshot --> inference), so maybe it is not caused by threading or batch size? (I'm not sure.) Still working on it.
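For a like-for-like comparison of a one-frame-at-a-time loop such as this, a small timing harness helps. The following is a minimal sketch, assuming both back ends can be wrapped as a blocking Python callable; `benchmark` and `fake_infer` are hypothetical names, and `fake_infer` is only a placeholder for the real DLL wrapper or ultralytics call.

```python
import statistics
import time


def benchmark(infer, frames, warmup=10):
    """Time infer(frame) per call, ignoring the first `warmup` calls."""
    timings_ms = []
    for i, frame in enumerate(frames):
        start = time.perf_counter()
        infer(frame)  # must block until the result is actually available
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= warmup:  # early calls include one-time setup costs
            timings_ms.append(elapsed_ms)
    if not timings_ms:
        raise ValueError("need more frames than warmup iterations")
    return {
        "n": len(timings_ms),
        "mean_ms": statistics.fmean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


def fake_infer(frame):
    # Hypothetical stand-in for the real inference call.
    time.sleep(0.002)
    return []
```

Calling `benchmark(fake_infer, frames)` with the same list of frames for each back end keeps the comparison fair: the screenshot capture stays outside the timed region and the warm-up calls are excluded.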

shancw96 commented 1 year ago

https://github.com/FeiYull/TensorRT-Alpha/blob/ffa90c0218005703bb52333fd24bb81c768cf28f/examples/python_with_dll/c_files/pch.cpp#L84-L91

The demo's timing calculation can be found here. Emmm... it is just copied from the YOLOv8 C++ version; the image batch size is always 1, so I removed it.

https://github.com/FeiYull/TensorRT-Alpha/blob/4b50b83e7af896e78b656d26fbb81691e5848399/yolov8/app_yolov8.cpp#L22-L27

pigking0126 commented 1 year ago

It seems it was so slow because I built with the Debug x64 configuration... With Release x64 everything works well. lol [Screenshot (19)] Sorry for wasting your time...