Closed — pigking0126 closed this issue 1 year ago
@pigking0126
How do you calculate code runtime?
The RTX 3060 graphics card performs better than the mobile RTX 2070m, so in theory the former should be faster. The following is the measured performance of the RTX 2070m:
@shancw96 @pigking0126 Could you give him some advice?
@pigking0126 One known reason: it uses multiple threads to copy data to the GPU, while this repo is single-threaded. https://github.com/FeiYull/TensorRT-Alpha/blob/ffa90c0218005703bb52333fd24bb81c768cf28f/utils/yolo.cpp#L151 You would need to modify it yourself; it should be straightforward.
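The idea above — overlapping the host-to-GPU copy with inference instead of doing both on one thread — can be sketched in Python with a producer thread and a small queue. This is a minimal illustration only: `copy` and `infer` are hypothetical stand-ins for the real cudaMemcpy and TensorRT execute calls, not the repo's API.

```python
import queue
import threading
import time

def pipeline(frames, copy, infer):
    """Overlap 'copy to GPU' with inference using a producer thread.

    `copy` and `infer` are hypothetical placeholders for the real
    memcpy and TensorRT execution; only the threading pattern matters.
    """
    q = queue.Queue(maxsize=2)  # small buffer so copy stays one step ahead

    def producer():
        for f in frames:
            q.put(copy(f))   # copy thread fills the queue
        q.put(None)          # sentinel: no more frames

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while True:
        item = q.get()
        if item is None:
            break
        results.append(infer(item))
    return results

# Simulated 10 ms stages: pipelined, copy of frame N+1 overlaps infer of frame N.
copy = lambda f: (time.sleep(0.01), f)[1]
infer = lambda f: (time.sleep(0.01), f * 2)[1]
print(pipeline([1, 2, 3], copy, infer))  # [2, 4, 6]
```

With real CUDA code the same effect is usually achieved with pinned host memory and `cudaMemcpyAsync` on a separate stream, but the queue-and-thread shape is the same.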
The image batch is cast to 1 in the demo C++ file; maybe that caused the problem? @FeiYull I'm not a pro in this field, so I don't know what happens when working with the GPU either. 😅
@pigking0126 How do you calculate code runtime?
I didn't change the timing method in either ultralytics' code or this repo's. The images in my code are screenshots captured via win32api and inferred one by one as numpy.ndarray (screenshot --> inference --> screenshot --> inference), so maybe it's not because of threading or batch size? (I'm not sure.) Still working on it.
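For a fair one-by-one comparison like the loop described above, it helps to time both backends the same way: discard a few warm-up calls (CUDA context creation and lazy allocation make the first iterations slow) and average over many iterations. A minimal sketch, where `infer` is a placeholder for whichever backend is being measured (ultralytics' predict or the dll call), not a real API:

```python
import time

def benchmark(infer, frame, warmup=10, iters=100):
    """Average per-frame latency in ms for an inference callable.

    `infer` is a hypothetical stand-in for the backend under test;
    warm-up calls are discarded so one-time setup cost is excluded.
    """
    for _ in range(warmup):
        infer(frame)
    t0 = time.perf_counter()
    for _ in range(iters):
        infer(frame)
    return (time.perf_counter() - t0) / iters * 1000.0

ms = benchmark(lambda f: sum(f), list(range(1000)))
print(f"{ms:.4f} ms/frame")
```

Note that if the backend launches GPU work asynchronously, the timing must include whatever call synchronizes with the device (e.g. copying the result back), otherwise the numbers only measure kernel launch overhead.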
The demo calculation can be found here; emmm... it's just copied from the yolov8 C++ version. The image batch is always 1, so I removed it.
It turns out it was so slow because I built with Debug x64... With Release x64 everything works well. lol Sorry about the time I wasted...
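For anyone hitting the same thing: Debug builds of the C++ host code can be several times slower than Release, so make sure the optimized configuration is selected. With a CMake-based setup (paths here are illustrative, not the repo's exact build commands) that looks like:

```shell
# Single-config generators (Makefiles, Ninja): set the build type at configure time.
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build

# Multi-config generators (Visual Studio): pick the config at build time.
cmake --build build --config Release
```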
Using an RTX 3060; the input image is a 640*640 numpy.ndarray.
This one uses the official end-to-end pipeline, including conversion to an engine file and inference. This one is compiled into a dll and called from Python; both the model conversion and the inference use TensorRT-Alpha.
Both models use fp16. As you can see, this version is slower than ultralytics' TensorRT, but importing ultralytics is really too bloated, and using yours is much more comfortable. May I ask: if I only need the object coordinates, is there anywhere I could optimize, or might I have done something wrong on my end?
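If only the coordinates are needed, one easy saving is to skip all drawing/visualization work and just slice the box columns out of the post-NMS detections. A sketch under an assumed layout — the `(N, 6)` array of `[x1, y1, x2, y2, conf, cls]` is hypothetical; the actual layout depends on how the dll exposes its results:

```python
import numpy as np

def boxes_only(dets, conf_thres=0.25):
    """Return only xyxy coordinates above a confidence threshold.

    Assumes (hypothetically) post-NMS detections shaped (N, 6) as
    [x1, y1, x2, y2, conf, cls]; adapt the indices to the real output.
    """
    dets = np.asarray(dets, dtype=np.float32).reshape(-1, 6)
    keep = dets[:, 4] >= conf_thres
    return dets[keep, :4]

print(boxes_only([[10, 20, 50, 60, 0.9, 0],
                  [0, 0, 5, 5, 0.1, 1]]))
```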