@bobo0810 Thank you for your reply. What hardware and TensorRT versions do you use?
- RTX 2080 Ti
- TensorRT 7.2.2-1
- CUDA 11.1
Oh, it looks like I have a different version of TensorRT and CUDA. And which opset version did you use for the ONNX export?
- onnx 1.8.0
- onnxruntime 1.7.0
- torch.onnx.export(opset_version=12)
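For reference, a minimal export sketch under these settings might look like the following. The checkpoint path, the yolov5-style `"model"` key, and the 640x640 input shape are assumptions for illustration, not values confirmed in this thread:

```python
# Minimal sketch of the ONNX export with opset 12, matching the versions
# reported above. The checkpoint path, the yolov5-style "model" key, and
# the 640x640 input shape are assumptions for illustration.
import torch

ckpt = torch.load("yolov5face.pt", map_location="cpu")
model = ckpt["model"].float().eval()
dummy = torch.zeros(1, 3, 640, 640)   # batch=1, as in the speed test below

torch.onnx.export(
    model,
    dummy,
    "yolov5face.onnx",
    opset_version=12,                 # the opset used in this thread
    input_names=["images"],
    output_names=["output"],
)
```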
👍 I am going to try to reproduce the results with the versions you provided.
@bobo0810 Did you use batch=1 for your TRT inference speed test, or did you use a higher number?
batch=1
OK, thanks. Did you use FP16 too?
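For context, FP16 mode is typically enabled in the TensorRT builder config when converting the ONNX file to an engine. A generic sketch under that assumption (not the exact script behind the benchmark numbers; file names are placeholders):

```python
# Generic sketch of building a TensorRT engine with FP16 enabled from an
# ONNX file. Illustrative only; not the exact script used for the
# benchmark numbers in this thread.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("yolov5face.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30    # 1 GiB scratch space
config.set_flag(trt.BuilderFlag.FP16)  # FP16 mode, if the GPU supports it
engine = builder.build_engine(network, config)
```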
I struggle to get any performance improvement with TensorRT on my 980 Ti with TRT 7.2.3. I appreciate the hardware is different from yours, but I wonder why there is such a performance gap on my end. On my side the TRT performance is worse than the regular repo code... (though I can still hear my GPU fans, so it is being used a bit). Maybe I did not modify the code appropriately.
Any chance you can share the TRT code you used for your benchmark numbers?
https://github.com/deepcam-cn/yolov5-face/issues/76#issue-1022279361 Similar to the second code snippet. Remember to warm up before testing.
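In code, "warm up" usually means running a few throwaway inferences before timing, so that one-time costs (engine deserialization, CUDA context setup, memory allocation) are excluded from the measurement. A minimal sketch, assuming a `yolo_trt_model` wrapper and a preprocessed input `img` are already in scope:

```python
# Minimal timing sketch with warm-up. `yolo_trt_model` and `img` are
# assumed to already exist (engine wrapper and preprocessed input tensor).
import time
import torch

for _ in range(10):
    yolo_trt_model(img)       # warm-up: absorbs one-time startup costs
torch.cuda.synchronize()      # wait for queued GPU work before timing

runs = 100
start = time.time()
for _ in range(runs):
    yolo_trt_model(img)
torch.cuda.synchronize()      # GPU calls are async; sync before stopping
print(f"avg inference: {(time.time() - start) / runs * 1000:.2f} ms")
```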
What do you mean by warm up? If that relates to the slow initial inference, yes, I observe it: the code hangs for about 10-15 seconds the first time, but even after that it takes about 250 ms per frame (that includes pre-processing, inference, and post-processing, though most of the time is due to inference). This is on par with or even slower than the standard implementation. There is likely something I am doing wrong, since I know TensorRT speeds up other repos I have worked with.
https://github.com/deepcam-cn/yolov5-face/issues/76#issuecomment-939733533 So, it may be due to the hardware.
I will test on an RTX 3090 soon and will share how it goes.
@bobo0810 Thanks a lot for your TensorRT inference implementation!! I have some questions after successfully running the TensorRT version of Yolov5-face:

1. The results in the table look very impressive. But in my case, I tested the RT time on a 2080 Ti GPU after running the following two pieces of code: the first gives an RT for one image of 6 ms, and the second gives an RT for one image of 11 ms. Is such a test of the RT time right, in my understanding? It seems `yolo_trt_model.after_process` costs a lot of time.
2. Why not put this process into TensorRT, by uncommenting this line? I find that in the original yolov5 repo, the overall model can be exported by this file. Is it possible to put the entire pipeline of Yolov5-face into TensorRT?
3. Do the results in the table only consider the `yolo_trt_model.__call__` running time, or are `yolo_trt_model.__call__`, `yolo_trt_model.after_process`, and `non_max_suppression_face` all considered?
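One way to settle the timing question is to measure each stage separately. A sketch, assuming the `yolo_trt_model` wrapper and `non_max_suppression_face` named above plus a preprocessed input `img` are in scope; the exact argument lists in the repo may differ:

```python
# Sketch of timing each stage separately: the raw engine __call__, the
# after_process decode step, and NMS. Assumes `yolo_trt_model`,
# `non_max_suppression_face`, and a preprocessed `img` are in scope; the
# exact argument lists in the repo may differ.
import time
import torch

def timed(label, fn, *args, runs=100):
    torch.cuda.synchronize()                   # start from an idle GPU
    start = time.time()
    for _ in range(runs):
        out = fn(*args)
    torch.cuda.synchronize()                   # include queued GPU work
    print(f"{label}: {(time.time() - start) / runs * 1000:.2f} ms")
    return out

raw  = timed("__call__", yolo_trt_model, img)
pred = timed("after_process", yolo_trt_model.after_process, raw)
dets = timed("nms", non_max_suppression_face, pred)
```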