PaddlePaddle / PaddleYOLO

🚀🚀🚀 YOLO series of PaddlePaddle implementation, PP-YOLOE+, RT-DETR, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv10, YOLOX, YOLOv5u, YOLOv7u, YOLOv6Lite, RTMDet and so on. 🚀🚀🚀
https://github.com/PaddlePaddle/PaddleYOLO
GNU General Public License v3.0
547 stars 133 forks

ONNX TensorRT speed benchmarking question #182

Closed laonazzzzz closed 6 months ago

laonazzzzz commented 1 year ago

Search before asking

Please ask your question

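The documentation lists a TensorRT FP16 latency of 2.9 ms for ppyoloe-s. What environment was that measured in? I followed the documented workflow and benchmarked with ONNX + TensorRT many times, and the latency always comes out at 3.4 ms.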

trtexec --onnx=onnxfiles/convert/ppyoloe_crn_s_300e_coco.onnx --saveEngine=onnxfiles/convert/ppyoloe_s_bs1.engine --workspace=4096 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16

My local environment: GPU: T4, CUDA: 11.2, cuDNN: 8.2.0, paddle: 2.3.2, TensorRT: 8.0.3.4

nemonameless commented 1 year ago
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=${weights} exclude_post_process=True trt=True

paddle2onnx --model_dir output_inference/${job_name} --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ${job_name}.onnx

/usr/local/TensorRT-8.0.3.4/bin/trtexec --onnx=${job_name}.onnx --workspace=4096 --avgRuns=10 --shapes=input:1x3x640x640 --fp16

job_name is ppyoloe_crn_s_300e_coco.
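
For anyone reproducing this, the three steps above can be chained into one script with job_name filled in. This is only a rough sketch, not the maintainers' benchmark script: the config path, the local weights filename, the DRY_RUN switch and the run helper are assumptions added for illustration, while the commands and flags are copied from the reply. Note that the reply exports with exclude_post_process=True and feeds a single input tensor, whereas the command in the question also passes scale_factor, so the two runs may not be timing the same graph.

```shell
#!/bin/sh
# Sketch: export -> ONNX -> trtexec, following the commands in the reply.
# ASSUMPTIONS (not from the thread): config path, weights file name,
# the DRY_RUN switch and the run() helper.
set -eu

job_name=ppyoloe_crn_s_300e_coco
config="configs/ppyoloe/${job_name}.yml"   # assumed config location
weights="${job_name}.pdparams"             # assumed local weights file
trtexec_bin=/usr/local/TensorRT-8.0.3.4/bin/trtexec   # path used in the reply
DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print the commands

# Print a command; execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# 1. Export the Paddle inference model without post-processing.
run env CUDA_VISIBLE_DEVICES=0 python tools/export_model.py \
  -c "$config" -o weights="$weights" exclude_post_process=True trt=True

# 2. Convert the exported model to ONNX.
run paddle2onnx --model_dir "output_inference/${job_name}" \
  --model_filename model.pdmodel --params_filename model.pdiparams \
  --opset_version 12 --save_file "${job_name}.onnx"

# 3. Benchmark the ONNX model with TensorRT in FP16.
run "$trtexec_bin" --onnx="${job_name}.onnx" --workspace=4096 \
  --avgRuns=10 --shapes=input:1x3x640x640 --fp16
```

By default the script only prints the commands; run it with DRY_RUN=0 from the PaddleYOLO repository root to actually execute them.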