Syencil / tensorRT

TensorRT-7 Network Lib, covering common object detection, keypoint detection, face detection, OCR, etc. Can be trained on your own data.

About the infer time problem #18

Closed: iMurphL closed this issue 3 years ago

iMurphL commented 3 years ago

Hi there! Thank you for your excellent code; it has helped me a lot. I trained a network with PyTorch and deployed it with TensorRT successfully. But the inference time (NOT including pre/post-processing) got longer compared to inference in torch. Converted to INT8 the model gets faster, but not by enough. Is that normal? Maybe there is something I missed while deploying the model. I have no idea, so could you give me any hints?

GPU: GTX 1080 Ti / CUDA 10.0
Model: DeepLabV3+ with a ResNet50 backbone
PyTorch 1.6 inference time: 15 ms
TensorRT inference time: 22 ms (FP32) / 13 ms (INT8)
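For context, a minimal sketch of the kind of measurement described above. The model constructor and input shape are assumptions (torchvision's `deeplabv3_resnet50` stands in for the DeepLabV3+ network in the issue); the key detail is synchronizing before and after the timed loop so the host wall clock actually covers the GPU work:

```python
import time
import torch
import torchvision

# Assumed stand-in for the issue's DeepLabV3+/ResNet50 model and input size.
model = torchvision.models.segmentation.deeplabv3_resnet50().eval().cuda()
x = torch.randn(1, 3, 512, 512, device="cuda")

with torch.no_grad():
    for _ in range(10):           # warmup runs, excluded from the statistics
        model(x)
    torch.cuda.synchronize()      # drain queued kernels before starting the clock
    t0 = time.perf_counter()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()      # wait for the last kernel to finish
    t1 = time.perf_counter()

print(f"mean infer time: {(t1 - t0) / 100 * 1000:.2f} ms")
```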

Syencil commented 3 years ago

I would also be glad to list some hints in English.

Check the following, in order:

  1. First confirm that the test environment is clean, e.g. no other processes are using the GPU, CPU, disk, etc.
  2. If that checks out, make sure you warm up before timing; if not, discard the first few runs before computing statistics (see the timing sketch after this list).
  3. Check that your TensorRT version matches your driver and cuDNN versions, and that cuDNN acceleration is actually being used.
  4. Confirm that the two models are exactly identical, and that both run in FP32.
  5. Developers of a torch model usually fuse BN (fuseBN) before inference; check whether the TRT model matches (TRT fuses conv+bn automatically; see the fusion sketch after this list).
  6. Check whether the way torch measures time and the way CUDA measures time are the same (also shown in the sketch below).

If you have checked all of these and the problem persists, feel free to leave another comment.
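For points 2 and 6, a hedged sketch of warmup plus device-side timing with CUDA events, so Python overhead and unsynchronized wall-clock reads don't distort the numbers. `time_with_cuda_events` is a name invented here for illustration, not part of the repo:

```python
import torch

def time_with_cuda_events(model, x, warmup=20, iters=100):
    """Average per-iteration GPU time in milliseconds, after warmup."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    with torch.no_grad():
        for _ in range(warmup):        # point 2: discard the first runs
            model(x)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()       # point 6: let the clock see GPU completion
    return start.elapsed_time(end) / iters
```

And for point 5, a sketch of folding BatchNorm statistics into the preceding convolution, the same fusion TensorRT performs automatically on conv+bn pairs. `fuse_conv_bn` is likewise an illustrative helper, not the repo's API:

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into one conv."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups,
                      bias=True)
    scale = bn.weight.data / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = (bias - bn.running_mean) * scale + bn.bias.data
    return fused
```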
iMurphL commented 3 years ago

Yeah, I figured it out with your help. There were some processes using other GPUs that unexpectedly influenced my tests. And the timing methods in Python and C++ are different. I rewrote the test code with TorchScript and the time is also around 20 ms. Thank you for your help.
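For reference, a sketch of the TorchScript re-test described above. A plain ResNet-50 stands in for the actual network here, and `time_with_cuda_events` is the illustrative timer from the checklist reply:

```python
import torch
import torchvision

# Assumed stand-in network and input; tracing to TorchScript takes the
# Python interpreter out of the measured call path.
net = torchvision.models.resnet50().eval().cuda()
x = torch.randn(1, 3, 512, 512, device="cuda")
traced = torch.jit.trace(net, x)

print(f"TorchScript mean infer time: {time_with_cuda_events(traced, x):.2f} ms")
```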