Syencil / tensorRT

TensorRT-7 Network Lib, covering common object detection, keypoint detection, face detection, OCR, etc. Can be trained on your own data.

About the infer time problem #18

Closed: iMurphL closed this issue 3 years ago

iMurphL commented 3 years ago

Hi there! Thank you for your excellent code; it has helped me a lot. I trained a network with PyTorch and deployed it with TensorRT successfully. But the inference time (NOT including pre/post-processing) got longer compared to inference in torch. Converted to INT8 the model gets faster, but not by enough. Is that normal? Maybe there is something I missed while deploying the model. I have no idea, so could you give me any hints?

GPU: GTX 1080 Ti / CUDA 10.0
Model: DeepLabV3+ with a ResNet50 backbone
PyTorch 1.6 inference time: 15 ms
TensorRT inference time: 22 ms (FP32) / 13 ms (INT8)
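For context, a minimal sketch of the kind of measurement described above. The model constructor and input shape are assumptions (torchvision's `deeplabv3_resnet50` stands in for the DeepLabV3+ network in the issue); the key detail is synchronizing before and after the timed loop so the host wall clock actually covers the GPU work:

```python
import time
import torch
import torchvision

# Assumed stand-in for the issue's DeepLabV3+/ResNet50 model and input size.
model = torchvision.models.segmentation.deeplabv3_resnet50().eval().cuda()
x = torch.randn(1, 3, 512, 512, device="cuda")

with torch.no_grad():
    for _ in range(10):           # warmup runs, excluded from the statistics
        model(x)
    torch.cuda.synchronize()      # drain queued kernels before starting the clock
    t0 = time.perf_counter()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()      # wait for the last kernel to finish
    t1 = time.perf_counter()

print(f"mean infer time: {(t1 - t0) / 100 * 1000:.2f} ms")
```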

Syencil commented 3 years ago

I would also be glad to list some hints in English.

Check the following, in order:

  1. First confirm that the test environment is clean, e.g. no other processes are using the GPU, CPU, disk, etc.
  2. If that checks out, make sure you warm up before timing; if not, discard the first few runs before computing statistics (see the timing sketch after this list).
  3. Check that your TensorRT version matches your driver and cuDNN versions, and that cuDNN acceleration is actually being used.
  4. Confirm that the two models are exactly identical, and that both run in FP32.
  5. Developers of a torch model usually fuse BN (fuseBN) before inference; check whether the TRT model matches (TRT fuses conv+bn automatically; see the fusion sketch after this list).
  6. Check whether the way torch measures time and the way CUDA measures time are the same (also shown in the sketch below).

If you have checked all of these and the problem persists, feel free to leave another comment.
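For points 2 and 6, a hedged sketch of warmup plus device-side timing with CUDA events, so Python overhead and unsynchronized wall-clock reads don't distort the numbers. `time_with_cuda_events` is a name invented here for illustration, not part of the repo:

```python
import torch

def time_with_cuda_events(model, x, warmup=20, iters=100):
    """Average per-iteration GPU time in milliseconds, after warmup."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    with torch.no_grad():
        for _ in range(warmup):        # point 2: discard the first runs
            model(x)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()       # point 6: let the clock see GPU completion
    return start.elapsed_time(end) / iters
```

And for point 5, a sketch of folding BatchNorm statistics into the preceding convolution, the same fusion TensorRT performs automatically on conv+bn pairs. `fuse_conv_bn` is likewise an illustrative helper, not the repo's API:

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into one conv."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups,
                      bias=True)
    scale = bn.weight.data / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = (bias - bn.running_mean) * scale + bn.bias.data
    return fused
```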
iMurphL commented 3 years ago

Yeah, I figured it out with your help. There were some processes using other GPUs that unexpectedly influenced my tests. And the timing methods in Python and C++ are different. I rewrote the test code with TorchScript and the time is also around 20 ms. Thank you for your help.
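For reference, a sketch of the TorchScript re-test described above. A plain ResNet-50 stands in for the actual network here, and `time_with_cuda_events` is the illustrative timer from the checklist reply:

```python
import torch
import torchvision

# Assumed stand-in network and input; tracing to TorchScript takes the
# Python interpreter out of the measured call path.
net = torchvision.models.resnet50().eval().cuda()
x = torch.randn(1, 3, 512, 512, device="cuda")
traced = torch.jit.trace(net, x)

print(f"TorchScript mean infer time: {time_with_cuda_events(traced, x):.2f} ms")
```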