Deploying ppyoloe with Paddle Inference in a Qt application on a Jetson Nano 4GB: TensorRT acceleration fails with an insufficient GPU memory error

PaddlePaddle / PaddleDetection



Issue #6545

Status: Closed (closed by shangshanruowo 1 year ago)

shangshanruowo commented 1 year ago

Search before asking

[X] I have searched the question and found no related answer.

Please ask your question

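My Jetson Nano has the following versions installed: CUDA 10.2, cuDNN 8.2, TensorRT 8.2.1.8. The inference library I downloaded matches these versions; its version information is as follows: [attachment missing]. In Qt, running the ppyoloe model with CUDA and cuDNN acceleration works, but as soon as I enable TensorRT acceleration I get an insufficient GPU memory error. This is my model configuration: [attachment missing]. I tried many parameter combinations for config.EnableUseGpu and config.EnableTensorRtEngine, but every one reports insufficient GPU memory while the model is loading. The error output is below: [attachment missing].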

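I watched memory usage with jtop while it ran. The 4 GB shown under Mem was completely full, and of the 10 GB configured under Swp, about 2 GB was in use while the model was loading. I found the Jetson tutorial in your Paddle Inference documentation, and its settings are much the same as mine. Also, many of the links in the Paddle Inference documentation do not open.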

zhiboniu commented 1 year ago

Try a larger value for config.EnableUseGpu. If that still fails, set the batch size to 1 and reduce the input size, and see whether GPU memory is then sufficient.

shangshanruowo commented 1 year ago

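I tried again: I increased the value passed to config.EnableUseGpu and changed the model input to 416*416, but TensorRT acceleration still fails. Could it be that ppyoloe_s is still too large? Its Params figure is 7.93 M. The successful accelerated Jetson Nano deployments I found online used yolov5, and yolov5_n has 1.9 M parameters. The Jetson TensorRT example provided with Paddle Inference also uses a small model, mobilenetv1. I have not used TensorRT acceleration before, so I would appreciate it if you could take a look.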

lyuwenyu commented 1 year ago

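Do other models behave the same way? Try picodet-s and see whether it runs. It is a smaller model, so this would first tell us whether the problem lies in the model or in the settings. https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/configs/picodet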

Jiangsiping commented 1 year ago

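Hello, which TensorRT version did you install? I installed TensorRT 8.2 GA for ARM SBSA, but it requires CUDA 11 or later, and I could not find a TensorRT 8.2 build that depends on CUDA 10.2. The other builds I saw all target the x86 architecture.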

shangshanruowo commented 1 year ago

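This is the version I have installed: [attachment missing]. All of it came preinstalled in the image file the vendor supplied when I bought the dev board.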

shangshanruowo commented 1 year ago

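I got ppyoloe_s working with TensorRT acceleration. The fix was to reduce the value passed to config.EnableUseGpu: the out-of-memory error raised while loading the model with TensorRT occurs when the remaining GPU memory is less than the amount config.EnableUseGpu requests. Reducing the value to 50 made it work. One problem remains: loading the TensorRT model takes far too long, almost 10 minutes each time. Is there a way to avoid going through this on every load? It makes the deployment impractical.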
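The diagnosis above (the error appears when free GPU memory is smaller than the pool size requested through config.EnableUseGpu) can be sketched as a small sizing helper. This is only an illustration under stated assumptions: `choose_pool_mb` and its `headroom_mb` parameter are hypothetical names, not part of Paddle Inference, and the free-memory figure is assumed to be read from a tool such as jtop.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not a Paddle Inference function): pick an initial GPU
// memory pool size in MB that never exceeds the memory currently free, while
// keeping `headroom_mb` unreserved for other allocations.
// Returns 0 when no memory can be spared for the pool.
std::uint64_t choose_pool_mb(std::uint64_t free_mb,
                             std::uint64_t desired_mb,
                             std::uint64_t headroom_mb) {
    if (free_mb <= headroom_mb) {
        return 0;  // free memory is already at or below the headroom
    }
    // Cap the desired pool size at what is left after the headroom.
    return std::min(desired_mb, free_mb - headroom_mb);
}
```

The returned value would be passed as the first argument of config.EnableUseGpu, which takes the initial pool size in MB. A result of 0 suggests freeing memory first rather than calling it with a tiny pool.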

Jiangsiping commented 1 year ago

This issue looks similar to yours: https://github.com/PaddlePaddle/PaddleDetection/issues/6480. Also, roughly what frame rate does ppyoloe-s reach for you after TensorRT acceleration?

shangshanruowo commented 1 year ago

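Thanks. I had seen that issue before but could not find it again when I went looking. Since I deploy with C++, setting use_static = True in config.enable_tensorrt_engine (EnableTensorRtEngine in the C++ API) should be enough. I will give it a try.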

shangshanruowo commented 1 year ago

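I just tried it and it does work. After the first load, a serialized file was generated in the build's model folder, and subsequent loads take about as long as loading the model did before. That said, actual detection speed with TensorRT acceleration is not very fast, around 7-9 FPS.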
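For readers who want to reproduce the two settings that resolved this thread (a small initial memory pool and use_static enabled), here is a minimal C++ sketch. It assumes the Paddle Inference C++ header is available as `paddle_inference_api.h`. The helper name `configure_trt`, the workspace size, and the FP32 precision are assumptions for illustration, not values confirmed by the poster. Only the pool size of 50 comes from the thread.

```cpp
#include <string>

// Illustrative values. 50 is the pool size the poster reported working;
// the workspace size is an assumption, not a value from the thread.
constexpr int kInitPoolMb = 50;
constexpr int kTrtWorkspaceBytes = 1 << 28;  // 256 MiB

// The guard only lets this file compile where Paddle Inference is absent.
#if defined(__has_include)
#if __has_include("paddle_inference_api.h")
#include "paddle_inference_api.h"

// Hypothetical helper: enable GPU and TensorRT on an existing config, with
// use_static = true so the built engine is serialized and reused later.
inline void configure_trt(paddle_infer::Config& config,
                          const std::string& model_file,
                          const std::string& params_file) {
    config.SetModel(model_file, params_file);
    config.EnableUseGpu(kInitPoolMb, /*device_id=*/0);
    config.EnableTensorRtEngine(kTrtWorkspaceBytes,
                                /*max_batch_size=*/1,
                                /*min_subgraph_size=*/3,
                                paddle_infer::PrecisionType::kFloat32,
                                /*use_static=*/true,
                                /*use_calib_mode=*/false);
}
#endif
#endif
```

With use_static enabled, the first load still pays the full engine build time. As the poster observed, later loads reuse the serialized file. A cached engine is generally tied to the device and TensorRT version that built it, so it is worth deleting the cache after changing the model or the configuration.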