PPYOLO-tiny inference speed on Jetson AGX

PaddlePaddle / PaddleDetection


PPYOLO-tiny inference speed on Jetson AGX #4352

Closed lleesg closed 2 years ago

lleesg commented 3 years ago

Environment: AGX, Ubuntu 18.04, CUDA 10.2, cuDNN 8.0.0, JetPack 4.4, Python 3.6, Paddle 2.1.2, TensorRT 7.1

Code: after exporting the PPYOLO-tiny model, I ran inference with https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/deploy/python/infer.py

Test method: the AGX runs in MAX_N mode, the input video is 720p, and the target size is set to 608. I measured the execution time of a single statement at https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/deploy/python/infer.py#L537 : results = detector.predict([frame], FLAGS.threshold)

Test results:

trt_fp32: 36ms
trt_fp16: 33ms
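trt_int8: 34ms

The three modes show no obvious difference; the frame rate is about 30 fps in each case.

Questions:

I could not find any reference data. Do these results match expectations?
https://cloud.tencent.com/developer/article/1652975 reports 120.5 fps for YOLOv4-Tiny under the same conditions. Why is the gap so large?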
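For reference, the latency and frame-rate figures above can be put on the same scale with a small conversion. This is only an arithmetic sketch; the helper names `latency_ms_to_fps` and `fps_to_latency_ms` are made up for illustration, and the numbers are the ones quoted in this post.

```python
def latency_ms_to_fps(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


def fps_to_latency_ms(fps):
    """Convert frames per second to a per-frame latency in milliseconds."""
    return 1000.0 / fps


# Per-frame latencies measured in this post (milliseconds).
reported_ms = {"trt_fp32": 36, "trt_fp16": 33, "trt_int8": 34}

for mode, ms in reported_ms.items():
    print(f"{mode}: {ms} ms -> {latency_ms_to_fps(ms):.1f} fps")

# The 120.5 fps quoted for YOLOv4-Tiny, expressed as a per-frame latency.
print(f"120.5 fps -> {fps_to_latency_ms(120.5):.1f} ms per frame")
```

So the comparison is roughly 33 to 36 ms per frame here against about 8.3 ms per frame in the linked article.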

wangxinxin08 commented 3 years ago

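We have not yet benchmarked PP-YOLO tiny on the AGX, but we have measured other models on this hardware: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.2/deploy/BENCHMARK_INFER.md#2jetson-agx-xavier. There, SSD MobileNet v1 already reaches about 125 FPS, and PP-YOLO tiny should in theory be faster, so the fps you measured is probably not representative. The likely cause is that the first run of the model incurs a very large launch overhead. We recommend using the benchmark method in the PaddleDetection repository and testing with --run_benchmark=True.

In addition, some models on the Jetson AGX do show no obvious speedup from trt fp32 to trt fp16. The main reason is probably that some ops in these models have no corresponding TRT operator and therefore do not run in the TRT engine.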
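The point about launch overhead can be illustrated with a minimal timing sketch that discards the first few calls before measuring. This is not the PaddleDetection benchmark implementation; `benchmark` and `FakeDetector` are hypothetical names, and the fake detector only emulates a slow first call with `time.sleep`. With a real GPU predictor, the timed call would also need to be synchronous (or explicitly synchronized) for the numbers to be meaningful.

```python
import time


def benchmark(predict_fn, warmup=10, repeats=50):
    """Time predict_fn over `repeats` calls after `warmup` untimed calls."""
    # Warm-up calls absorb one-off costs (engine build, memory allocation).
    for _ in range(warmup):
        predict_fn()

    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict_fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = sum(latencies_ms) / len(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
        "fps": 1000.0 / mean_ms,
    }


class FakeDetector:
    """Hypothetical stand-in for a detector whose first call is slow."""

    def __init__(self):
        self.calls = 0

    def predict(self):
        self.calls += 1
        # Emulated cost: 50 ms on the first call, 2 ms afterwards.
        time.sleep(0.05 if self.calls == 1 else 0.002)


if __name__ == "__main__":
    detector = FakeDetector()
    stats = benchmark(detector.predict, warmup=3, repeats=20)
    print(f"mean {stats['mean_ms']:.2f} ms, about {stats['fps']:.1f} fps")
```

For the real script, the same pattern would wrap the detector.predict([frame], FLAGS.threshold) call, so that the reported average excludes the first iterations.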

paddle-bot-old[bot] commented 2 years ago

Since this issue has not been updated for more than three months, it will be closed. If it is not solved or there is a follow-up question, please reopen it at any time and we will continue to follow up. It is recommended to pull and try the latest code first.