dusty-nv / ros_deep_learning

Deep learning inference nodes for ROS / ROS2 with support for NVIDIA Jetson and TensorRT

Performance drop in ROS node vs. standalone execution of detectnet mobilenet-ssd-v2 #137

Open ashishbhatti opened 3 months ago

ashishbhatti commented 3 months ago

Description: I am experiencing a significant performance drop when running the mobilenet-ssd-v2 model with a detectnet ROS node compared to standalone execution. The FPS drops by approximately two-thirds, which is unexpected given that the model and its computational load remain unchanged.

Performance Details:

Environment:

Expected Behavior: The FPS should be comparable between the ROS node and standalone executions since the model's computational requirements do not change.

Steps to Reproduce:

  1. Run as ROS Node

    $ git clone --recursive --depth=1 https://github.com/dusty-nv/jetson-inference
    $ cd jetson-inference
    $ docker/run.sh --ros=noetic
    $ roscore
    $ roslaunch ros_deep_learning video_viewer.ros1.launch input:=v4l2:///dev/video0 output:=display://0
  2. Run standalone

    $ docker/run.sh
    $ cd build/aarch64/bin
    $ ./detectnet /dev/video0

Additional Information: I have attached screenshots demonstrating the FPS in both scenarios (labeled "normal" and "ros").

I am seeking insights or suggestions that could explain the cause of this performance drop and how it might be resolved. Any help would be greatly appreciated.

dusty-nv commented 3 months ago

Hi @ashishbhatti, sorry about that. I no longer have a setup for running it on the versions you specify, but my initial guess is that it is related to inefficient image transport of the video stream topics in Noetic. I think the primary difference in the detectnet / detectnet.py examples is that the images are captured with zero-copy, directly into CUDA memory.
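
Just to illustrate the difference, the standalone path looks roughly like this. This is a minimal sketch of what detectnet.py does with the jetson_inference / jetson_utils Python API; the camera URI and detection threshold simply mirror your repro steps and are not tested on your setup:

    # minimal sketch of the standalone zero-copy capture + inference loop
    from jetson_inference import detectNet
    from jetson_utils import videoSource, videoOutput

    net = detectNet("ssd-mobilenet-v2", threshold=0.5)
    camera = videoSource("v4l2:///dev/video0")   # frames land directly in CUDA/shared memory
    display = videoOutput("display://0")

    while display.IsStreaming():
        img = camera.Capture()        # zero-copy: no CPU-side image serialization
        if img is None:               # capture timeout, try again
            continue
        detections = net.Detect(img)
        display.Render(img)
        display.SetStatus("detectNet | {:.0f} FPS".format(net.GetNetworkFPS()))

In the ROS node pipeline, by contrast, each frame is converted to a sensor_msgs/Image and pushed through the topic transport before inference, which is my guess for where the overhead comes from.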

I remember exploring the use of ROS nodelets (for the imageNet classification models in that case) to work around this, so that everything runs inside a single process. If you don't need the camera imagery in other nodes, you could instead create a wrapper node that both captures the camera and runs detectNet inference inside the same node, which sidesteps the image transport entirely.
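
An untested sketch of that combined-node idea, assuming ROS Noetic with rospy and vision_msgs available inside the container (the node and topic names are just placeholders):

    #!/usr/bin/env python3
    # hypothetical combined capture + inference node (untested sketch):
    # camera frames never leave this process, so no image topics are transported
    import rospy
    from vision_msgs.msg import Detection2DArray, Detection2D, ObjectHypothesisWithPose
    from jetson_inference import detectNet
    from jetson_utils import videoSource

    def main():
        rospy.init_node("detectnet_combined")        # placeholder node name
        pub = rospy.Publisher("detections", Detection2DArray, queue_size=10)

        net = detectNet("ssd-mobilenet-v2", threshold=0.5)
        camera = videoSource("v4l2:///dev/video0")   # capture stays in-process, in CUDA memory

        while not rospy.is_shutdown():
            img = camera.Capture()
            if img is None:                          # capture timeout, try again
                continue

            msg = Detection2DArray()
            msg.header.stamp = rospy.Time.now()
            for d in net.Detect(img):                # inference on the same CUDA image
                det = Detection2D()
                det.bbox.center.x = d.Center[0]
                det.bbox.center.y = d.Center[1]
                det.bbox.size_x = d.Width
                det.bbox.size_y = d.Height
                hyp = ObjectHypothesisWithPose()
                hyp.id = d.ClassID
                hyp.score = d.Confidence
                det.results.append(hyp)
                msg.detections.append(det)
            pub.publish(msg)

    if __name__ == "__main__":
        main()

Only the small Detection2DArray messages go over ROS; the images themselves stay in GPU memory for the whole pipeline, so the FPS should end up much closer to the standalone run.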