dusty-nv / ros_deep_learning

Deep learning inference nodes for ROS / ROS2 with support for NVIDIA Jetson and TensorRT

Performance drop in ROS node vs. standalone execution of detectnet mobilenet-ssd-v2 #137

Open ashishbhatti opened 3 months ago

ashishbhatti commented 3 months ago

Description: I am experiencing a significant performance drop when running the mobilenet-ssd-v2 model with a detectnet ROS node compared to standalone execution. The FPS drops by approximately two-thirds, which is unexpected given that the model and its computational load remain unchanged.

Performance Details:

Environment:

Expected Behavior: The FPS should be comparable between the ROS node and standalone executions since the model's computational requirements do not change.

Steps to Reproduce:

  1. Run as ROS Node

    $ git clone --recursive --depth=1 https://github.com/dusty-nv/jetson-inference
    $ cd jetson-inference
    $ docker/run.sh --ros=noetic
    $ roscore
    $ roslaunch ros_deep_learning video_viewer.ros1.launch input:=v4l2:///dev/video0 output:=display://0
  2. Run standalone

    $ docker/run.sh
    $ cd build/aarch64/bin
    $ ./detectnet /dev/video0

Additional Information: I have attached screenshots demonstrating the FPS in both scenarios (labeled "normal" and "ros").

I am seeking insights or suggestions that could explain the cause of this performance drop and how it might be resolved. Any help would be greatly appreciated.

dusty-nv commented 3 months ago

Hi @ashishbhatti, sorry about that. I no longer have a setup for running it on the versions you specify, but my initial guess is that it is related to inefficient image transport of the video stream topics in Noetic. I think the primary difference in the detectnet / detectnet.py examples is that the images are captured with zero-copy, directly into CUDA memory.
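
Just to illustrate the difference, the standalone path looks roughly like this. This is a minimal sketch of what detectnet.py does with the jetson_inference / jetson_utils Python API; the camera URI and detection threshold simply mirror your repro steps and are not tested on your setup:

    # minimal sketch of the standalone zero-copy capture + inference loop
    from jetson_inference import detectNet
    from jetson_utils import videoSource, videoOutput

    net = detectNet("ssd-mobilenet-v2", threshold=0.5)
    camera = videoSource("v4l2:///dev/video0")   # frames land directly in CUDA/shared memory
    display = videoOutput("display://0")

    while display.IsStreaming():
        img = camera.Capture()        # zero-copy: no CPU-side image serialization
        if img is None:               # capture timeout, try again
            continue
        detections = net.Detect(img)
        display.Render(img)
        display.SetStatus("detectNet | {:.0f} FPS".format(net.GetNetworkFPS()))

In the ROS node pipeline, by contrast, each frame is converted to a sensor_msgs/Image and pushed through the topic transport before inference, which is my guess for where the overhead comes from.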

I remember exploring the use of ROS nodelets (for the imageNet classification models in that case) to work around this, so that everything runs inside a single process. If you don't need the camera imagery in other nodes, you could instead create a wrapper node that both captures the camera and runs detectNet inference inside the same node, which sidesteps the image transport entirely.
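
An untested sketch of that combined-node idea, assuming ROS Noetic with rospy and vision_msgs available inside the container (the node and topic names are just placeholders):

    #!/usr/bin/env python3
    # hypothetical combined capture + inference node (untested sketch):
    # camera frames never leave this process, so no image topics are transported
    import rospy
    from vision_msgs.msg import Detection2DArray, Detection2D, ObjectHypothesisWithPose
    from jetson_inference import detectNet
    from jetson_utils import videoSource

    def main():
        rospy.init_node("detectnet_combined")        # placeholder node name
        pub = rospy.Publisher("detections", Detection2DArray, queue_size=10)

        net = detectNet("ssd-mobilenet-v2", threshold=0.5)
        camera = videoSource("v4l2:///dev/video0")   # capture stays in-process, in CUDA memory

        while not rospy.is_shutdown():
            img = camera.Capture()
            if img is None:                          # capture timeout, try again
                continue

            msg = Detection2DArray()
            msg.header.stamp = rospy.Time.now()
            for d in net.Detect(img):                # inference on the same CUDA image
                det = Detection2D()
                det.bbox.center.x = d.Center[0]
                det.bbox.center.y = d.Center[1]
                det.bbox.size_x = d.Width
                det.bbox.size_y = d.Height
                hyp = ObjectHypothesisWithPose()
                hyp.id = d.ClassID
                hyp.score = d.Confidence
                det.results.append(hyp)
                msg.detections.append(det)
            pub.publish(msg)

    if __name__ == "__main__":
        main()

Only the small Detection2DArray messages go over ROS; the images themselves stay in GPU memory for the whole pipeline, so the FPS should end up much closer to the standalone run.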