How to implement zero-copy when sending a message out-of-process to CUDA?

lix19937 commented 3 months ago

NOTE:
The two nodes run in different processes. The ROS 2 message memory is on the CPU side, and the next node (the subscriber) wants to receive the message (e.g. camera data) and then run accelerated inference on the GPU.

I found that, to leverage the benefit of zero-copy in NITROS, all NITROS-accelerated nodes must run in the same process.

ref
https://nvidia-isaac-ros.github.io/concepts/nitros/index.html

ZhenshengLee commented 1 month ago

"The design of NITROS makes the following assumptions of the ROS 2 applications: To leverage the benefit of zero-copy in NITROS, all NITROS-accelerated nodes must run in the same process." (from https://nvidia-isaac-ros.github.io/concepts/nitros/index.html#system-assumptions)

The Isaac ROS docs make it clear that NITROS nodes need intra-process communication to get zero-copy, while staying compatible with inter-process communication and with ordinary ROS 2 nodes. In that compatibility mode, the acceleration is not available.

"NITROS is NVIDIA’s implementation of type adaption and negotiation." (from https://nvidia-isaac-ros.github.io/concepts/nitros/index.html#motivation)

The performance improvement of isaac_ros_nitros comes from type adaptation (https://ros.org/reps/rep-2007.html) and type negotiation (https://ros.org/reps/rep-2009.html), which need intra-process communication.

more info: https://www.openrobotics.org/blog/2022/5/24/ros-2-humble-hawksbill-release https://developer.nvidia.com/blog/improve-perception-performance-for-ros-2-applications-with-nvidia-isaac-transport-for-ros/

ZhenshengLee commented 1 month ago

If you don't use the ROS 2 transport, you could use NvSci or CUDA IPC to share CUDA device memory between processes.
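
As a rough illustration of the CUDA IPC route, here is a minimal sketch, not NITROS code and not taken from the linked issue. It assumes Linux, a single GPU that supports CUDA IPC, and a plain pipe as the (hypothetical) channel for passing the handle; the `scale` kernel is just a stand-in for real inference. The producer uploads the CPU data once, exports a `cudaIpcMemHandle_t`, and the consumer process maps the same device allocation instead of receiving a copy.

```cuda
// Sketch only: share one device buffer between two processes with CUDA IPC.
// Assumed build: nvcc -o cuda_ipc_demo cuda_ipc_demo.cu
#include <cuda_runtime.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

#define CHECK(call)                                                        \
  do {                                                                     \
    cudaError_t err_ = (call);                                             \
    if (err_ != cudaSuccess) {                                             \
      std::fprintf(stderr, "%s failed: %s\n", #call,                       \
                   cudaGetErrorString(err_));                              \
      std::exit(1);                                                        \
    }                                                                      \
  } while (0)

constexpr int kCount = 1024;

// Hypothetical stand-in for the consumer's GPU work (e.g. inference).
__global__ void scale(float* data, int n, float factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

int main() {
  int fd[2];
  if (pipe(fd) != 0) return 1;

  // Fork BEFORE any CUDA call: a CUDA context is not usable across fork().
  pid_t pid = fork();
  if (pid < 0) return 1;

  if (pid == 0) {
    // Consumer process: receive the handle and map the producer's allocation.
    close(fd[1]);
    cudaIpcMemHandle_t handle;
    if (read(fd[0], &handle, sizeof(handle)) !=
        static_cast<ssize_t>(sizeof(handle))) return 1;
    float* shared = nullptr;
    CHECK(cudaIpcOpenMemHandle(reinterpret_cast<void**>(&shared), handle,
                               cudaIpcMemLazyEnablePeerAccess));
    scale<<<(kCount + 255) / 256, 256>>>(shared, kCount, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaIpcCloseMemHandle(shared));  // unmap; the producer still owns it
    return 0;
  }

  // Producer process: upload the CPU-side data once and export a handle.
  close(fd[0]);
  static float host[kCount];
  for (int i = 0; i < kCount; ++i) host[i] = 1.0f;

  float* device = nullptr;
  CHECK(cudaMalloc(reinterpret_cast<void**>(&device), sizeof(host)));
  CHECK(cudaMemcpy(device, host, sizeof(host), cudaMemcpyHostToDevice));

  cudaIpcMemHandle_t handle;
  CHECK(cudaIpcGetMemHandle(&handle, device));
  if (write(fd[1], &handle, sizeof(handle)) !=
      static_cast<ssize_t>(sizeof(handle))) return 1;
  close(fd[1]);

  // Keep the allocation alive until the consumer has finished with it.
  int status = 0;
  waitpid(pid, &status, 0);

  // Read back to check that the consumer worked on the same memory.
  CHECK(cudaMemcpy(host, device, sizeof(host), cudaMemcpyDeviceToHost));
  std::printf("host[0] after consumer kernel: %.1f\n", host[0]);
  CHECK(cudaFree(device));
  return (host[0] == 2.0f) ? 0 : 1;
}
```

Note that only the hand-off between processes avoids a copy here: the initial host-to-device upload still happens once in the producer. In a real pipeline the two sides would also need their own synchronization and lifetime management, which this sketch reduces to `waitpid`.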

more info: https://github.com/pytorch/pytorch/issues/137680

ZhenshengLee commented 1 month ago

@lix19937 Do you work in China? You could contact me on WeChat to discuss: zhensheng_li

NVIDIA-ISAAC-ROS / isaac_ros_nitros

How to implement zero-copy when sending a message out-of-process to CUDA? #45