This package contains DNN inference nodes and camera/video streaming nodes for ROS/ROS2 with support for NVIDIA Jetson Nano / TX1 / TX2 / Xavier / Orin devices and TensorRT.
The nodes use the image recognition, object detection, and semantic segmentation DNNs from the jetson-inference library and NVIDIA Hello AI World tutorial, which come with several built-in pretrained networks for classification, detection, and segmentation and the ability to load customized user-trained models.
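The camera & video streaming nodes support a range of input/output interfaces, including MIPI CSI cameras, V4L2 cameras, RTP/RTSP network streams, video files, and display output (see Camera Streaming & Multimedia for the full list).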
Various distributions of ROS are supported, either from source or through containers (including Melodic, Noetic, Foxy, Galactic, Humble, and Iron). The same branch supports both ROS1 and ROS2.
The easiest way to get up and running is by cloning jetson-inference (which ros_deep_learning is a submodule of) and running the pre-built container, which automatically mounts the required model directories and devices:
$ git clone --recursive --depth=1 https://github.com/dusty-nv/jetson-inference
$ cd jetson-inference
$ docker/run.sh --ros=humble # noetic, foxy, galactic, humble, iron
Note: the ros_deep_learning nodes rely on data from the jetson-inference tree for storing models, so clone and mount jetson-inference/data if you're using your own container or source installation method.
The --ros argument to the docker/run.sh script selects the ROS distro to use. The script in turn uses the ros:$ROS_DISTRO-pytorch container images from jetson-containers, which include jetson-inference and this package.
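As a minimal sketch of how the --ros argument relates to the container image, the helpers below validate a distro name against the list shown in the run command above and format the ros:$ROS_DISTRO-pytorch tag. Both function names are hypothetical and exist only for illustration; the actual image name used by jetson-containers may carry additional registry or version components that are not shown here.

```python
# Distros accepted by docker/run.sh --ros, taken from the comment on the
# run command in this README.
CONTAINER_DISTROS = ("noetic", "foxy", "galactic", "humble", "iron")


def ros_image_tag(distro: str) -> str:
    """Format the ros:$ROS_DISTRO-pytorch tag (hypothetical helper)."""
    if distro not in CONTAINER_DISTROS:
        raise ValueError(f"unsupported ROS distro for the container: {distro!r}")
    return f"ros:{distro}-pytorch"


def run_command(distro: str) -> str:
    """Build the docker/run.sh invocation for a distro (hypothetical helper)."""
    ros_image_tag(distro)  # reuse the validation above
    return f"docker/run.sh --ros={distro}"


if __name__ == "__main__":
    print(run_command("humble"))
    print(ros_image_tag("humble"))
```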
For the older instructions on building the ros_deep_learning package for an uncontainerized ROS installation, expand the section below (the parts about installing ROS may need adapting for the particular version of ROS/ROS2 that you want to install).
Before proceeding, if you're using ROS Melodic, make sure that roscore is running first:
$ roscore
If you're using ROS2, running the core service is no longer required.
First, it's recommended to test that you can stream a video feed using the video_source and video_output nodes. See Camera Streaming & Multimedia for valid input/output streams, and substitute your desired input and output arguments below. For example, you can use video files for the input or output, or use V4L2 cameras instead of MIPI CSI cameras. You can also use RTP/RTSP streams over the network.
# ROS
$ roslaunch ros_deep_learning video_viewer.ros1.launch input:=csi://0 output:=display://0
# ROS2
$ ros2 launch ros_deep_learning video_viewer.ros2.launch input:=csi://0 output:=display://0
You can launch a classification demo with the following commands; substitute your desired camera or video path for the input argument below (see Camera Streaming & Multimedia for valid input/output streams).
Note that the imagenet node also publishes classification metadata on the imagenet/classification topic in a vision_msgs/Classification2D message; see the Topics & Parameters section below for more info.
# ROS
$ roslaunch ros_deep_learning imagenet.ros1.launch input:=csi://0 output:=display://0
# ROS2
$ ros2 launch ros_deep_learning imagenet.ros2.launch input:=csi://0 output:=display://0
To launch an object detection demo, substitute your desired camera or video path for the input argument below (see Camera Streaming & Multimedia for valid input/output streams). Note that the detectnet node also publishes the detection metadata in a vision_msgs/Detection2DArray message; see the Topics & Parameters section below for more info.
# ROS
$ roslaunch ros_deep_learning detectnet.ros1.launch input:=csi://0 output:=display://0
# ROS2
$ ros2 launch ros_deep_learning detectnet.ros2.launch input:=csi://0 output:=display://0
To launch a semantic segmentation demo, substitute your desired camera or video path for the input argument below (see Camera Streaming & Multimedia for valid input/output streams). Note that the segnet node also publishes raw segmentation results to the segnet/class_mask topic; see the Topics & Parameters section below for more info.
# ROS
$ roslaunch ros_deep_learning segnet.ros1.launch input:=csi://0 output:=display://0
# ROS2
$ ros2 launch ros_deep_learning segnet.ros2.launch input:=csi://0 output:=display://0
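The launch commands above all follow one pattern: the launch file is named after the demo with a .ros1.launch or .ros2.launch suffix, and input and output are passed as name:=value arguments. The sketch below assembles that command line as an argument list. The function name launch_command is hypothetical, and the code only builds the arguments; it does not run them.

```python
from typing import List


def launch_command(demo: str, ros_version: int,
                   input_uri: str = "csi://0",
                   output_uri: str = "display://0") -> List[str]:
    """Build the launch command for a demo (hypothetical helper).

    demo is one of the launch file base names used in this README:
    video_viewer, imagenet, detectnet, or segnet.
    """
    if ros_version == 1:
        prefix = ["roslaunch"]
    elif ros_version == 2:
        prefix = ["ros2", "launch"]
    else:
        raise ValueError("ros_version must be 1 or 2")
    launch_file = f"{demo}.ros{ros_version}.launch"
    return prefix + [
        "ros_deep_learning",
        launch_file,
        f"input:={input_uri}",
        f"output:={output_uri}",
    ]


if __name__ == "__main__":
    print(" ".join(launch_command("detectnet", 2)))
```

The returned list can be handed to subprocess.run on a machine where ROS and this package are installed.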
Below are the message topics and parameters that each node implements.
imagenet node topics:

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| image_in | Input | sensor_msgs/Image | Raw input image |
| classification | Output | vision_msgs/Classification2D | Classification results (class ID + confidence) |
| vision_info | Output | vision_msgs/VisionInfo | Vision metadata (class labels parameter list name) |
| overlay | Output | sensor_msgs/Image | Input image overlaid with the classification results |

imagenet node parameters:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| model_name | string | "googlenet" | Built-in model name (see the jetson-inference documentation for valid values) |
| model_path | string | "" | Path to custom caffe or ONNX model |
| prototxt_path | string | "" | Path to custom caffe prototxt file |
| input_blob | string | "data" | Name of DNN input layer |
| output_blob | string | "prob" | Name of DNN output layer |
| class_labels_path | string | "" | Path to custom class labels file |
| class_labels_HASH | vector<string> | class names | List of class labels, where HASH is model-specific (actual name of parameter is found via the vision_info topic) |
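
To illustrate how the classification results (class ID + confidence) relate to the class labels list, here is a minimal sketch that picks the most confident result and looks up its label. It is deliberately ROS-free: the exact field names of the classification message differ between vision_msgs versions, so the results are modeled as plain (class_id, confidence) tuples and the labels as a plain list, as would be read from the class_labels_HASH parameter. The function name top_classification is hypothetical.

```python
from typing import List, Sequence, Tuple


def top_classification(results: Sequence[Tuple[int, float]],
                       labels: List[str]) -> Tuple[str, float]:
    """Return (label, confidence) of the most confident result.

    results: (class_id, confidence) pairs copied out of a classification
             message (assumed representation, not the message type itself).
    labels:  class names indexed by class ID.
    """
    if not results:
        raise ValueError("no classification results")
    class_id, confidence = max(results, key=lambda r: r[1])
    if 0 <= class_id < len(labels):
        label = labels[class_id]
    else:
        # Fall back to the numeric ID when the labels list does not cover it.
        label = f"class #{class_id}"
    return label, confidence


if __name__ == "__main__":
    demo_labels = ["apple", "banana", "cherry"]
    print(top_classification([(0, 0.1), (2, 0.7), (1, 0.2)], demo_labels))
```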

detectnet node topics:

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| image_in | Input | sensor_msgs/Image | Raw input image |
| detections | Output | vision_msgs/Detection2DArray | Detection results (bounding boxes, class IDs, confidences) |
| vision_info | Output | vision_msgs/VisionInfo | Vision metadata (class labels parameter list name) |
| overlay | Output | sensor_msgs/Image | Input image overlaid with the detection results |

detectnet node parameters:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| model_name | string | "ssd-mobilenet-v2" | Built-in model name (see the jetson-inference documentation for valid values) |
| model_path | string | "" | Path to custom caffe or ONNX model |
| prototxt_path | string | "" | Path to custom caffe prototxt file |
| input_blob | string | "data" | Name of DNN input layer |
| output_cvg | string | "coverage" | Name of DNN output layer (coverage/scores) |
| output_bbox | string | "bboxes" | Name of DNN output layer (bounding boxes) |
| class_labels_path | string | "" | Path to custom class labels file |
| class_labels_HASH | vector<string> | class names | List of class labels, where HASH is model-specific (actual name of parameter is found via the vision_info topic) |
| overlay_flags | string | "box,labels,conf" | Flags used to generate the overlay (some combination of none, box, labels, conf) |
| mean_pixel_value | float | 0.0 | Mean pixel subtraction value to be applied to input (normally 0) |
| threshold | float | 0.5 | Minimum confidence value for positive detections (0.0 - 1.0) |
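
The overlay_flags and threshold parameters can be illustrated with a small ROS-free sketch. The first helper splits a comma-separated flags string and rejects anything outside none, box, labels, conf; the second keeps only detections at or above a confidence cutoff. Both function names are hypothetical, detections are modeled as plain (label, confidence) tuples, and treating the threshold as inclusive is an assumption. The node applies its own threshold internally; a filter like this would only matter for a stricter cutoff downstream.

```python
from typing import List, Set, Tuple

VALID_OVERLAY_FLAGS = {"none", "box", "labels", "conf"}


def parse_overlay_flags(flags: str) -> Set[str]:
    """Split a comma-separated overlay_flags string and validate each flag."""
    parsed = {f.strip() for f in flags.split(",") if f.strip()}
    unknown = parsed - VALID_OVERLAY_FLAGS
    if unknown:
        raise ValueError(f"unknown overlay flags: {sorted(unknown)}")
    return parsed


def filter_detections(detections: List[Tuple[str, float]],
                      threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Keep detections whose confidence is >= threshold (assumed inclusive)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within 0.0 - 1.0")
    return [d for d in detections if d[1] >= threshold]


if __name__ == "__main__":
    print(sorted(parse_overlay_flags("box,labels,conf")))
    print(filter_detections([("person", 0.9), ("dog", 0.3)]))
```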

segnet node topics:

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| image_in | Input | sensor_msgs/Image | Raw input image |
| vision_info | Output | vision_msgs/VisionInfo | Vision metadata (class labels parameter list name) |
| overlay | Output | sensor_msgs/Image | Input image overlaid with the segmentation results |
| color_mask | Output | sensor_msgs/Image | Colorized segmentation class mask |
| class_mask | Output | sensor_msgs/Image | 8-bit single-channel image where each pixel is a class ID |
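
Because class_mask is an 8-bit single-channel image in which each pixel holds a class ID, a consumer can summarize a frame by counting pixels per class. The sketch below does this on the raw fields of a sensor_msgs/Image (data, width, height, step) without importing ROS; the function name class_pixel_counts is hypothetical. It honors step, the row length in bytes, in case rows are padded.

```python
from collections import Counter


def class_pixel_counts(data: bytes, width: int, height: int, step: int) -> Counter:
    """Count pixels per class ID in an 8-bit single-channel mask.

    data, width, height and step correspond to the fields of the same
    names in sensor_msgs/Image. step may exceed width if rows are padded.
    """
    if step < width:
        raise ValueError("step must be at least width for an 8-bit image")
    if len(data) < step * height:
        raise ValueError("data is shorter than step * height")
    counts: Counter = Counter()
    for row in range(height):
        start = row * step
        # Iterating over bytes yields integers, i.e. the class IDs.
        counts.update(data[start:start + width])
    return counts


if __name__ == "__main__":
    # 3x2 mask with one padding byte (255) at the end of each row.
    mask = bytes([0, 1, 1, 255]) + bytes([2, 1, 0, 255])
    print(class_pixel_counts(mask, width=3, height=2, step=4))
```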

segnet node parameters:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| model_name | string | "fcn-resnet18-cityscapes-1024x512" | Built-in model name (see the jetson-inference documentation for valid values) |
| model_path | string | "" | Path to custom caffe or ONNX model |
| prototxt_path | string | "" | Path to custom caffe prototxt file |
| input_blob | string | "data" | Name of DNN input layer |
| output_blob | string | "score_fr_21classes" | Name of DNN output layer |
| class_colors_path | string | "" | Path to custom class colors file |
| class_labels_path | string | "" | Path to custom class labels file |
| class_labels_HASH | vector<string> | class names | List of class labels, where HASH is model-specific (actual name of parameter is found via the vision_info topic) |
| mask_filter | string | "linear" | Filtering to apply to color_mask topic (linear or point) |
| overlay_filter | string | "linear" | Filtering to apply to overlay topic (linear or point) |
| overlay_alpha | float | 180.0 | Alpha blending value used by overlay topic (0.0 - 255.0) |
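
To give a feel for the overlay_alpha range, the sketch below applies a conventional linear alpha blend to a single pixel, with alpha scaled from 0.0 - 255.0 to 0.0 - 1.0. This formula is an assumption chosen for illustration and is not confirmed to be what the segnet node computes internally; the function name blend_pixel is hypothetical.

```python
from typing import Tuple

Pixel = Tuple[int, int, int]


def blend_pixel(image_px: Pixel, mask_px: Pixel, alpha: float = 180.0) -> Pixel:
    """Blend a class color over an image pixel (assumed linear blend).

    alpha = 0.0 leaves the image pixel unchanged; alpha = 255.0 replaces
    it with the mask color.
    """
    if not 0.0 <= alpha <= 255.0:
        raise ValueError("alpha must be within 0.0 - 255.0")
    a = alpha / 255.0
    return tuple(round(a * m + (1.0 - a) * i)
                 for i, m in zip(image_px, mask_px))


if __name__ == "__main__":
    print(blend_pixel((0, 0, 0), (255, 255, 255)))
```

Under this assumption the default of 180.0 weights the class color at roughly 70% and the input image at roughly 30%.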

video_source node topics:

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| raw | Output | sensor_msgs/Image | Raw output image (BGR8) |
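
Since the raw topic carries BGR8 images, each pixel occupies three bytes in blue, green, red order. The sketch below converts such a buffer into rows of (R, G, B) tuples using only the sensor_msgs/Image fields data, width, height, and step. It is a plain-Python illustration with a hypothetical function name, not an efficient conversion path for full-resolution frames.

```python
from typing import List, Tuple


def bgr8_to_rgb_rows(data: bytes, width: int, height: int,
                     step: int) -> List[List[Tuple[int, int, int]]]:
    """Convert a BGR8 buffer into rows of (R, G, B) tuples.

    step is the row length in bytes and may include padding beyond
    the 3 * width bytes of pixel data.
    """
    row_bytes = 3 * width
    if step < row_bytes:
        raise ValueError("step must be at least 3 * width for BGR8")
    if len(data) < step * height:
        raise ValueError("data is shorter than step * height")
    rows = []
    for r in range(height):
        row = data[r * step:r * step + row_bytes]
        # Swap the first and third byte of each pixel: B,G,R -> R,G,B.
        rows.append([(row[i + 2], row[i + 1], row[i])
                     for i in range(0, row_bytes, 3)])
    return rows


if __name__ == "__main__":
    # One row with a pure-blue pixel followed by a pure-red pixel (BGR order).
    frame = bytes([255, 0, 0, 0, 0, 255])
    print(bgr8_to_rgb_rows(frame, width=2, height=1, step=6))
```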

video_source node parameters:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| resource | string | "csi://0" | Input stream URI (see Camera Streaming & Multimedia for valid protocols) |
| codec | string | "" | Manually specify codec for compressed streams (see Camera Streaming & Multimedia for valid values) |
| width | int | 0 | Manually specify desired width of stream (0 = stream default) |
| height | int | 0 | Manually specify desired height of stream (0 = stream default) |
| framerate | int | 0 | Manually specify desired framerate of stream (0 = stream default) |
| loop | int | 0 | For video files: 0 = don't loop, >0 = # of loops, -1 = loop forever |
| flip | string | "" | Set the flip method for MIPI CSI cameras (see Camera Streaming & Multimedia for valid values) |

video_output node topics:

| Topic Name | I/O | Message Type | Description |
|---|---|---|---|
| image_in | Input | sensor_msgs/Image | Raw input image |

video_output node parameters:

| Parameter Name | Type | Default | Description |
|---|---|---|---|
| resource | string | "display://0" | Output stream URI (see Camera Streaming & Multimedia for valid protocols) |
| codec | string | "h264" | Codec used for compressed streams (see Camera Streaming & Multimedia for valid values) |
| bitrate | int | 4000000 | Target VBR bitrate of encoded streams (in bits per second) |
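
The bitrate parameter is given in bits per second, so a rough size estimate for a recording or network stream is bitrate times duration divided by eight. The sketch below does that arithmetic with a hypothetical helper name. Because the bitrate is a VBR target, real output sizes will vary around this figure, and container overhead is ignored.

```python
def estimated_stream_bytes(bitrate_bps: int = 4_000_000,
                           seconds: float = 60.0) -> float:
    """Approximate encoded size in bytes for a target bitrate and duration."""
    if bitrate_bps < 0 or seconds < 0:
        raise ValueError("bitrate and duration must be non-negative")
    return bitrate_bps * seconds / 8


if __name__ == "__main__":
    # Default bitrate for one minute of video, expressed in megabytes.
    print(estimated_stream_bytes() / 1_000_000, "MB")
```

At the default of 4000000 bits per second, one minute of video comes to about 30 MB.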