This repository provides a DeepStream sample application, based on the NVIDIA DeepStream SDK, to run TAO models (Faster-RCNN / YoloV3 / YoloV4 / YoloV5 / SSD / DSSD / RetinaNet / UNET / multi_task / peopleSemSegNet) with the files below:
The pipeline of the sample:

                                                                       |--> filesink (save the output in a local dir)
                                                        |--> encode -->|
                                                                       |--> fakesink (use the -f option)
uridecodebin --> streammux --> nvinfer(detection) --> nvosd -->
                                                        |--> display
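For reference, the same topology can be sketched with gst-launch-1.0. This is only an illustrative sketch of the display branch; the config and stream paths below are examples and assume a default DeepStream install under /opt/nvidia/deepstream/deepstream.

# Illustrative only: decode one stream, batch it, run the DSSD pgie, draw boxes, render on screen
gst-launch-1.0 uridecodebin uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ! \
    m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
    nvinfer config-file-path=configs/nvinfer/dssd_tao/pgie_dssd_tao_config.txt ! \
    nvvideoconvert ! nvdsosd ! nveglglessink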
Make sure the deepstream-test1 sample runs successfully to verify your installation; a build-and-run sketch is shown after the install command. According to the DeepStream documentation, please run the command below to install additional audio/video packages.
/opt/nvidia/deepstream/deepstream/user_additional_install.sh
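A quick way to verify the installation is to build and run the deepstream-test1 sample. A sketch, assuming the default install location (set CUDA_VER to your installed CUDA version):

# Sanity-check the DeepStream install with deepstream-test1
cd /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test1
export CUDA_VER=12.2   # example value; use your installed CUDA version
make
./deepstream-test1-app /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264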
sudo apt install libeigen3-dev
cd /usr/include
sudo ln -sf eigen3/Eigen Eigen
sudo apt update
sudo apt install git-lfs
git lfs install --skip-repo
// SSH
git clone git@github.com:NVIDIA-AI-IOT/deepstream_tao_apps.git
// or HTTPS
git clone https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps.git
Run the script below to download all models except the multi_task and YoloV5 models.
sudo ./download_models.sh # (sudo not required in case of docker containers)
For multi_task, refer to https://docs.nvidia.com/tao/tao-toolkit/text/multitask_image_classification.html to train and generate the model.
For YoloV5, refer to yolov5_gpu_optimization to generate the ONNX model.
Note: We deliver newly trained SSD/DSSD/FasterRCNN models for demo purposes with the TAO 5.0 release. The output of the new models will not be exactly the same as that of the previous models. For example, you will notice that more cars are detected in the DeepStream sample video with the new SSD/DSSD models.
Please download the TensorRT OSS plugin according to your platform:
x86 platform TRT OSS plugin download instruction
Jetson platform TRT OSS plugin download instruction
The sample provides three inferencing methods. For TensorRT-based gst-nvinfer inferencing, please skip this part.
The DeepStream sample application can work as a Triton client with the Triton Inference Server. One of the following two methods can be used to set up the Triton Inference Server before starting a gst-nvinferserver inferencing DeepStream application.
For the TAO sample applications, please enable Triton or Triton gRPC inferencing with the app YAML configurations.
E.g., with apps/tao_detection/ds-tao-detection, the "primary-gie" part in configs/app/det_app_frcnn.yml can be modified as follows:
primary-gie:
  # 0: nvinfer, 1: nvinferserver
  plugin-type: 1
  # dssd
  #config-file-path: ../nvinfer/dssd_tao/pgie_dssd_tao_config.yml
  config-file-path: ../triton/dssd_tao/pgie_dssd_tao_config.yml
  #config-file-path: ../triton-grpc/dssd_tao/pgie_dssd_tao_config.yml
And then run the app with the command:
./apps/tao_detection/ds-tao-detection configs/app/det_app_frcnn.yml
export CUDA_MODULE_LOADING=LAZY
export CUDA_VER=xy.z   # xy.z is the CUDA version, e.g. 10.2
make
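For example, on a machine with CUDA 12.2 (the version below is only an example; check yours with nvcc --version):

export CUDA_VER=12.2
make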
1.Usage: ds-tao-detection -c pgie_config_file -i <H264 or JPEG file uri> [-b BATCH] [-d] [-f] [-l]
-h: print help info
-c: pgie config file, e.g. pgie_frcnn_tao_config.txt
-i: URI of the input file, starting with file:///, e.g. file:///.../video.mp4
-b: batch size, this overrides the value of "batch-size" in the pgie config file
-d: enable display; without -d and without the -f option, the output is dumped to an MP4 or JPEG file
-f: use fakesink mode
-l: use loop mode
2.Usage: ds-tao-detection <yaml file uri>
e.g.
./apps/tao_detection/ds-tao-detection configs/app/det_app_frcnn.yml
note: to use multiple sources, pass multiple -i inputs (e.g., -i uri -i uri ...), as in the sketch below.
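A sketch of a multi-source run with fakesink; the config and stream paths are examples, and any model config under configs/nvinfer/ can be used the same way:

# Example: run DSSD detection on two copies of the sample stream with fakesink (-f)
./apps/tao_detection/ds-tao-detection -c configs/nvinfer/dssd_tao/pgie_dssd_tao_config.txt \
    -i file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 \
    -i file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 -f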
Only YAML configurations support Triton and Triton gRPC inferencing.
For detailed model information, please refer to the following table:
note:
The default $DS_SRC_PATH is /opt/nvidia/deepstream/deepstream
Model Type | TAO Model | Demo |
---|---|---|
detector | dssd, peoplenet_transformer, efficientdet, frcnn, retinanet, retail_detector_binary, ssd, yolov3, yolov4-tiny, yolov4, yolov5 | ./apps/tao_detection/ds-tao-detection -c configs/nvinfer/dssd_tao/pgie_dssd_tao_config.txt -i file:///$DS_SRC_PATH/samples/streams/sample_720p.mp4 or ./apps/tao_detection/ds-tao-detection configs/app/det_app_frcnn.yml |
classifier | multi-task | ./apps/tao_classifier/ds-tao-classifier -c configs/nvinfer/multi_task_tao/pgie_multi_task_tao_config.txt -i file:///$DS_SRC_PATH/samples/streams/sample_720p.mp4 or ./apps/tao_classifier/ds-tao-classifier configs/app/multi_task_app_config.yml |
segmentation | peopleSemSegNet, unet, citySemSegFormer | ./apps/tao_segmentation/ds-tao-segmentation -c configs/nvinfer/peopleSemSegNet_tao/pgie_peopleSemSegNet_tao_config.txt -i file:///$DS_SRC_PATH/samples/streams/sample_720p.mp4 -w 960 -e 544 or ./apps/tao_segmentation/ds-tao-segmentation configs/app/seg_app_unet.yml |
instance segmentation | Mask2Former | export SHOW_MASK=1; ./apps/tao_detection/ds-tao-detection -c configs/nvinfer/mask2former_tao/pgie_mask2former_tao_config.yml -i file:///$DS_SRC_PATH/samples/streams/sample_720p.mp4 or export SHOW_MASK=1; ./apps/tao_detection/ds-tao-detection configs/app/ins_seg_app.yml |
others | Re-identification, Retail Object Recognition, PoseClassificationNet, OCDNet, OCRNet, LPDNet, LPRNet | refer to the detailed README for how to configure and run these models |
Building the TensorRT engine of citySemSegFormer consumes a lot of device memory. Please export CUDA_MODULE_LOADING=LAZY
to reduce device memory consumption. Please read CUDA Environment Variables for details.
If you want to do some customization, such as training your own TAO models or running the models in another DeepStream pipeline, you should read the sections below.
To download the sample models that we have trained with the NVIDIA TAO Toolkit SDK, run:
wget https://nvidia.box.com/shared/static/w0xxle5b3mjiv20wrq5q37v8u7b3u5tn -O models.zip
Refer to the TAO Doc for how to train the models. After training finishes, run tao-export to generate an ONNX model. This ONNX model can be deployed into DeepStream for fast inference, as this sample shows.
This DeepStream sample app also supports the TensorRT engine (plan) file generated by running the trtexec tool on the ONNX model.
The TensorRT engine file is hardware dependent, while the ONNX model is not. You may specify either a TensorRT engine file or an ONNX model in the DeepStream configuration file.
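For example, a gst-nvinfer configuration can reference either artifact; a minimal sketch (the file names below are placeholders, not files shipped with this repo):

[property]
# portable: let nvinfer build the engine from the ONNX model on first run
onnx-file=../../models/dssd/dssd.onnx
# or hardware specific: point directly at a prebuilt TensorRT engine
model-engine-file=../../models/dssd/dssd.onnx_b1_gpu0_fp16.engine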
The label file contains the list of class names for a model; its content varies across models.
Users can find the detailed label information for each MODEL in the README.md and the label file under configs/$(MODEL)_tao/, e.g. the SSD label information under configs/ssd_tao/.
Note: for some models like FasterRCNN, DON'T forget to include the "background" label and to change num-detected-classes in the pgie config file accordingly.
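A minimal sketch of the relevant pgie config keys (the values are illustrative only; the real class count and label file come from your training spec):

[property]
labelfile-path=frcnn_labels.txt
# the class count must include the extra "background" entry for FasterRCNN-style models
num-detected-classes=5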
The DeepStream configuration file includes some runtime parameters for DeepStream nvinfer plugin or nvinferserver plugin, such as model path, label file path, TensorRT inference precision, input and output node names, input dimensions and so on.
In this sample, each model has its own DeepStream configuration file, e.g. pgie_dssd_tao_config.txt for DSSD model.
Please refer to DeepStream Development Guide for detailed explanations of those parameters.
The model has the following four outputs:
These three models have the same output layer named NMS, whose implementation can be found in the TRT OSS nmsPlugin:
These models have the following four outputs:
The model has the following two outputs:
These models are trained to extract an embedding vector from an image. The image is the cropped area of a bounding box from a primary-gie task, such as people detection by PeopleNet Transformer or retail item detection by Retail Object Detection. These embedding extraction models are typically arranged as the secondary GIE module in a DeepStream pipeline, as sketched below.
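A minimal sketch of how such a model is attached as a secondary gst-nvinfer instance (the keys are standard nvinfer properties; the values are illustrative):

[property]
process-mode=2          # 2 = secondary mode: operate on objects from the primary GIE
operate-on-gie-id=1     # run on detections produced by the primary GIE (gie-unique-id=1)
output-tensor-meta=1    # attach the raw embedding tensor to each object's metadata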
The output layer of the re-identification model is an embedding vector with embedding_size = 256.
The output layer of the retail object recognition model is an embedding vector of size 2048.
The model has the following three outputs:
# 1. Build the TensorRT engine through this sample, for example, build YoloV3 with batch_size=2
./ds-tao -c pgie_yolov3_tao_config.txt -i /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264 -b 2
## after this is done, it will generate the TRT engine file under models/$(MODEL), e.g. models/yolov3/ for above command.
# 2. Measure the Inference Perf with trtexec, following above example
cd models/yolov3/
trtexec --batch=2 --useSpinWait --loadEngine=yolo_resnet18.etlt_b2_gpu0_fp16.engine
## then you can find the per *BATCH* inference time in the trtexec output log
# The files in the folder are used by TAO dev blogs:
## 1. Training State-Of-The-Art Models for Classification and Object Detection with NVIDIA TAO Toolkit
## 2. Real time vehicle license plate detection and recognition using NVIDIA TAO Toolkit
There are some special models which are not exactly detectors, classifiers or segmentation models. The sample applications for these special models are placed in apps/tao_others. These samples require DeepStream 6.1 or later. Please refer to the apps/tao_others/README.md document for details.
Some special models need special DeepStream pipelines to run. The DeepStream sample graphs for them are placed in graphs/tao_others. Please refer to the graphs/README.md file for more details.