YunghuiHsu / deepstream-yolo-pose

Use the DeepStream Python API to extract the model output tensor and customize the post-processing of YOLO-Pose.
https://hackmd.io/JQAXmJzuTyW22-x3k-0Zvw
Apache License 2.0

# Deepstream-YOLO-Pose

![Multi-stream YOLOv8s-pose demo](Multistream_4_YOLOv8s-pose-3.PNG)

YOLO-Pose accelerated with TensorRT and multi-streaming with Deepstream SDK


## System Requirements

- DeepStream 6.x on x86 platform
- DeepStream 6.x on Jetson platform
- DeepStream Python Bindings
- Gst-python and GstRtspServer

## Prepare YOLOv8 TensorRT Engine

### 0. Get yolov8s-pose.pt

https://github.com/ultralytics/ultralytics

Benchmark of YOLOv8-Pose. See [Pose Docs](https://docs.ultralytics.com/tasks/pose) for usage examples with these models.

| Model | size<br>(pixels) | mAP<sup>pose</sup><br>50-95 | mAP<sup>pose</sup><br>50 | Speed<br>CPU ONNX<br>(ms) | Speed<br>A100 TensorRT<br>(ms) | params<br>(M) | FLOPs<br>(B) |
| ---------------------------------------------------------------------------------------------------- | ---- | ---- | ---- | ------ | ----- | ---- | ------ |
| [YOLOv8n-pose](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n-pose.pt) | 640 | 50.4 | 80.1 | 131.8 | 1.18 | 3.3 | 9.2 |
| [YOLOv8s-pose](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s-pose.pt) | 640 | 60.0 | 86.2 | 233.2 | 1.42 | 11.6 | 30.2 |
| [YOLOv8m-pose](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8m-pose.pt) | 640 | 65.0 | 88.8 | 456.3 | 2.00 | 26.4 | 81.0 |
| [YOLOv8l-pose](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8l-pose.pt) | 640 | 67.6 | 90.0 | 784.5 | 2.59 | 44.4 | 168.6 |
| [YOLOv8x-pose](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8x-pose.pt) | 640 | 69.2 | 90.2 | 1607.1 | 3.73 | 69.4 | 263.2 |
| [YOLOv8x-pose-p6](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8x-pose-p6.pt) | 1280 | 71.6 | 91.2 | 4088.7 | 10.04 | 99.1 | 1066.4 |

- **mAP<sup>val</sup>** values are for single-model single-scale on the [COCO Keypoints val2017](http://cocodataset.org) dataset. Reproduce with `yolo val pose data=coco-pose.yaml device=0`
- **Speed** averaged over COCO val images using an [Amazon EC2 P4d](https://aws.amazon.com/ec2/instance-types/p4/) instance. Reproduce with `yolo val pose data=coco8-pose.yaml batch=1 device=0|cpu`
- Source: [ultralytics](https://github.com/ultralytics/ultralytics)

```shell
wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s-pose.pt
```

### 1. PyTorch Model to ONNX Model
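The export command itself did not survive in this copy of the README. A minimal sketch using the Ultralytics Python API is shown below; the flag values (dynamic batch, simplification, 640 input) are assumptions chosen to match the dynamic-batch engine built in the next step, not necessarily the exact options the repository uses:

```python
# Sketch: export yolov8s-pose.pt to ONNX with a dynamic batch dimension.
# Assumes the `ultralytics` package is installed (pip install ultralytics)
# and yolov8s-pose.pt is in the working directory.
from ultralytics import YOLO

model = YOLO("yolov8s-pose.pt")
# dynamic=True keeps the batch axis dynamic; simplify=True runs onnx-simplifier
model.export(format="onnx", dynamic=True, simplify=True, imgsz=640)
```

This writes `yolov8s-pose.onnx` next to the checkpoint.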

[Optional] Execute `netron yolov8s-pose.onnx` to view the model architecture

![netron view of yolov8s-pose ONNX](netron_yolov8s-pose_dy-sim-640_onnx.PNG)

### 2. ONNX to TensorRT Engine with Dynamic Batch
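The conversion command is missing from this copy. A typical `trtexec` invocation for a dynamic-batch FP16 engine is sketched below; the shape-profile batch sizes are assumptions based on the batch sizes benchmarked in the next step, and `images` is the default input name of an Ultralytics-exported ONNX model:

```shell
# Sketch: build a dynamic-batch TensorRT engine from the ONNX model.
# Shape profile (1..12) and FP16 are assumptions, not the repo's exact settings.
/usr/src/tensorrt/bin/trtexec \
    --onnx=yolov8s-pose.onnx \
    --saveEngine=yolov8s-pose-dy.engine \
    --minShapes=images:1x3x640x640 \
    --optShapes=images:12x3x640x640 \
    --maxShapes=images:12x3x640x640 \
    --fp16
```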

### 3. Test and Check the TensorRT Engine

```shell
/usr/src/tensorrt/bin/trtexec --loadEngine=yolov8s-pose-dy.engine
```

| model | device | size | batch | fps | ms |
| -------------------- | ---------- | --- | -- | ----- | ---- |
| yolov8s-pose.engine | AGX Xavier | 640 | 1 | 40.6 | 24.7 |
| yolov8s-pose.engine | AGX Xavier | 640 | 12 | 12.1 | 86.4 |
| yolov8s-pose.engine | AGX Orin | 640 | 1 | 258.8 | 4.2 |
| yolov8s-pose.engine | AGX Orin | 640 | 12 | 34.8 | 33.2 |
| yolov7w-pose.engine* | AGX Xavier | 960 | 1 | 19.0 | 52.1 |
| yolov7w-pose.engine* | AGX Orin | 960 | 1 | 61.1 | 16.8 |
| yolov7w-pose.pt | AGX Xavier | 960 | 1 | 14.4 | 59.8 |
| yolov7w-pose.pt | AGX Xavier | 960 | 1 | 11.8 | 69.4 |

## Basic Usage

Download the repository:

```shell
git clone https://github.com/YunghuiHsu/deepstream-yolo-pose.git
```

To run the app with default settings:


## Reference

- NVIDIA DeepStream SDK API Reference / NvDsInferTensorMeta Struct Reference
- DeepStream Python API Reference / NvDsInfer
- Using a Custom Model with DeepStream
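The references above cover extracting the raw `NvDsInferTensorMeta` output; decoding that tensor is the custom YOLO-Pose post-processing this project implements. The sketch below shows the decoding step in plain NumPy, assuming the standard YOLOv8-pose output layout of shape (56, 8400): 4 box coordinates, 1 confidence, then 17 keypoints × (x, y, visibility). The function name and threshold are illustrative, not the repository's API, and NMS is omitted:

```python
import numpy as np

def decode_yolov8_pose(output, conf_thres=0.25):
    """Decode a raw YOLOv8-pose output tensor of shape (56, N).

    Rows 0-3 are cx, cy, w, h; row 4 is confidence;
    rows 5-55 are 17 keypoints as (x, y, visibility) triples.
    """
    boxes_cxcywh = output[0:4, :]             # (4, N)
    scores = output[4, :]                     # (N,)
    kpts = output[5:, :].reshape(17, 3, -1)   # (17, 3, N)

    keep = scores > conf_thres                # filter low-confidence anchors
    cx, cy, w, h = boxes_cxcywh[:, keep]
    # Convert center format (cx, cy, w, h) to corner format (x1, y1, x2, y2)
    boxes_xyxy = np.stack(
        [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1
    )
    # Return per-detection keypoints as (num_dets, 17, 3)
    return boxes_xyxy, scores[keep], kpts[:, :, keep].transpose(2, 0, 1)
```

In the DeepStream pipeline the same array would come from a pad-probe on the inference element, after casting the user meta to `NvDsInferTensorMeta`; non-maximum suppression would then be applied to the surviving boxes before drawing.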