YunghuiHsu / deepstream-yolo-pose

Use the DeepStream Python API to extract the model output tensor and customize the post-processing of YOLO-Pose.
https://hackmd.io/JQAXmJzuTyW22-x3k-0Zvw
Apache License 2.0

# Deepstream-YOLO-Pose

![Multi-stream YOLOv8s-pose demo](Multistream_4_YOLOv8s-pose-3.PNG)

YOLO-Pose accelerated with TensorRT and multi-streaming with Deepstream SDK


## System Requirements

- DeepStream 6.x on x86 platform
- DeepStream 6.x on Jetson platform
- DeepStream Python Bindings
- Gst-python and GstRtspServer

## Prepare YOLOv8 TensorRT Engine

### 0. Get yolov8s-pose.pt

https://github.com/ultralytics/ultralytics

Benchmark of YOLOv8-Pose. See [Pose Docs](https://docs.ultralytics.com/tasks/pose) for usage examples with these models.

| Model | size<br>(pixels) | mAP<sup>pose</sup><br>50-95 | mAP<sup>pose</sup><br>50 | Speed<br>CPU ONNX<br>(ms) | Speed<br>A100 TensorRT<br>(ms) | params<br>(M) | FLOPs<br>(B) |
| ---------------------------------------------------------------------------------------------------- | ---- | ---- | ---- | ------ | ----- | ---- | ------ |
| [YOLOv8n-pose](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n-pose.pt) | 640 | 50.4 | 80.1 | 131.8 | 1.18 | 3.3 | 9.2 |
| [YOLOv8s-pose](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s-pose.pt) | 640 | 60.0 | 86.2 | 233.2 | 1.42 | 11.6 | 30.2 |
| [YOLOv8m-pose](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8m-pose.pt) | 640 | 65.0 | 88.8 | 456.3 | 2.00 | 26.4 | 81.0 |
| [YOLOv8l-pose](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8l-pose.pt) | 640 | 67.6 | 90.0 | 784.5 | 2.59 | 44.4 | 168.6 |
| [YOLOv8x-pose](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8x-pose.pt) | 640 | 69.2 | 90.2 | 1607.1 | 3.73 | 69.4 | 263.2 |
| [YOLOv8x-pose-p6](https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8x-pose-p6.pt) | 1280 | 71.6 | 91.2 | 4088.7 | 10.04 | 99.1 | 1066.4 |

- **mAP<sup>val</sup>** values are for single-model single-scale on the [COCO Keypoints val2017](http://cocodataset.org) dataset. Reproduce with `yolo val pose data=coco-pose.yaml device=0`
- **Speed** averaged over COCO val images using an [Amazon EC2 P4d](https://aws.amazon.com/ec2/instance-types/p4/) instance. Reproduce with `yolo val pose data=coco8-pose.yaml batch=1 device=0|cpu`
- Source: [ultralytics](https://github.com/ultralytics/ultralytics)

```shell
wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s-pose.pt
```

### 1. PyTorch Model to ONNX Model
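The export command itself did not survive in this copy of the README. A minimal sketch using the Ultralytics Python API is shown below; the flag values (dynamic batch, simplification, 640 input) are assumptions chosen to match the dynamic-batch engine built in the next step, not necessarily the exact options the repository uses:

```python
# Sketch: export yolov8s-pose.pt to ONNX with a dynamic batch dimension.
# Assumes the `ultralytics` package is installed (pip install ultralytics)
# and yolov8s-pose.pt is in the working directory.
from ultralytics import YOLO

model = YOLO("yolov8s-pose.pt")
# dynamic=True keeps the batch axis dynamic; simplify=True runs onnx-simplifier
model.export(format="onnx", dynamic=True, simplify=True, imgsz=640)
```

This writes `yolov8s-pose.onnx` next to the checkpoint.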

[Optional] Execute `netron yolov8s-pose.onnx` to view the model architecture

![netron view of yolov8s-pose ONNX](netron_yolov8s-pose_dy-sim-640_onnx.PNG)

### 2. ONNX to TensorRT Engine with Dynamic Batch
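The conversion command is missing from this copy. A typical `trtexec` invocation for a dynamic-batch FP16 engine is sketched below; the shape-profile batch sizes are assumptions based on the batch sizes benchmarked in the next step, and `images` is the default input name of an Ultralytics-exported ONNX model:

```shell
# Sketch: build a dynamic-batch TensorRT engine from the ONNX model.
# Shape profile (1..12) and FP16 are assumptions, not the repo's exact settings.
/usr/src/tensorrt/bin/trtexec \
    --onnx=yolov8s-pose.onnx \
    --saveEngine=yolov8s-pose-dy.engine \
    --minShapes=images:1x3x640x640 \
    --optShapes=images:12x3x640x640 \
    --maxShapes=images:12x3x640x640 \
    --fp16
```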

### 3. Test and Check the TensorRT Engine

```shell
/usr/src/tensorrt/bin/trtexec --loadEngine=yolov8s-pose-dy.engine
```

| model | device | size | batch | fps | ms |
| -------------------- | ---------- | --- | -- | ----- | ---- |
| yolov8s-pose.engine | AGX Xavier | 640 | 1 | 40.6 | 24.7 |
| yolov8s-pose.engine | AGX Xavier | 640 | 12 | 12.1 | 86.4 |
| yolov8s-pose.engine | AGX Orin | 640 | 1 | 258.8 | 4.2 |
| yolov8s-pose.engine | AGX Orin | 640 | 12 | 34.8 | 33.2 |
| yolov7w-pose.engine* | AGX Xavier | 960 | 1 | 19.0 | 52.1 |
| yolov7w-pose.engine* | AGX Orin | 960 | 1 | 61.1 | 16.8 |
| yolov7w-pose.pt | AGX Xavier | 960 | 1 | 14.4 | 59.8 |
| yolov7w-pose.pt | AGX Xavier | 960 | 1 | 11.8 | 69.4 |

## Basic Usage

Download the repository:

```shell
git clone https://github.com/YunghuiHsu/deepstream-yolo-pose.git
```

To run the app with default settings:


## Reference

- NVIDIA DeepStream SDK API Reference / NvDsInferTensorMeta Struct Reference
- DeepStream Python API Reference / NvDsInfer
- Using a Custom Model with DeepStream
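The references above cover extracting the raw `NvDsInferTensorMeta` output; decoding that tensor is the custom YOLO-Pose post-processing this project implements. The sketch below shows the decoding step in plain NumPy, assuming the standard YOLOv8-pose output layout of shape (56, 8400): 4 box coordinates, 1 confidence, then 17 keypoints × (x, y, visibility). The function name and threshold are illustrative, not the repository's API, and NMS is omitted:

```python
import numpy as np

def decode_yolov8_pose(output, conf_thres=0.25):
    """Decode a raw YOLOv8-pose output tensor of shape (56, N).

    Rows 0-3 are cx, cy, w, h; row 4 is confidence;
    rows 5-55 are 17 keypoints as (x, y, visibility) triples.
    """
    boxes_cxcywh = output[0:4, :]             # (4, N)
    scores = output[4, :]                     # (N,)
    kpts = output[5:, :].reshape(17, 3, -1)   # (17, 3, N)

    keep = scores > conf_thres                # filter low-confidence anchors
    cx, cy, w, h = boxes_cxcywh[:, keep]
    # Convert center format (cx, cy, w, h) to corner format (x1, y1, x2, y2)
    boxes_xyxy = np.stack(
        [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1
    )
    # Return per-detection keypoints as (num_dets, 17, 3)
    return boxes_xyxy, scores[keep], kpts[:, :, keep].transpose(2, 0, 1)
```

In the DeepStream pipeline the same array would come from a pad-probe on the inference element, after casting the user meta to `NvDsInferTensorMeta`; non-maximum suppression would then be applied to the surviving boxes before drawing.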