NVIDIA-AI-IOT / trt_pose

Real-time pose estimation accelerated with NVIDIA TensorRT
MIT License
957 stars 290 forks

How to use model with deepstream #32

Open GstreamNovice opened 4 years ago

GstreamNovice commented 4 years ago

Hello, I was curious to see if the pose-estimation model can be used with deepstream.

I believe I have to convert the model to TensorRT, use it as the PGIE detector, and then use a second inference engine for the classifier. After that, I need to implement some kind of post-processing logic in gst-dsexample, and finally have nvosd render the skeleton onto the frame.

What else would I need to do? Also, do I have to implement the post-processing in gst-dsexample? If so, how should I go about doing that?

I know I am not the only one looking for an answer to this question. So any help would be greatly appreciated by me and by the community.
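The pipeline described above could be sketched as a single gst-launch line. This is a hypothetical sketch only: the element names are standard DeepStream ones, but `pose_pgie_config.txt` is a placeholder config file that does not ship with either project.

```shell
# Hypothetical sketch: standard DeepStream elements, placeholder pose config.
gst-launch-1.0 filesrc location=input.h264 ! h264parse ! nvv4l2decoder ! \
  mux.sink_0 nvstreammux name=mux batch-size=1 width=1280 height=720 ! \
  nvinfer config-file-path=pose_pgie_config.txt ! \
  dsexample full-frame=1 ! \
  nvvideoconvert ! nvdsosd ! nveglglessink
```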

jaybdub commented 4 years ago

Hi GstreamNovice,

Thanks for reaching out!

Do you mind sharing a reference to the DeepStream sample you're basing this on? I'm less familiar with this area, but I'll try to help where I can.

As for this project, there are two stages:

  1. Neural network execution (GPU, TensorRT). This takes one input binding (the image) and produces two output bindings (the confidence map and the part affinity fields).
  2. Post-processing (CPU). This takes the confidence map and part affinity fields (copied into CPU memory) and produces the object counts, object part mappings, and object part coordinates.
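The first step of stage 2 can be illustrated with a simplified numpy sketch: finding keypoint peaks in the confidence map. The real trt_pose parser additionally matches peaks into skeletons using the part affinity fields; the function name and threshold here are illustrative, not the library's API.

```python
# Simplified sketch of CPU post-processing: find keypoint peaks in the
# confidence map. A peak is a value above threshold that is also a local
# maximum over its 3x3 neighborhood.
import numpy as np

def find_peaks(cmap, threshold=0.1):
    """cmap: (num_parts, H, W) confidence map -> list of (part, y, x) peaks."""
    peaks = []
    num_parts, h, w = cmap.shape
    for part in range(num_parts):
        m = cmap[part]
        for y in range(h):
            for x in range(w):
                v = m[y, x]
                if v < threshold:
                    continue
                window = m[max(0, y - 1):y + 2, max(0, x - 1):x + 2]
                if v >= window.max():
                    peaks.append((part, y, x))
    return peaks
```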

Past that, it depends on the application. In the current examples, both the neural network execution and the post-processing depend on PyTorch for bindings. That said, I've taken steps to remove this dependency, which may simplify integrating it into an application.

Please let me know if you have any questions, or if you can share your use case so that I better understand the challenge you're facing.

Best, John

ishang3 commented 3 years ago

@GstreamNovice I am working on this same problem, were you successful?

thancaocuong commented 3 years ago

@ishang3 First you need to follow the DeepStream sample apps. For convenience, I think you need to write the inference code in C++, like a pose plugin for DeepStream, and then write the GStreamer plugin the way the YOLO example does. So what's your problem? https://github.com/AlexeyAB/deepstream-plugins

ishang3 commented 3 years ago

@thancaocuong the data structure for trt_pose is very different: it is not object detection, which has bounding-box coordinates, but a set of 18 keypoints per person. The preprocessing is also a little different, and NVIDIA has not been transparent about what changes need to be made.
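To make the data-structure difference concrete, here is a hedged numpy sketch of decoding parser outputs in the counts/objects/peaks style that trt_pose's parser produces (per-person keypoints, not boxes). The exact array shapes and the `decode_objects` helper are illustrative assumptions, not the library's API.

```python
# Hedged sketch: turn counts/objects/peaks-style parser output into a
# list of people, each a dict mapping part index -> (x_px, y_px).
import numpy as np

def decode_objects(counts, objects, peaks, width, height):
    """counts: (batch,), objects: (batch, max_obj, num_parts) peak indices
    (-1 = part not found), peaks: (batch, num_parts, max_peaks, 2) with
    normalized (row, col) coordinates. Decodes batch item 0 only."""
    people = []
    for i in range(int(counts[0])):
        kps = {}
        for part in range(objects.shape[2]):
            k = int(objects[0, i, part])
            if k < 0:  # this part was not detected for this person
                continue
            y, x = peaks[0, part, k]
            kps[part] = (float(x) * width, float(y) * height)
        people.append(kps)
    return people
```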

thancaocuong commented 3 years ago

I did implement the preprocessing in CUDA, including conversion to RGB, resizing, and normalization. I will reorganize my code and then share it with you. Hope it helps.
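The CPU-side equivalent of that CUDA preprocessing can be sketched in numpy: BGR to RGB, resize (nearest-neighbor here for brevity), then per-channel normalization with the ImageNet mean/std that torchvision backbones expect. The input size and the `preprocess` helper are illustrative.

```python
# Hedged sketch of preprocessing: BGR->RGB, nearest-neighbor resize,
# scale to [0, 1], normalize with ImageNet mean/std, HWC -> CHW.
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(bgr_frame, size=(224, 224)):
    """bgr_frame: (H, W, 3) uint8 -> (3, size[0], size[1]) float32 tensor."""
    rgb = bgr_frame[:, :, ::-1]                    # BGR -> RGB
    h, w = rgb.shape[:2]
    ys = np.arange(size[0]) * h // size[0]         # nearest-neighbor rows
    xs = np.arange(size[1]) * w // size[1]         # nearest-neighbor cols
    resized = rgb[ys[:, None], xs[None, :]]
    x = resized.astype(np.float32) / 255.0
    x = (x - MEAN) / STD                           # per-channel normalize
    return np.ascontiguousarray(x.transpose(2, 0, 1))  # HWC -> CHW
```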

ishang3 commented 3 years ago

@thancaocuong Thank you, I appreciate it very much.

thancaocuong commented 3 years ago

@ishang3 please take a look at the nvPreprocess function. I use CUDA to normalize the RGB image and DMA it to the GPU for inference. You can also use my repo for trt_pose, but you need to add the post-processing function written in C++ (from the trt_pose repo). Feel free to ask me if you have any questions. posecpp. I've also written the pose estimator as a plugin, so you can easily map it to the YOLO plugin to integrate with DeepStream.

zhink commented 3 years ago

Here is an example: https://github.com/NVIDIA-AI-IOT/deepstream_pose_estimation