NVIDIA-AI-IOT / trt_pose

Real-time pose estimation accelerated with NVIDIA TensorRT
MIT License

Seeking clarification about resnet18/densenet121 and the original papers #116

Open ghost opened 3 years ago

ghost commented 3 years ago

Hi,

I read the paper "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" (https://arxiv.org/pdf/1611.08050.pdf) and it does not mention resnet18 or densenet121. Why does this implementation need a pre-trained resnet18 or densenet121 model?

Or is this implementation based on the paper "Simple Baselines for Human Pose Estimation and Tracking" (https://arxiv.org/pdf/1804.06208.pdf)? The code seems to use the idea of Part Affinity Fields from the first paper.

Can you please clarify?

Thanks for the great work! Tareq

jaybdub commented 3 years ago

Hi @tareq992403,

Thanks for reaching out!

You’re correct! This project is largely a combination of the ideas in each paper, with some other additional architecture tricks applied on the head of the model.

Please let me know if this helps or you have any questions.

Best, John

ghost commented 3 years ago

Hi @jaybdub,

Thanks for the quick reply. As the code does not have detailed comments, can you please share any documentation or blog post on this implementation? This will help us understand the code better.

Thanks, Tareq

ghost commented 3 years ago

After the pretrained ResNet18 backbone, do you feed the features to the two-branch multi-stage CNN for heatmap and PAF generation as shown in Fig. 3 of "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" (https://arxiv.org/pdf/1611.08050.pdf)?

I am confused. Can anyone help? I saw several requests about the explanation of this architecture, so this will help many of us.

Thanks, Tareq

tucachmo2202 commented 3 years ago

@tareq992403 you should read this article https://www.geeksforgeeks.org/openpose-human-pose-estimation-method/

ghost commented 3 years ago

@tucachmo2202 Thx for the link. The tutorial you shared explains the paper "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" (https://arxiv.org/pdf/1611.08050.pdf).

However, this trt_pose GitHub implementation is DIFFERENT from this paper. As mentioned by John, the author of this repository, in this thread: "This project is largely a combination of the ideas in each paper, with some other additional architecture tricks applied on the head of the model."

This work is a mixture of "Simple Baselines for Human Pose Estimation and Tracking" (https://arxiv.org/pdf/1804.06208.pdf) and "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" (https://arxiv.org/pdf/1611.08050.pdf).

I am having a hard time understanding this implementation. After the pretrained ResNet18 backbone, I don't see the two-branch multi-stage CNN for heatmap and PAF generation in the code. Can anyone help?

tucachmo2202 commented 3 years ago

@tareq992403 you should look at the common.py and resnet.py files. The model returns two branches.

ghost commented 3 years ago

@tucachmo2202 Thanks, it is making more sense.

Here is my understanding: we load the pre-trained model weights into the resnet18 backbone. On top of the resnet18 backbone, there are two CNNs that generate the heatmap and the PAF.
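A minimal PyTorch sketch of that understanding, for illustration only. It is not the repository's code: `TinyBackbone` is a hypothetical stand-in for a pretrained ResNet18 (same total stride of 32, no pretrained weights), the shared upsampling trunk is a simplification in the spirit of "Simple Baselines", and the 18 parts / 21 links are assumed values for a COCO-style skeleton.

```python
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pretrained ResNet18 feature extractor."""

    def __init__(self, out_channels=512):
        super().__init__()
        # Five stride-2 convolutions give a total stride of 32, like ResNet18.
        chans = [3, 32, 64, 128, 256, out_channels]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [
                nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            ]
        self.features = nn.Sequential(*layers)

    def forward(self, x):
        return self.features(x)


class TwoHeadPoseNet(nn.Module):
    """Backbone, then upsampling, then two 1x1-conv heads (heatmap and PAF)."""

    def __init__(self, num_parts=18, num_links=21, feat=512, up=256):
        super().__init__()
        self.backbone = TinyBackbone(feat)
        # Three stride-2 transposed convolutions: each one doubles H and W.
        ups, c = [], feat
        for _ in range(3):
            ups += [
                nn.ConvTranspose2d(c, up, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm2d(up),
                nn.ReLU(inplace=True),
            ]
            c = up
        self.upsample = nn.Sequential(*ups)
        # Head 1: one confidence map (heatmap) per body part.
        self.cmap_head = nn.Conv2d(up, num_parts, kernel_size=1)
        # Head 2: one 2D vector field (x and y channels) per skeleton link.
        self.paf_head = nn.Conv2d(up, 2 * num_links, kernel_size=1)

    def forward(self, x):
        features = self.upsample(self.backbone(x))
        return self.cmap_head(features), self.paf_head(features)


model = TwoHeadPoseNet().eval()
with torch.no_grad():
    cmap, paf = model(torch.zeros(1, 3, 224, 224))
print(tuple(cmap.shape))  # (1, 18, 56, 56)
print(tuple(paf.shape))   # (1, 42, 56, 56)
```

Unlike Fig. 3 of the Part Affinity Fields paper, this sketch has a single stage: both outputs come from one forward pass, with no iterative refinement between the branches.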

My question is where are we TRAINING these two CNNs that are on top of the backbone resnet18?
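For reference, one common way to train such heads is jointly with the backbone, by regressing both outputs against ground-truth maps. The sketch below assumes a masked mean-squared-error loss summed over the two outputs. `TinyPoseNet`, `train_step`, the tensor shapes, and the Adam settings are all hypothetical and chosen only to keep the example small, so treat it as an illustration rather than the project's training script.

```python
import torch
import torch.nn as nn


class TinyPoseNet(nn.Module):
    """Hypothetical toy model: a small backbone plus heatmap and PAF heads."""

    def __init__(self, num_parts=18, num_links=21):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=4, padding=1),
            nn.ReLU(),
        )
        self.cmap_head = nn.Conv2d(16, num_parts, kernel_size=1)
        self.paf_head = nn.Conv2d(16, 2 * num_links, kernel_size=1)

    def forward(self, x):
        features = self.backbone(x)
        return self.cmap_head(features), self.paf_head(features)


def train_step(model, optimizer, image, cmap_gt, paf_gt, mask):
    """One optimization step; gradients reach the heads and the backbone."""
    optimizer.zero_grad()
    cmap_out, paf_out = model(image)
    # The mask zeroes the loss in regions without valid annotations.
    cmap_mse = torch.mean(mask * (cmap_out - cmap_gt) ** 2)
    paf_mse = torch.mean(mask * (paf_out - paf_gt) ** 2)
    loss = cmap_mse + paf_mse
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
model = TinyPoseNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random stand-ins for a batch of images and their ground-truth maps.
image = torch.randn(2, 3, 64, 64)
cmap_gt = torch.rand(2, 18, 16, 16)
paf_gt = torch.randn(2, 42, 16, 16)
mask = torch.ones(2, 1, 16, 16)

loss = train_step(model, optimizer, image, cmap_gt, paf_gt, mask)
print(f"loss after one step: {loss:.4f}")
```

Because the optimizer is built from `model.parameters()`, the pretrained backbone weights act as an initialization and are fine-tuned together with the two heads. Freezing the backbone would mean passing only the head parameters to the optimizer.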

Thanks.

igormusinov commented 3 years ago

Hi @tareq992403. OpenPose released a newer paper: https://arxiv.org/abs/1812.08008. Are there any advantages of the two-branch approach (https://arxiv.org/pdf/1611.08050.pdf) over the one-branch approach (https://arxiv.org/abs/1812.08008)?