Vegetebird / MHFormer

[CVPR 2022] MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
MIT License

bottom-up or top-down #8

Closed liamsun2019 closed 2 years ago

liamsun2019 commented 2 years ago

Hi author,

It looks like this is a top-down model, which needs an extra detector. A naive question: for single-person 3D pose estimation, is it possible to use the raw image/frame directly as the model input, so as to omit the detector and reduce the inference time? Looking forward to your feedback, thanks.

Vegetebird commented 2 years ago

Hi~One-stage methods need sophisticated architectures with high computation costs. Recently, two-stage methods have become more popular due to their efficiency and accuracy. You can refer to the Related Work section in our paper.

liamsun2019 commented 2 years ago

Hi author, thanks for your prompt reply. I absolutely agree that two-stage outperforms one-stage in most cases. My scenario is that I need to deploy a model to a resource-constrained device, such as a mobile device. The top-down strategy can hardly achieve that goal, since I need an extra detector and even a 2D pose estimator. The pipeline is long, and errors from each stage will greatly affect the final accuracy. Moreover, the inference will be very time-consuming.

Vegetebird commented 2 years ago

Hi~You can choose a lightweight and fast 2D pose estimator, such as "Lightweight OpenPose", on your mobile device. But keep in mind that the accuracy of the 2D pose estimator is important to the final 3D pose accuracy.

liamsun2019 commented 2 years ago

Right, I had a similar idea. In fact, I have already deployed "Lightweight OpenPose" on my device. I guess OpenPose + MHFormer is a feasible strategy for my scenario. I'm running some tests and may have more questions for you. Thanks in advance for your time.

Vegetebird commented 2 years ago

I will be happy to help.
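
For readers following the two-stage flow discussed in this thread (a 2D pose estimator feeding a 3D lifting model), here is a minimal sketch of the glue code that would sit between the two stages. It is not taken from the MHFormer repository: the helper names, the 17-joint layout, the window length of 5, and the stub `lift_to_3d` are all assumptions made for illustration. A real setup would replace the stub with the trained lifter, use the window length that model was trained with, and reorder the 2D estimator's joints to match the layout the lifter expects.

```python
import numpy as np


def normalize_screen_coordinates(kpts, width, height):
    """Map pixel coordinates so that x lies in [-1, 1], keeping the aspect ratio.

    Hypothetical helper; the convention (x in [-1, 1], y scaled by the same
    factor) is an assumption about what the lifting model expects.
    """
    kpts = np.asarray(kpts, dtype=np.float32)
    offset = np.array([1.0, height / width], dtype=np.float32)
    return kpts / width * 2.0 - offset


def make_window(frames, center, receptive_field):
    """Return a (receptive_field, J, 2) window centered on frame `center`.

    Indices that fall outside the clip are clamped, so the first and last
    frames are repeated at the boundaries.
    """
    if receptive_field % 2 == 0:
        raise ValueError("receptive_field must be odd")
    pad = receptive_field // 2
    idx = np.clip(np.arange(center - pad, center + pad + 1), 0, len(frames) - 1)
    return frames[idx]


def lift_to_3d(window):
    """Stub standing in for the 3D lifting model (assumption, not a real API).

    Takes a (T, J, 2) window and returns a (J, 3) pose for the center frame.
    """
    return np.zeros((window.shape[1], 3), dtype=np.float32)


# Example: 10 frames of 17 joints from a 640x480 video (random stand-in data).
rng = np.random.default_rng(0)
pixels = rng.uniform([0, 0], [640, 480], size=(10, 17, 2))
frames_2d = normalize_screen_coordinates(pixels, width=640, height=480)

window = make_window(frames_2d, center=0, receptive_field=5)
pose_3d = lift_to_3d(window)
```

On a constrained device, the window length is the main knob in this sketch: a shorter window means less buffering and less computation per output frame, at the cost of less temporal context for the lifter.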