3dpose / GnTCN

Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos
MIT License

Estimate 3D pose in photo instead of videos #11

Closed: KamiCalcium closed this issue 3 years ago

KamiCalcium commented 3 years ago

Hi,

Thanks for the great work.

I was wondering if the code for estimating 3D pose from a single image, instead of videos, can be released.

3dpose commented 3 years ago

Thanks for your interest in our work.

Our method is a monocular video 3D human pose estimation method, which takes a sequence of frames as input. Temporal information from the input frames is used in our method (e.g., by the temporal convolutional network (TCN) in the top-down network) for better 3D pose estimation.

To answer your question: one can duplicate an image multiple times and use the duplicated frames as input to a video pose estimation method like ours. However, since this is simple image duplication without real temporal information, it is not expected to produce the same quality of 3D pose estimation as real video input.
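For illustration, here is a minimal sketch of that workaround: repeating a single image along a new time axis so that a pipeline expecting a frame sequence still runs. The function names and the downstream call are hypothetical placeholders, not part of the GnTCN code.

```python
import numpy as np

def pseudo_video_from_image(image: np.ndarray, num_frames: int = 9) -> np.ndarray:
    """Stack a single RGB frame num_frames times along a new time axis,
    producing a (T, H, W, 3) pseudo-video with no real motion."""
    return np.repeat(image[np.newaxis, ...], num_frames, axis=0)

# Shapes only; no real model is called here.
image = np.zeros((720, 1280, 3), dtype=np.uint8)      # one RGB frame
frames = pseudo_video_from_image(image, num_frames=9)
assert frames.shape == (9, 720, 1280, 3)
# poses_3d = estimate_3d_poses_from_video(frames)     # hypothetical placeholder call
```

As noted above, the temporal model sees identical frames in such a sequence, so the temporal cues it was trained to exploit are absent.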

On the other hand, there exist single-image-based 3D pose estimation methods, as discussed in the related work section of our main paper. If your input data are images instead of videos, you may want to take a look at those methods. Single-image-based methods usually use a lifting network to estimate 3D human pose from 2D keypoint input; in contrast, video-based methods like ours use a TCN or RNN to estimate 3D human pose from a sequence of 2D keypoints across multiple frames. Video-based methods may therefore not be best suited for single-image input, as a single image carries no real temporal information.
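To make the contrast concrete, here is a small sketch of the two input regimes: an MLP-style lifting network that maps one frame of 2D keypoints to 3D, versus temporal 1D convolutions over a window of frames. The joint count, layer sizes, and window length are illustrative assumptions, not the architectures from the paper or from any specific single-image method.

```python
import torch
import torch.nn as nn

NUM_JOINTS = 17  # illustrative joint count, not necessarily the paper's setting

# Single-image route: "lift" one frame of 2D keypoints (x, y per joint) to 3D.
lifting_net = nn.Sequential(
    nn.Linear(NUM_JOINTS * 2, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, NUM_JOINTS * 3),
)

# Video route: temporal 1D convolutions over a window of frames of 2D keypoints.
# Channels = joints * 2, sequence length = number of frames in the window.
temporal_net = nn.Sequential(
    nn.Conv1d(NUM_JOINTS * 2, 256, kernel_size=3), nn.ReLU(),
    nn.Conv1d(256, 256, kernel_size=3), nn.ReLU(),
    nn.Conv1d(256, NUM_JOINTS * 3, kernel_size=3),
)

single_frame = torch.randn(1, NUM_JOINTS * 2)   # 2D keypoints from one image
pose_3d = lifting_net(single_frame)             # -> (1, 51), i.e. 17 joints x 3

window = torch.randn(1, NUM_JOINTS * 2, 9)      # 9-frame window of 2D keypoints
pose_3d_seq = temporal_net(window)              # -> (1, 51, 3); length shrinks by 2 per conv (9 -> 3)
```

The point of the sketch is the input shape: the lifting network only ever sees one frame, while the temporal network needs several neighboring frames, which a single photo cannot provide.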

KamiCalcium commented 3 years ago

Thanks for answering. Yeah, I have read the paper, and I noticed that you use GCNs for the pose estimation. That's why I'm asking: can I just use your GCNs for estimating the 3D pose from an input image? Would that result be promising too?

3dpose commented 3 years ago

If you plan to use the GCNs part of the framework instead of the whole pipeline, then the answer is yes. As stated in the paper, the proposed GCNs work on a frame-by-frame basis, and allow the more reliably estimated joints/bones to influence the unreliable ones caused by missing information (e.g., occlusion, partially out-of-frame, etc.). Because of this, integrating the GCNs into your approach can help it better handle images with missing information, such as partially out-of-frame and occlusion cases.
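To illustrate the general idea of reliable joints influencing unreliable ones, here is a toy, confidence-weighted graph-convolution step over a skeleton graph. This is only a conceptual sketch under assumed shapes and a partial edge list; it is not the GCN architecture from the paper, and all names here are hypothetical.

```python
import torch
import torch.nn as nn

NUM_JOINTS = 17  # illustrative; the edge list below covers only a few joints for brevity

# Toy skeleton adjacency (self-loops plus a few parent-child bones).
edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6)]
A = torch.eye(NUM_JOINTS)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

class ConfidenceWeightedGCNLayer(nn.Module):
    """One graph-convolution step in which each joint aggregates its neighbors'
    features weighted by their 2D detection confidence, so reliably detected
    joints contribute more to occluded or out-of-frame ones."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj, conf):
        # x:    (B, J, in_dim)  per-joint features (e.g., 2D keypoints + score)
        # adj:  (J, J)          skeleton adjacency with self-loops
        # conf: (B, J)          per-joint detection confidence in [0, 1]
        weighted_adj = adj.unsqueeze(0) * conf.unsqueeze(1)   # scale each source joint by its confidence
        weighted_adj = weighted_adj / weighted_adj.sum(-1, keepdim=True).clamp(min=1e-6)
        return torch.relu(self.linear(weighted_adj @ x))

layer = ConfidenceWeightedGCNLayer(in_dim=3, out_dim=64)
keypoints = torch.randn(1, NUM_JOINTS, 3)    # (x, y, score) per joint from a single frame
confidence = torch.rand(1, NUM_JOINTS)       # low values stand in for occluded / out-of-frame joints
features = layer(keypoints, A, confidence)   # -> (1, 17, 64)
```

Because such a layer only needs one frame's 2D keypoints and confidences, it is compatible with single-image input, which matches the frame-by-frame behavior described above.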