garyzhao / SemGCN

The Pytorch implementation for "Semantic Graph Convolutional Networks for 3D Human Pose Regression" (CVPR 2019).
https://arxiv.org/abs/1904.03345
Apache License 2.0

About input data of GCN #4

Closed ericzw closed 4 years ago

ericzw commented 4 years ago

Hi, how do you concatenate the 2D data (16, 2) with the perceptual features? What is the size of the perceptual features, and what is the size of the input data for the GCN?

garyzhao commented 4 years ago

Hi, thanks for your interest in our work.

Perceptual features are pooled from the intermediate layers of the backbone network.

For example, we pool features from the conv_1 to conv_4 layers of ResNet. Therefore, let C_i be the channel size of the conv_i layer; the pooled feature size will be (C_1 + C_2 + C_3 + C_4). After being concatenated with the joint coordinates (x, y), the final feature size will be (C_1 + C_2 + C_3 + C_4 + 2).
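The per-joint concatenation described above can be sketched as follows. This is a minimal illustration, not the repo's actual implementation: sampling the feature maps at joint locations with `grid_sample` and the channel sizes (64, 128, 256, 512) are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def pool_perceptual_features(feature_maps, joints_2d):
    """Pool one feature vector per joint from each intermediate feature map.

    feature_maps: list of tensors (B, C_i, H_i, W_i) from conv_1..conv_4
    joints_2d:    (B, J, 2) joint coordinates, normalized to [-1, 1]
    Returns:      (B, J, 2 + sum(C_i)) -- coordinates concatenated with features
    """
    B, J, _ = joints_2d.shape
    grid = joints_2d.view(B, J, 1, 2)  # sampling grid for grid_sample
    pooled = []
    for fm in feature_maps:
        # bilinear sampling at each joint location -> (B, C_i, J, 1)
        sampled = F.grid_sample(fm, grid, align_corners=True)
        pooled.append(sampled.squeeze(-1).permute(0, 2, 1))  # (B, J, C_i)
    return torch.cat([joints_2d] + pooled, dim=-1)

# Example: 16 joints, hypothetical channel sizes 64/128/256/512
fmaps = [torch.randn(1, c, 8, 8) for c in (64, 128, 256, 512)]
joints = torch.rand(1, 16, 2) * 2 - 1
feat = pool_perceptual_features(fmaps, joints)
print(feat.shape)  # torch.Size([1, 16, 962]), i.e. 2 + 64 + 128 + 256 + 512
```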

Best, Long

ericzw commented 4 years ago

Hi, thanks for your reply. So you mean the input size is 16 × (2 + C_1 + ... + C_5) and the output size is 16 × 3? The architecture of SemGCN in your paper shows an input size of 16 × 2.


garyzhao commented 4 years ago

Yes.

16 × 2 (as shown in Figure 2) is used for Configuration #1 in our paper, which is the same as in this repo. 16 × (2 + C_1 + ... + C_5) is used for Configuration #2 in our paper.
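The two input layouts can be made concrete with a quick shape check. The joint count of 16 comes from the thread; the channel sizes here are hypothetical placeholders for the backbone's C_i.

```python
import numpy as np

J = 16                          # number of joints, as in the thread
channels = (64, 128, 256, 512)  # hypothetical C_i of a ResNet backbone

# Configuration #1 (this repo): 2D coordinates only -> (J, 2)
x_cfg1 = np.zeros((J, 2))

# Configuration #2: coordinates + pooled perceptual features -> (J, 2 + sum C_i)
x_cfg2 = np.zeros((J, 2 + sum(channels)))

print(x_cfg1.shape, x_cfg2.shape)  # (16, 2) (16, 962)

# Both configurations regress 3D joints, so the output shape is (J, 3)
y = np.zeros((J, 3))
```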

Best, Long

ericzw commented 4 years ago


Hi, I still have a question. The role of SemGCN is to regress 3D joint coordinates from 2D joint coordinates. Have you tried replacing the SemGCN model with fc layers, i.e., regressing the joint coordinates directly with fc layers? Which one is better in your opinion?

garyzhao commented 4 years ago

Hi, that is exactly what Martinez et al. [1] have done.

[1] Martinez et al. A simple yet effective baseline for 3d human pose estimation. ICCV 2017.

We have implemented it in 'main_linear.py', and the results are shown in the table of this repo. We also have a detailed discussion in our main paper; please check them accordingly.
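For readers unfamiliar with that baseline, a Martinez-style fc regressor looks roughly like this. This is a simplified sketch, not the exact architecture of 'main_linear.py': it flattens the 2D joints, applies one residual block of linear layers, and regresses all 3D joints at once.

```python
import torch
import torch.nn as nn

class LinearBaseline(nn.Module):
    """Simplified Martinez-style fully connected baseline (a sketch; the
    hidden size and block count are assumptions, not the repo's values)."""

    def __init__(self, num_joints=16, hidden=1024):
        super().__init__()
        self.inp = nn.Linear(num_joints * 2, hidden)
        self.block = nn.Sequential(
            nn.Linear(hidden, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
        )
        self.out = nn.Linear(hidden, num_joints * 3)

    def forward(self, joints_2d):          # (B, J, 2)
        B = joints_2d.shape[0]
        h = torch.relu(self.inp(joints_2d.view(B, -1)))
        h = h + self.block(h)              # one residual block
        return self.out(h).view(B, -1, 3)  # (B, J, 3)

model = LinearBaseline()
out = model(torch.randn(4, 16, 2))
print(out.shape)  # torch.Size([4, 16, 3])
```

Note that this treats the pose as one flat vector, so it has no notion of the skeleton graph; that structural prior is exactly what SemGCN adds, which the paper's discussion compares against.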

Best, Long