About the top-down network

hongsukchoi commented 3 years ago

Hi,

I found your paper very interesting. I can't wait until the code is released, so I'm asking here.

The paper says that the TD network estimates all joints of all persons in a bounding box, but the GCN&TCN seem to produce one person per bounding box. How, then, do you group or select the joints belonging to one person in a bounding box before feeding the joint heatmaps to the GCN&TCN? Or do you feed all of the joint heatmaps into the GCN&TCN? (I don't think that's possible.)

Also, the BU network takes the concatenation of the joint heatmaps and the input frame as input. How is this done? Is the number of input channels 3 (RGB) + 1 (heatmap)? I see several potential problems: the number of people in the input frame changes, which could lead to a dynamic number of input channels, or to overlapping joint heatmaps of the same person. Could you give more details about this?

Thank you!

ddddwee1 commented 3 years ago

We first group the instances by their ID tags, and then use NMS to select the valid heatmaps of each instance.
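A minimal sketch of one possible reading of the grouping step, not the authors' code. It assumes 1-D, associative-embedding-style ID tags per detected peak and interprets "NMS" as a greedy instance-level NMS that drops duplicate instances whose shared joints lie close to a stronger instance. `Candidate`, `group_by_tag`, `instance_nms` and both thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Candidate:
    joint: int    # keypoint type index (assumed 0..16, i.e. 17 joints)
    x: float      # peak location in pixels
    y: float
    score: float  # heatmap confidence at the peak
    tag: float    # hypothetical 1-D ID tag (associative-embedding style)


def group_by_tag(cands, tag_thresh=0.5):
    """Greedily assign each candidate (strongest first) to the group whose mean
    tag is closest, if within tag_thresh and that group has no candidate for
    this joint yet; otherwise start a new group. Groups map joint -> Candidate."""
    groups = []
    for c in sorted(cands, key=lambda c: -c.score):
        best, best_d = None, tag_thresh
        for g in groups:
            if c.joint in g:
                continue  # at most one candidate per joint per instance
            mean_tag = sum(m.tag for m in g.values()) / len(g)
            d = abs(c.tag - mean_tag)
            if d <= best_d:
                best, best_d = g, d
        if best is None:
            groups.append({c.joint: c})
        else:
            best[c.joint] = c
    return groups


def instance_score(inst):
    """Mean peak confidence over the instance's joints."""
    return sum(c.score for c in inst.values()) / len(inst)


def mean_joint_dist(a, b):
    """Mean pixel distance over joints present in both instances (inf if none)."""
    shared = a.keys() & b.keys()
    if not shared:
        return float("inf")
    return sum(hypot(a[j].x - b[j].x, a[j].y - b[j].y) for j in shared) / len(shared)


def instance_nms(instances, dist_thresh=10.0):
    """Greedy NMS: visit instances by descending score and keep one only if it
    is at least dist_thresh pixels (on average) away from every kept instance."""
    kept = []
    for inst in sorted(instances, key=instance_score, reverse=True):
        if all(mean_joint_dist(inst, k) >= dist_thresh for k in kept):
            kept.append(inst)
    return kept
```

With two people plus one weak duplicate peak, `group_by_tag` yields three groups and `instance_nms` discards the duplicate, leaving one instance per person whose heatmaps can then be passed on.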
All heatmaps are mapped back to the size of the original image, so the input has only 3 RGB channels + 17 keypoint heatmaps.
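A rough sketch of how a fixed 3 + 17 = 20-channel BU input could be assembled, assuming each TD output is a bounding box plus 17 per-joint crop heatmaps, that crops are pasted back with nearest-neighbour resizing, and that overlapping people are merged with a per-pixel max. The merge rule, the resize method, and the names `paste_back` and `build_bu_input` are assumptions for illustration; the reply only states that heatmaps are mapped back to the original image size. Plain nested lists are used to keep it dependency-free.

```python
def paste_back(full, crop, box):
    """Nearest-neighbour resize of one crop heatmap into box=(x0, y0, x1, y1)
    of the full-size map, merging with a per-pixel max (assumed merge rule).
    Assumes integer box coordinates that lie inside the image."""
    x0, y0, x1, y1 = box
    ch, cw = len(crop), len(crop[0])
    bh, bw = y1 - y0, x1 - x0
    for r in range(bh):
        for c in range(bw):
            v = crop[r * ch // bh][c * cw // bw]
            if v > full[y0 + r][x0 + c]:
                full[y0 + r][x0 + c] = v


def build_bu_input(rgb, td_outputs, num_joints=17):
    """rgb: 3 x H x W nested lists. td_outputs: list of (box, crops), where crops
    holds num_joints crop heatmaps for one person. Returns (3 + num_joints) x H x W.
    The channel count is fixed because every person's map for joint j lands in
    the same channel j, no matter how many people are in the frame."""
    height, width = len(rgb[0]), len(rgb[0][0])
    joint_maps = [[[0.0] * width for _ in range(height)] for _ in range(num_joints)]
    for box, crops in td_outputs:
        for j in range(num_joints):
            paste_back(joint_maps[j], crops[j], box)
    return rgb + joint_maps
```

Max-merging is one simple way to keep overlapping people in a single channel per joint; a sum clipped to 1 would be another option, and the sketch does not claim either is what the paper uses.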

hongsukchoi commented 3 years ago

Thanks for the clarification! Hope to see the code soon :)

3dpose / 3D-Multi-Person-Pose, issue #1