Hi,
I found your paper very interesting. I just can't wait until the code is released, so I'm asking here.
The paper says that the TD network estimates all joints of all persons inside a bounding box, but the GCN & TCN seem to produce one person per bounding box. How do you group or select the joints belonging to one person in a bounding box before feeding the joint heatmaps to the GCN & TCN? Or do you feed all joint heatmaps to the GCN & TCN? (I don't think that's possible.)
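To make the first question concrete, here is my own naive guess (an assumption on my part, not something the paper states): keep only the strongest peak of each joint heatmap, which yields exactly one skeleton per box. `select_one_pose` is a hypothetical name. The toy example shows why I doubt it: if a nearby person's peak were stronger, argmax would silently pick that person's joint instead.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # H x W grid of confidences for one joint type


def select_one_pose(heatmaps: List[Heatmap]) -> List[Tuple[int, int, float]]:
    """Hypothetical: keep only the global maximum of each joint heatmap.

    Returns one (x, y, score) tuple per joint type, i.e. a single
    skeleton per bounding box that could be fed to a GCN as J nodes.
    """
    pose = []
    for hm in heatmaps:
        best = (0, 0, float("-inf"))
        for y, row in enumerate(hm):
            for x, v in enumerate(row):
                if v > best[2]:
                    best = (x, y, v)
        pose.append(best)
    return pose


if __name__ == "__main__":
    # Two joint types in a 3x4 crop; the "head" map has two peaks
    # (target person at x=1, a second person at x=3).
    head = [[0.0, 0.9, 0.0, 0.6],
            [0.0, 0.1, 0.0, 0.1],
            [0.0, 0.0, 0.0, 0.0]]
    neck = [[0.0, 0.0, 0.0, 0.0],
            [0.0, 0.8, 0.0, 0.5],
            [0.0, 0.0, 0.0, 0.0]]
    print(select_one_pose([head, neck]))  # [(1, 0, 0.9), (1, 1, 0.8)]
```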
Also, the BU network uses the concatenation of the joint heatmaps and the input frame as its input. But how exactly? Does the input have 3 (RGB) + 1 (heatmap) channels? I see several potential problems: the number of people changes from frame to frame, which could lead to a varying number of input channels, and the joint heatmaps of the same person may overlap. Could you give more details on this?
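For reference, this is the 3 + 1 layout I have in mind, again purely my own assumption: every joint of every person is drawn as a Gaussian into a single extra channel, and overlaps are merged with a pixel-wise max rather than a sum, so the channel count stays at 4 no matter how many people are in the frame. `render_heatmap`, `build_bu_input` and `sigma` are hypothetical names. The catch is that this layout loses which joint belongs to whom, which is why I'm asking.

```python
import math
from typing import List, Tuple

Channel = List[List[float]]  # H x W


def render_heatmap(people: List[List[Tuple[float, float]]], h: int, w: int,
                   sigma: float = 1.0) -> Channel:
    """Hypothetical: draw every joint of every person as a Gaussian in ONE
    channel, merging overlaps with a pixel-wise max (not a sum)."""
    hm = [[0.0] * w for _ in range(h)]
    for joints in people:
        for jx, jy in joints:
            for y in range(h):
                for x in range(w):
                    d2 = (x - jx) ** 2 + (y - jy) ** 2
                    g = math.exp(-d2 / (2 * sigma ** 2))
                    if g > hm[y][x]:
                        hm[y][x] = g
    return hm


def build_bu_input(rgb: List[Channel], people: List[List[Tuple[float, float]]],
                   sigma: float = 1.0) -> List[Channel]:
    """Concatenate 3 RGB channels with 1 heatmap channel: always 4 channels."""
    h, w = len(rgb[0]), len(rgb[0][0])
    return rgb + [render_heatmap(people, h, w, sigma)]


if __name__ == "__main__":
    rgb = [[[0.0] * 5 for _ in range(4)] for _ in range(3)]  # 3 x 4 x 5 frame
    one = build_bu_input(rgb, [[(2, 1)]])
    three = build_bu_input(rgb, [[(2, 1)], [(2, 1), (0, 0)], [(4, 3)]])
    print(len(one), len(three))  # 4 4
```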
Thank you!