Hi. First of all, thank you for sharing your great work.
I have two questions about the method proposed in your paper.
In Eq. (1), L_2D is calculated as the mean squared error between heatmaps. But how does one obtain the ground-truth heatmap? As far as I understand, a heatmap encodes per-pixel probabilities of joint locations, and it is usually a byproduct of 2D pose estimation rather than given data.
My guess is that you assume, for instance, a normal distribution around each ground-truth 3D joint and project it to generate the ground-truth heatmap. This seems consistent with the paper, which states: "... the 3D lifting module can be trained independently using 3D mocap data and its projected heatmaps." (p. 7732, Sec. 5 "Architecture", second paragraph)
I wonder if my guess is right, and whether it is acceptable to use a fixed size (standard deviation) when generating the ground-truth heatmaps.
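To make my guess concrete, here is a minimal sketch of what I mean: project a 3D joint into the image and render a fixed-σ Gaussian around it. This is purely my assumption, not from the paper; the function names and camera intrinsics are hypothetical:

```python
import numpy as np

def gaussian_heatmap(center_xy, size_hw, sigma=2.0):
    """Render a 2D Gaussian centered at a projected joint location.

    center_xy: (x, y) pixel coordinates of the projected ground-truth joint
    size_hw:   (height, width) of the heatmap
    sigma:     fixed standard deviation in pixels (the "fixed size" in my question)
    """
    h, w = size_hw
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def project(joint_3d, f=100.0, cx=32.0, cy=32.0):
    """Simple pinhole projection of a 3D joint (assumed camera intrinsics)."""
    x, y, z = joint_3d
    return (f * x / z + cx, f * y / z + cy)

# Example: one 3D mocap joint -> projected 2D location -> ground-truth heatmap
joint_3d = np.array([0.1, -0.2, 2.0])
hm = gaussian_heatmap(project(joint_3d), size_hw=(64, 64), sigma=2.0)
```

My question is essentially whether a single fixed `sigma` like this is adequate, given that the apparent joint size varies with depth.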
Did you conduct a comparative study on the embedding dimension? 20-D seems quite small.
I wonder whether changing the embedding dimension would seriously affect accuracy and/or inference speed.