Interpreting the output

fabro66 / GAST-Net-3DPoseEstimation

A Graph Attention Spatio-temporal Convolutional Networks for 3D Human Pose Estimation in Video (GAST-Net)

MIT License

Hi @fabro66,

I would like to understand the output (the prediction variable in gen_skes.py). It is a list of length T (the number of frames), where each element is an array of shape 1 x 17 x 3. What do these numbers represent? I assume they are x, y, and z coordinates in metric space, expressed relative to the pelvis joint. If so, is the following coordinate system (with its origin at the pelvis joint) correct?

[Image: coordinate system]
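To test the pelvis-centered assumption on my side, I could run a check like the one below. This is only a sketch: it uses synthetic data rather than real GAST-Net output, it assumes the poses are stacked as (T, 17, 3), and both the helper name is_root_relative and the root joint index 0 are my own assumptions, not something taken from the repository.

```python
import numpy as np


def is_root_relative(poses, root_index=0, tol=1e-6):
    """Return True if the assumed root joint sits at the origin in every frame.

    poses: array of shape (T, J, 3). root_index=0 is an assumption about
    where the pelvis is stored, not something confirmed by the repository.
    """
    poses = np.asarray(poses, dtype=float)
    root = poses[:, root_index, :]  # shape (T, 3)
    return bool(np.all(np.abs(root) < tol))


# Synthetic stand-in for the model output: 5 frames, 17 joints, xyz.
rng = np.random.default_rng(0)
fake = rng.normal(size=(5, 17, 3))

# Subtract the assumed root joint from every joint, frame by frame.
centered = fake - fake[:, :1, :]

print(is_root_relative(fake))
print(is_root_relative(centered))
```

If the real output passes this check, the coordinates are expressed relative to that joint. It says nothing about the axis directions or the units, which is what I am asking about.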

I have also seen that you apply prediction[0][:, :, 2] -= np.amin(prediction[0][:, :, 2]), described as "adding absolute distance to 3D poses and rebase the height". Could you please explain this step in more detail?
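For reference, here is my current reading of that line as a minimal sketch on synthetic data (the array shape (T, 17, 3) and the random values are assumptions, not real output). As far as I can tell, np.amin without an axis argument returns a single minimum over all frames and joints, so the whole sequence is shifted along z by one constant and the lowest point ends up at z = 0.

```python
import numpy as np

# Synthetic stand-in for prediction[0]: 4 frames, 17 joints, xyz.
rng = np.random.default_rng(0)
poses = rng.normal(size=(4, 17, 3))

z_before = poses[:, :, 2].copy()

# Same operation as in the question: subtract the global minimum z
# (one scalar taken over every frame and every joint).
poses[:, :, 2] -= np.amin(poses[:, :, 2])

z_after = poses[:, :, 2]

print(z_after.min())                        # lowest point now sits at z = 0
print(np.ptp(z_before), np.ptp(z_after))    # z range is unchanged by the shift
```

Because the offset is one constant for the entire sequence, relative joint positions and the motion between frames are preserved. Only the vertical origin moves.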

Interpreting the output #24