fabro66 / GAST-Net-3DPoseEstimation

A Graph Attention Spatio-temporal Convolutional Networks for 3D Human Pose Estimation in Video (GAST-Net)
MIT License

Interpreting the output #24

Closed yerzhan7orazayev closed 3 years ago

yerzhan7orazayev commented 3 years ago

Hi @fabro66,

I want to understand the output (the prediction variable in gen_skes.py). It is a list of length T (the number of frames), where each element is an array of shape (1x17x3). What do these numbers represent? I assume they are x, y, and z coordinates in metric space, relative to the pelvis joint. If so, are the axes of the following coordinate system (with its origin at the pelvis joint) correct?

[image: coordinate system]

Also, I have seen that you apply prediction[0][:, :, 2] -= np.amin(prediction[0][:, :, 2]) for "adding absolute distance to 3D poses and rebase the height". Could you please explain this in more detail?

fabro66 commented 3 years ago

Hi~

  1. (T, N, C): T is the number of frames, N is the number of joints, and C is the (x, y, z) coordinates. Please see our paper for more details.
  2. [image]

  3. Comment out this line of code and compare the results; the difference will show what it does.
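
For readers who cannot run the comparison, here is a minimal sketch of what the quoted line appears to do. It is not taken from gen_skes.py. It assumes the array has shape (T, N, C) as described in point 1 and that index 2 of the last axis is the height, as the "rebase the height" comment suggests. The data is synthetic and `rebase_height` is a hypothetical helper name.

```python
import numpy as np


def rebase_height(pose):
    """Shift the z channel so its minimum over all frames and joints is 0.

    Hypothetical helper mirroring the in-place line from the question:
    pose[:, :, 2] -= np.amin(pose[:, :, 2])
    """
    rebased = pose.copy()  # leave the caller's array untouched
    rebased[:, :, 2] -= np.amin(rebased[:, :, 2])
    return rebased


# Synthetic stand-in for a (T, N, C) pose sequence: 5 frames, 17 joints, xyz.
# These are random numbers, not real model output.
rng = np.random.default_rng(0)
pose = rng.normal(size=(5, 17, 3))

rebased = rebase_height(pose)

print("min z before:", pose[:, :, 2].min())
print("min z after: ", rebased[:, :, 2].min())
```

Under these assumptions, a single scalar (the smallest z value in the whole sequence) is subtracted from every joint in every frame. The x and y coordinates and the shape of the skeleton stay the same, and the lowest point of the sequence ends up at z = 0, which would place the figure on the ground plane when plotted.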