The proper way to obtain the 2D ground truth is to use the camera intrinsic parameters to project the 3D camera coordinates to 2D pixel coordinates, as done here.
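For reference, that intrinsic projection could be sketched like this (a minimal pinhole-model sketch; the function name and the focal/principal-point values below are illustrative, not taken from the repo):

```python
import numpy as np

def project_to_pixels(points_cam, fx, fy, cx, cy):
    """Project 3D camera-frame points (N, 3) to 2D pixel coordinates (N, 2)
    with a pinhole model: u = fx * X / Z + cx, v = fy * Y / Z + cy."""
    X, Y, Z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return np.stack([u, v], axis=-1)

# Example: a point on the optical axis lands at the principal point
pts = np.array([[0.0, 0.0, 1.0], [1.0, -1.0, 2.0]])
uv = project_to_pixels(pts, fx=1000.0, fy=1000.0, cx=500.0, cy=500.0)
```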
For this project, though, we followed what MotionBERT did and simply took the (X, Y) from the (X, Y, Z) as the 2D ground truth (you can see it here).
The reason this is acceptable is that MotionAGFormer takes a normalized 2D pose sequence in the range [-1, 1] as input. The 3D pose sequence is already normalized (and rescaled to the same scale as the 2D pose, as I explained here), so dropping the Z coordinate is equivalent to projecting to 2D in the standard way and then normalizing.
For params I used the function implemented here. For MACs/frame you can use a library such as torchprofile, which computes the MACs for the whole model; then simply divide that number by the number of output frames to get MACs/frame.
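As a rough sketch of both counts (assuming PyTorch; the tiny `nn.Linear` below stands in for the real pose model, and `n_frames` is an illustrative clip length, not the repo's setting):

```python
from torch import nn

def count_parameters(model):
    # Sum of all trainable parameter tensors
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Hypothetical stand-in for the real model: 17 joints, 2D in, 3D out
model = nn.Linear(17 * 2, 17 * 3)
n_frames = 243  # illustrative number of output frames

params = count_parameters(model)

# torchprofile reports total MACs over the whole input clip, so dividing
# by the number of output frames yields MACs/frame (uncomment if installed):
# import torch
# from torchprofile import profile_macs
# macs = profile_macs(model, torch.randn(1, n_frames, 17 * 2))
# macs_per_frame = macs / n_frames
```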
Thank you for your great work! I have some questions about training; could you please help? Thanks!