Closed by maudzung 5 years ago
From the Human3.6M dataset README:
The parametrizations we provide are 3D positions in the original coordinate space (D3_Positions) and transformed for monocular prediction using the camera parameters (D3_Positions_mono). We also provide 3D Angles for monocular prediction (D3_Angles_mono) and projections of the skeleton onto the image plane (D2_Positions). Lastly we provide 3D positions using the same limb lengths for all subjects (D3_Positions_mono_universal) as a 3D position parametrization that is invariant to subject size. The skeleton information is provided in the metadata.xml file that is delivered with our code.
Thank you. I have already read the file, but I still don't understand it. Starting from D3_Positions (original coordinate space), they used the camera parameters to transform D3_Positions into D3_Positions_mono. How did they do this? What does "monocular prediction" mean? And which coordinate space is D3_Positions_mono in? If you have more information about the annotations, please share it with us. Thank you once again!
"Monocular prediction" means prediction from a single camera image. So I believe that the key difference is that D3_Positions all share the same coordinate space regardless of camera ("world space"), whereas D3_Positions_mono are in coordinate spaces relative to the individual cameras that recorded the footage. A camera extrinsic matrix can be used to map between these spaces.
Note that I haven't verified this myself, since I don't currently use D3_Positions for any projects.
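To make the world-space versus camera-space idea concrete, here is a minimal NumPy sketch of applying a camera extrinsic transform to a set of joints. It is not taken from the Human3.6M code: the function names `world_to_camera` and `camera_to_world`, the example values of `R` and `t`, and the convention `X_cam = R @ X_world + t` are all assumptions for illustration. The dataset's own camera parameters may use a different convention (for example `R @ (X_world - c)` with `c` the camera position), so check that before relying on this.

```python
import numpy as np

def world_to_camera(points_world, R, t):
    """Map (N, 3) world-space points into camera space.

    Assumed convention (hypothetical, not confirmed for Human3.6M):
    X_cam = R @ X_world + t, with R a 3x3 rotation and t a 3-vector.
    """
    points_world = np.asarray(points_world, dtype=float)
    # Row-vector form of R @ x + t applied to every point at once.
    return points_world @ R.T + t

def camera_to_world(points_cam, R, t):
    """Inverse of world_to_camera under the same assumed convention."""
    points_cam = np.asarray(points_cam, dtype=float)
    # R is orthonormal, so its inverse is its transpose.
    return (points_cam - t) @ R

# Made-up extrinsics: rotate 90 degrees about the z axis, then shift 5 units along z.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.0, 0.0, 5.0])

# Two made-up "joints" in world space.
joints_world = np.array([[1.0, 0.0, 0.0],
                         [0.0, 2.0, 1.0]])

joints_cam = world_to_camera(joints_world, R, t)
joints_back = camera_to_world(joints_cam, R, t)
```

Under this reading, the same world-space pose yields a different camera-space pose for each camera, which would be why a separate "mono" parametrization exists per recording.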
Could you please help explain the differences between D3_Positions, D3_Positions_mono and D3_Positions_mono_universal? What does each of them mean? Thank you so much!