anibali / h36m-fetch

Human 3.6M 3D human pose dataset fetcher
Apache License 2.0

Asking the meaning of annotations #13

Closed maudzung closed 5 years ago

maudzung commented 5 years ago

Could you please explain the differences between D3_Positions, D3_Positions_mono and D3_Positions_mono_universal? What do they mean? Thank you so much!

anibali commented 5 years ago

From the Human3.6M dataset README:

The parametrizations we provide are 3D positions in the original coordinate space (D3_Positions) and transformed for monocular prediction using the camera parameters (D3_Positions_mono). We also provide 3D Angles for monocular prediction (D3_Angles_mono) and projections of the skeleton onto the image plane (D2_Positions). Lastly we provide 3D positions using the same limb lengths for all subjects (D3_Positions_mono_universal) as a 3D position parametrization that is invariant to subject size. The skeleton information is provided in the metadata.xml file that is delivered with our code.

maudzung commented 5 years ago

Thank you. I have already read the file, but I don't understand it. They used the camera parameters to transform D3_Positions (original coordinate space) into D3_Positions_mono. How did they do that? And what does "monocular prediction" mean? In that case, which coordinate space is D3_Positions_mono in? Do you have more information about the annotations? If so, please share it with us. Thank you once again!

anibali commented 5 years ago

"Monocular prediction" means prediction from a single camera image. So I believe that the key difference is that D3_Positions all share the same coordinate space regardless of camera ("world space"), whereas D3_Positions_mono are in coordinate spaces relative to the individual cameras that recorded the footage. A camera extrinsic matrix can be used to map between these spaces.

Note that I haven't verified this myself, since I don't currently use D3_Positions for any projects.
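
To illustrate the idea of mapping between world space and a camera's space, here is a minimal NumPy sketch. It is not taken from the Human3.6M code. The function names, the example rotation `R`, the translation `t` and the joint values are all made up for illustration, and the convention `x_cam = R @ x_world + t` is an assumption. The dataset's own camera parameters may use a different convention (for example `x_cam = R @ (x_world - T)`), so check before applying this to real annotations.

```python
import numpy as np


def world_to_camera(points_world, R, t):
    """Map (N, 3) world-space points into a camera's coordinate space.

    Assumed convention (hypothetical, not confirmed for Human3.6M):
        x_cam = R @ x_world + t
    """
    return points_world @ R.T + t


def camera_to_world(points_cam, R, t):
    """Inverse of world_to_camera; relies on R being a rotation (R^-1 == R^T)."""
    return (points_cam - t) @ R


# Made-up extrinsics: a 90 degree rotation about the z axis plus a translation.
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.0, 0.0, 5000.0])

# Two made-up joint positions in world space (think of them as millimetres).
joints_world = np.array([[100.0, 200.0, 300.0],
                         [0.0, 0.0, 0.0]])

joints_cam = world_to_camera(joints_world, R, t)
joints_back = camera_to_world(joints_cam, R, t)
```

With one `(R, t)` pair per camera, the same world-space joints would give a different camera-space array for each camera, which matches the distinction described above.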