geopavlakos / hamer

HaMeR: Reconstructing Hands in 3D with Transformers
https://geopavlakos.github.io/hamer/
MIT License

Orthographic Projection #52

Closed by mnauf 1 month ago

mnauf commented 1 month ago

Does your paper use orthographic projection to project the hand mesh onto the image?

"Our regressor also estimates camera parameters π. The camera π corresponds to a translation t ∈ ℝ³ that allows us to project the 3D mesh and the 3D joints to the image. Given fixed camera intrinsics K, the projection of the 3D joints X is: x = π(X) = ΠK(X + t). Eventually, we learn the mapping f(I) = Θ, where the regressed parameters are Θ = {θ, β, π}."

I ask because you seem to predict only the translation vector and not the focal length. If you are indeed using orthographic projection, what do you need the fixed camera intrinsics K for?

geopavlakos commented 1 month ago

We predict the parameters of an orthographic projection (scale and 2D translation). We then convert them to a 3D translation, which lets us approximate that orthographic projection with a perspective projection that uses a large, fixed focal length. You can follow the conversion here. pred_cam is the raw output of the network, and we convert it to pred_cam_t, the translation of the perspective projection.
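
For readers who want to see why a large fixed focal length makes the two projections agree, here is a minimal sketch of the conversion described above. It is not copied from the repository. It assumes the convention common to HMR-style models, in which the depth is recovered as t_z = 2f / (S · s) for focal length f, crop size S and predicted scale s. The function names, the focal length of 5000 and the crop size of 256 are illustrative assumptions.

```python
# Sketch only: assumes the HMR-style convention t_z = 2 * f / (S * s).
# All names and default values below are hypothetical, not the repo's API.

def weak_perspective_to_translation(s, tx, ty, focal_length=5000.0,
                                    image_size=256.0, eps=1e-9):
    """Convert orthographic parameters (scale, 2D translation) to a 3D translation."""
    tz = 2.0 * focal_length / (image_size * s + eps)
    return (tx, ty, tz)


def perspective_project(point, t, focal_length=5000.0, image_size=256.0):
    """Pinhole projection of a 3D point after adding translation t.

    The principal point is assumed to sit at the image center.
    """
    x, y, z = (p + q for p, q in zip(point, t))
    c = image_size / 2.0
    return (focal_length * x / z + c, focal_length * y / z + c)


def orthographic_project(point, s, tx, ty, image_size=256.0):
    """Scaled orthographic projection, mapped from [-1, 1] to pixel coordinates."""
    c = image_size / 2.0
    return (c * s * (point[0] + tx) + c, c * s * (point[1] + ty) + c)


# A point a few centimeters from the hand root, with example camera parameters.
point = (0.05, -0.03, 0.02)
s, tx, ty = 1.0, 0.1, -0.2

t = weak_perspective_to_translation(s, tx, ty)
u_persp = perspective_project(point, t)
u_ortho = orthographic_project(point, s, tx, ty)
```

Under these assumptions the hand's depth extent (a few centimeters) is tiny compared with t_z (tens of units), so dividing by Z + t_z is nearly the same as dividing by t_z alone, and `u_persp` and `u_ortho` differ by a small fraction of a pixel. This is the sense in which the intrinsics K stay fixed while the network only has to regress scale and 2D translation.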