geopavlakos / hamer

HaMeR: Reconstructing Hands in 3D with Transformers
https://geopavlakos.github.io/hamer/
MIT License

Does Hamer internally use the PCA-based pose parameters? If not, how could I get them? #54

Closed cga-cmu closed 1 month ago

cga-cmu commented 1 month ago

I am interested in getting the closest PCA-based pose parameters for the MANO model for the Hamer hand poses. I see that Hamer returns 15 rotation matrices for pose, one for each joint (each digit has 3 joints, the most proximal joint having 2 DOF).

Does Hamer generate the 6 or so PCA coefficients which are the latent/embedded pose space for the MANO model, or does it only work with rotations of the 15 joints represented either with axis/angle or rotation matrix representations? I see the reference: 6D representation proposed by Zhou et al. [24].

If Hamer does not generate the PCA coefficients, I would have to estimate them by optimizing a set of PCA coefficients to match the 2D and/or 3D keypoints generated by Hamer. A side benefit of this would be to enable the use of multiple images (multi-view), with one set of PCA coefficients generating fits of the 2D and/or 3D keypoints in N images. Obviously, the global translation and rotation of the camera would have to be set appropriately for each image/view.
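The fitting problem described above could be written roughly as follows. This is only a sketch, and every symbol is an assumption introduced here for illustration: $c$ are the PCA coefficients, $\beta$ the MANO shape parameters, $J_j(c,\beta)$ the 3D position of joint $j$ produced by MANO, $(R_v, t_v)$ the global rotation and translation for view $v$, $\Pi_v$ the camera projection for that view, $x_{v,j}$ and $X_{v,j}$ the 2D and 3D keypoints predicted for view $v$, and $\lambda$ a weight balancing the two terms.

```latex
\min_{c,\;\beta,\;\{R_v,\,t_v\}}
\sum_{v=1}^{N} \sum_{j}
\Big(
  \big\| \Pi_v\!\big(R_v\, J_j(c,\beta) + t_v\big) - x_{v,j} \big\|^2
  + \lambda\, \big\| R_v\, J_j(c,\beta) + t_v - X_{v,j} \big\|^2
\Big)
```

A single $c$ (and $\beta$) is shared across all $N$ views, while each view keeps its own $(R_v, t_v)$.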

Thanks!

geopavlakos commented 1 month ago

We don't use the PCA coefficients of MANO, but the conversion should be relatively easy to do. Currently the output is in the form of 3D rotation matrices. First, convert these to the axis-angle representation and subtract the mean hand pose (hands_mean, as defined here). The way the PCA coefficients work is demonstrated here. In short, the two directions of the conversion are:

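# PCA coefficients -> axis-angle pose (relative to hands_mean)
hand_pose_aa = np.dot(hand_pose_pca, hand_components)
# axis-angle pose (with hands_mean subtracted) -> PCA coefficients
hand_pose_pca = np.dot(hand_pose_aa, hand_components.T)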

(Both hand_components and the hands_mean mentioned earlier come from the MANO model.)

We don't do these conversions in our codebase, but it shouldn't be hard to jump between different representations.
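
For readers who want to try this, the steps described in this reply can be sketched in NumPy as below. This is a minimal illustration, not code from the HaMeR repository. It assumes `hand_components` has shape (K, 45) with orthonormal rows and `hands_mean` has shape (45,), both loaded from the MANO model, and that the hand pose is an array of 15 rotation matrices with shape (15, 3, 3). The function names are hypothetical. Whether the mean needs to be subtracted depends on how the MANO layer you use handles the mean pose, so check that against your setup.

```python
import numpy as np


def rotmat_to_axis_angle(R):
    """Convert rotation matrices (..., 3, 3) to axis-angle vectors (..., 3).

    Rotations very close to 180 degrees are not handled specially.
    """
    R = np.asarray(R, dtype=np.float64)
    cos_theta = np.clip((np.trace(R, axis1=-2, axis2=-1) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    # Vector part of R - R^T, which equals 2 * sin(theta) * axis.
    w = np.stack(
        [
            R[..., 2, 1] - R[..., 1, 2],
            R[..., 0, 2] - R[..., 2, 0],
            R[..., 1, 0] - R[..., 0, 1],
        ],
        axis=-1,
    )
    sin_theta = np.sin(theta)
    small = np.abs(sin_theta) < 1e-8
    # theta / (2 sin(theta)) tends to 1/2 as theta tends to 0.
    scale = np.where(small, 0.5, theta / (2.0 * np.where(small, 1.0, sin_theta)))
    return w * scale[..., None]


def axis_angle_to_rotmat(aa):
    """Convert axis-angle vectors (..., 3) to rotation matrices (Rodrigues)."""
    aa = np.asarray(aa, dtype=np.float64)
    theta = np.linalg.norm(aa, axis=-1, keepdims=True)
    axis = aa / np.where(theta < 1e-8, 1.0, theta)
    x, y, z = axis[..., 0], axis[..., 1], axis[..., 2]
    zeros = np.zeros_like(x)
    # Skew-symmetric cross-product matrix of the unit axis.
    K = np.stack([zeros, -z, y, z, zeros, -x, -y, x, zeros], axis=-1)
    K = K.reshape(aa.shape[:-1] + (3, 3))
    t = theta[..., None]
    return np.eye(3) + np.sin(t) * K + (1.0 - np.cos(t)) * (K @ K)


def rotmats_to_pca(hand_pose_rotmat, hands_mean, hand_components):
    """Project 15 joint rotations (15, 3, 3) onto K PCA coefficients."""
    hand_pose_aa = rotmat_to_axis_angle(hand_pose_rotmat).reshape(-1)  # (45,)
    return np.dot(hand_pose_aa - hands_mean, hand_components.T)


def pca_to_axis_angle(hand_pose_pca, hands_mean, hand_components):
    """Expand K PCA coefficients to a full 45-D axis-angle hand pose."""
    return np.dot(hand_pose_pca, hand_components) + hands_mean
```

With fewer than 45 components the projection is lossy: it gives the closest pose in the truncated PCA subspace (in the least-squares sense over axis-angle values), so mapping the coefficients back will generally not reproduce the original rotations exactly.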