akanazawa / hmr

Project page for End-to-end Recovery of Human Shape and Pose

Not working on fat men #40

Closed abeacco closed 6 years ago

abeacco commented 6 years ago

Hi,

We have been trying your code recently with different images of our own, and we have noticed that the shape parameters don't seem to adjust correctly. For example, we have tried with several pictures of fat men, and even though the pose is detected almost correctly, the recovered shape is always thin and not adapted to the actual body shape. Is there any limitation in the demo code? Or is it due to a limitation of the SMPL model? Could it be because you are using the neutral SMPL model?

In the picture you can see what we mean.

(attached image: fatman)

Thank you,

Alex B.

Superlee506 commented 6 years ago

I think this paper mainly focuses on 3D human pose estimation. In your case, the silhouette would need to be considered.

abeacco commented 6 years ago

Hi,

The paper is called "End-to-end Recovery of Human Shape and Pose", and if you read it, you will see they use the SMPL model to recover the mesh. That model has 10 blendshape parameters to represent a wide range of human shapes, so after reading the paper and the code, I understand they should also fit the mesh shape to the silhouette using those parameters. I don't see the point of recovering a mesh without the shape; otherwise, the paper would only be about pose estimation, as you say, and a mesh wouldn't be necessary at all.
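For context, SMPL's shape space is a linear blendshape basis: the 10 beta coefficients weight fixed per-vertex offset directions added to a template mesh. A minimal numpy sketch of that idea (the array sizes match the released model, but the basis here is a random placeholder, since the real blendshapes come from the SMPL model file):

```python
import numpy as np

n_verts = 6890                                # vertex count of the released SMPL mesh
template = np.zeros((n_verts, 3))             # mean ("template") mesh; placeholder values
shape_dirs = np.random.randn(n_verts, 3, 10)  # 10 shape blendshape directions; placeholder

def shaped_mesh(betas, template=template, shape_dirs=shape_dirs):
    """Apply the 10 shape coefficients: v = v_template + S @ beta."""
    assert betas.shape == (10,)
    return template + shape_dirs @ betas      # -> (n_verts, 3)

# beta = 0 gives the mean shape; moving the coefficients deforms the body
v_mean = shaped_mesh(np.zeros(10))
v_heavy = shaped_mesh(np.ones(10))
```

So in principle the model can express heavier bodies; the question is whether the network learns to regress the right betas.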

akanazawa commented 6 years ago

Hi,

While we do recover shape, it's not very accurate because our training data does not contain people with large shape variation (we really only have ground-truth shape for 5 subjects from Human3.6M). As @Superlee506 mentioned, we do not use silhouettes, which are necessary for getting a better fit of the shape. This is a limitation of our model. One reason we don't use silhouettes is that a silhouette loss assumes the person is not occluded, i.e. that you can see the entire silhouette of the person during training. This prevents you from training on cluttered images of people, as in the COCO dataset, and the focus of this work is on getting an end-to-end model that can predict humans in these everyday images, occlusion included.
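To make the occlusion point concrete, here is a minimal sketch of a typical silhouette loss (1 minus the IoU between a rendered person mask and a ground-truth mask; this is an illustrative formulation, not code from this repo). If part of the person is occluded, the ground-truth mask is missing pixels the render legitimately covers, so the loss penalizes a correct prediction:

```python
import numpy as np

def silhouette_iou_loss(pred_mask, gt_mask):
    """1 - IoU between a rendered silhouette and a ground-truth mask.

    Assumes the ground-truth mask shows the *whole* person; any occluded
    region shows up as a spurious error term.
    """
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 - inter / max(union, 1)
```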

Recovering shape accurately is a challenging problem; the best methods today require multiple inputs of the same person.

There are other works that use silhouettes that you might want to look at, though code is not available for all of them.

As to your point about why recover a mesh if not for the shape: recovering a mesh means recovering the 3D joint angles of the kinematic skeleton (needed to skin the person), as opposed to simply recovering x, y, z 3D joint positions, which do not fully specify the joint angles. If you only recover 3D joints, you need to solve IK and do additional post-processing on top, with extra assumptions. This work is a step towards solving for the rotation angles of humans, which is what motion capture records, and which is useful for tasks such as retargeting (see our recent follow-up work). Our framework could learn to predict a better shape if there were more training data, but so far the only signal comes from the 2D joints, which do not give much information about the shape.
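The angles-vs-positions distinction can be demonstrated directly: in the toy forward-kinematics sketch below (a hypothetical two-bone chain, not code from this repo), twisting a bone about its own axis changes the joint angles but leaves every joint position unchanged, so joint positions alone cannot recover the rotations:

```python
import numpy as np

def rodrigues(axis_angle):
    """Axis-angle vector -> 3x3 rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-8:
        return np.eye(3)
    k = axis_angle / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

# Toy chain: two unit bones along +x (e.g. shoulder -> elbow -> wrist).
offsets = [np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])]

def forward_kinematics(rotvecs):
    """Parent-relative axis-angle rotations -> global joint positions."""
    R, p, joints = np.eye(3), np.zeros(3), [np.zeros(3)]
    for rv, off in zip(rotvecs, offsets):
        R = R @ rodrigues(rv)       # accumulate rotation down the chain
        p = p + R @ off             # place the child joint
        joints.append(p.copy())
    return np.array(joints)

# A twist about the bone's own axis leaves all joint positions unchanged:
no_twist = forward_kinematics([np.zeros(3), np.zeros(3)])
twisted = forward_kinematics([np.array([0.5, 0.0, 0.0]), np.zeros(3)])
```

Both calls produce identical joint positions even though the rotations differ, which is exactly the ambiguity IK post-processing has to resolve.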

Best,

Angjoo

lyupei commented 6 years ago

@akanazawa But why do most people's knees appear bent when they face the camera (as in the picture above)?

Is the projection of the cocoplus joints correct? In most cases it seems a bit higher than where the ankle really is.
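For anyone checking the reprojection themselves: HMR regresses a weak-perspective camera, so projecting the predicted 3D joints onto the image is just a scale plus a 2D translation. A minimal sketch of that projection (parameter names are illustrative, not the repo's variable names):

```python
import numpy as np

def weak_perspective_project(joints3d, scale, trans):
    """Weak-perspective projection: x2d = s * (x, y) + t.

    joints3d: (N, 3) array of 3D joints; scale: scalar s;
    trans: (2,) translation in normalized image coordinates.
    """
    return scale * joints3d[:, :2] + trans

joints3d = np.array([[0.0, 0.0, 1.0],
                     [1.0, 2.0, 1.0]])
joints2d = weak_perspective_project(joints3d, 2.0, np.array([10.0, 20.0]))
```

Comparing these projected joints against the 2D annotations is one way to check whether an apparent offset comes from the camera parameters or from the pose itself.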

EarthRockerBam commented 10 months ago

Thanks for WEIGHING in! LOL!