facebookresearch / banmo

BANMo Building Animatable 3D Neural Models from Many Casual Videos

Training pipeline of PoseNet #29

Closed Aku02 closed 5 months ago

Aku02 commented 2 years ago

Hello there!

Thanks a lot for sharing your work!

I have a couple of questions:

1. What is the dataset you used to train the PoseNet for root pose initialization?
2. What is the occ from the optical flow model? It seems that it is loaded in the dataloader but not used anywhere in training.

gengshan-y commented 2 years ago

Hi, the details of PoseNet training are in Sec. B.1 of the paper's supplement.

The training data are rendered on the fly. For more details, see the training code of PoseNet here, as well as the rendering and augmentation pipeline. Unfortunately, the sheep mesh is under a commercial license and we are not allowed to release it.

occ is the flow confidence computed by a forward-backward consistency check. It is renamed to cfd_at_samp and used here to weight the flow reconstruction loss.
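For readers unfamiliar with the forward-backward check, the idea is: warp a pixel forward with the flow, sample the backward flow at the warped location, and flag the pixel as confident only if the round trip returns close to the start. A minimal sketch (the function name, nearest-neighbor sampling, and threshold are illustrative, not the repo's implementation):

```python
import numpy as np

def flow_confidence(flow_fw, flow_bw, thresh=2.0):
    """Forward-backward consistency check (generic sketch, not BANMo's exact code).

    flow_fw: (H, W, 2) flow from frame t to t+1
    flow_bw: (H, W, 2) flow from frame t+1 back to t
    Returns an (H, W) binary confidence map: a pixel is confident if
    warping forward then backward lands near where it started.
    """
    H, W, _ = flow_fw.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # positions in frame t+1 after applying the forward flow
    xw = np.clip(xs + flow_fw[..., 0], 0, W - 1)
    yw = np.clip(ys + flow_fw[..., 1], 0, H - 1)
    # sample the backward flow at the warped positions (nearest neighbor)
    bw = flow_bw[yw.round().astype(int), xw.round().astype(int)]
    # cycle error: fw + bw should be ~0 for consistent pixels
    err = np.linalg.norm(flow_fw + bw, axis=-1)
    return (err < thresh).astype(np.float32)
```

Pixels that fail the check (occluded regions, flow errors) then contribute little or nothing to the flow reconstruction loss.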

Aku02 commented 2 years ago

Thanks for your reply!

I have a couple more follow-up questions

  1. Is there any way to get access to the sheep mesh and its surface features? Which SMPL mesh did you use?
  2. Were human.pth and quad.pth trained with PoseNet using the script you mentioned here?
gengshan-y commented 2 years ago
  1. The sheep/SMPL mesh and surface features are from densepose-CSE. You may get the vertex features, but I'm afraid it's not easy to get their vertex locations if you are outside the organization (even I don't have access to them now). A relevant issue is open here. The SMPL mesh is a subdivided version of the original one, as noted here.

  2. Yes.

vhvkhoa commented 2 years ago

Hi @gengshan-y, after looking through the training code of PoseNet, I would like to ask: did you use only a single human mesh to train PoseNet for humans, and a single sheep mesh to train PoseNet for quadruped animals?

Setting aside the variety of human and animal body types, which might be handled well by the pretrained CSE features: how can your PoseNet predict the root pose well when the objects take on various poses?

gengshan-y commented 2 years ago

Yes.

The initial poses passed into BANMo are indeed noisy due to deformations and shape variations. BANMo updates the root poses during optimization. See here.
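Conceptually, the per-frame root pose can be treated as PoseNet's prediction composed with a free residual transform that gradient descent refines along with the rest of the model. A toy sketch of that composition (function names and parameterization are illustrative assumptions, not BANMo's code):

```python
import numpy as np

def so3_exp(w):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = np.linalg.norm(w)
    if theta < 1e-8:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def current_root_pose(R_init, t_init, dw, dt):
    """Compose PoseNet's initial pose (R_init, t_init) with a learnable
    residual: dw is a rotation update in axis-angle form, dt a translation
    update. The residuals are what the optimizer refines each step."""
    return so3_exp(dw) @ R_init, t_init + dt
```

With a zero residual you recover the PoseNet initialization; as optimization proceeds, the residuals absorb the initialization error.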

vhvkhoa commented 2 years ago

Thank you for your quick response. But I would like to clarify a little: my question is about the function forward_warmup that you used to pre-train PoseNet, whose weights are stored in human.pth and quad.pth. I was wondering if they were trained using a single mesh, e.g., the resting-pose sheep mesh (illustrated in Fig. 12 of your paper), with random camera poses generated around it?

gengshan-y commented 2 years ago

That is correct; we use only the rest shape of the sheep/human to train PoseNet.

sidsunny commented 7 months ago

Hi @gengshan-y . Thanks for your amazing work and for sharing the code! I see that the sheep mesh cannot be released. Can you suggest a way to train PoseNet on our custom mesh? I suppose we cannot get CSE embeddings for our custom mesh.

gengshan-y commented 7 months ago

Hi, I think there is no off-the-shelf solution.

The most straightforward way to get vertex embeddings is to follow the CSE approach: label corresponding 3D keypoints on a canonical mesh and 2D keypoints on images, then learn a vertex embedding that matches the pixel features at the annotated locations.
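That keypoint-supervised fitting step can be sketched as a simple regression: pull each annotated vertex's embedding toward the image feature observed at its 2D keypoint. The sketch below uses plain gradient descent and an L2 loss (function name and shapes are assumptions; the actual CSE training also uses a smoothness prior over the mesh and contrastive terms):

```python
import numpy as np

def train_vertex_embeddings(pixel_feats, kp_vert_idx, num_verts, dim,
                            lr=0.5, iters=200, seed=0):
    """Minimal sketch of CSE-style embedding fitting (not the DensePose code).

    pixel_feats: (K, dim) image features at K annotated 2D keypoints.
    kp_vert_idx: (K,) canonical-mesh vertex index for each keypoint.
    Learns per-vertex embeddings by regressing each labeled vertex's
    embedding onto its pixel feature with gradient descent.
    """
    rng = np.random.default_rng(seed)
    E = 0.01 * rng.standard_normal((num_verts, dim))
    for _ in range(iters):
        diff = E[kp_vert_idx] - pixel_feats   # (K, dim) residuals
        grad = np.zeros_like(E)
        np.add.at(grad, kp_vert_idx, diff)    # scatter-add per labeled vertex
        E -= lr * grad / len(kp_vert_idx)
        # (a real pipeline would add a smoothness prior over the mesh graph
        #  so unlabeled vertices also receive sensible embeddings)
    return E
```

Unlabeled vertices stay near their initialization here; in practice the mesh-graph prior propagates the supervision to them.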

Starting from a mesh with vertex features, you would be able to train BANMo's PoseNet following the paper.