facebookresearch / banmo

BANMo: Building Animatable 3D Neural Models from Many Casual Videos

Questions about the code. #64

Closed haoz19 closed 1 year ago

haoz19 commented 1 year ago

Hello Gengshan,

Thanks for your contribution to video-based 3D reconstruction: a great approach, with detailed instructions and code! BANMo, LASR, and ViSER have all inspired me a lot.

I have spent days trying to understand your code, which is already very clear, but a few things are still unclear to me.

In the LASR code, you use two functions, obj_to_cam() and pinhole_cam(), to project vertices from 3D root (canonical) coordinates to 2D coordinates. The LBS operation happens inside obj_to_cam(), which maps from root space (rest pose) to view space. pinhole_cam() then maps from 3D camera coordinates to 2D coordinates.

We want to do the same operation in banmo/nnutils/train_utils.py, and we tried to follow the pipeline you provided: X (root space at rest pose) -> forward LBS -> X (root space at t') -> root pose -> X (view space at t').

We are trying to use lbs(bones, rts_fw, skin, xyz_in, backward=False) for the forward LBS, then obj_to_cam(in_verts, Rmat, Tmat), and then pinhole_cam(in_verts, K). 'K' could be obtained from 'self.model.rtk' and 'self.model.kaug'. 'Rmat' and 'Tmat' could be obtained with root_rts = self.nerf_root_rts(dp_feats_rd). 'rts_fw' could be obtained with bone_rts = self.nerf_body_rts(embedid).
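For readers following along, the three-step pipeline described above (forward LBS, then root pose, then pinhole projection) can be sketched in plain NumPy. This is only an illustration of the geometry, not the BANMo or LASR implementation: the function names with the `_sketch` suffix, the argument shapes, and the use of a 3x3 intrinsics matrix `K` are all assumptions made for this example, and the real `lbs`, `obj_to_cam`, and `pinhole_cam` may use different parametrizations.

```python
import numpy as np

def forward_lbs_sketch(pts, bone_R, bone_T, skin):
    """Linear blend skinning (hypothetical helper, not BANMo's lbs()).

    pts:    (N, 3) points in root space at rest pose
    bone_R: (B, 3, 3) per-bone rotations, bone_T: (B, 3) per-bone translations
    skin:   (N, B) skinning weights, each row summing to 1
    """
    # Apply every bone transform to every point: shape (B, N, 3)
    per_bone = np.einsum("bij,nj->bni", bone_R, pts) + bone_T[:, None, :]
    # Blend the per-bone results with the skinning weights: shape (N, 3)
    return np.einsum("nb,bni->ni", skin, per_bone)

def obj_to_cam_sketch(pts, R, T):
    """Rigid root pose: root space at time t' -> view (camera) space."""
    return pts @ R.T + T

def pinhole_cam_sketch(pts, K):
    """Perspective projection with an assumed 3x3 intrinsics matrix K."""
    uvw = pts @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Toy example: two identity bones, identity root rotation, object 2 units away.
pts = np.array([[0.2, -0.1, 0.0]])
bone_R = np.stack([np.eye(3), np.eye(3)])
bone_T = np.zeros((2, 3))
skin = np.array([[0.5, 0.5]])
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])

pts_t = forward_lbs_sketch(pts, bone_R, bone_T, skin)              # root space at t'
pts_cam = obj_to_cam_sketch(pts_t, np.eye(3), np.array([0.0, 0.0, 2.0]))
uv = pinhole_cam_sketch(pts_cam, K)                                # pixel coordinates
```

With identity bone transforms the LBS step leaves the points unchanged, so the toy example reduces to a rigid translation followed by a perspective divide, which makes it easy to check by hand.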

But I'm stuck on the terms 'dp_feats_rd' and 'embedid'. Could you give us some insight into what these two terms stand for and how we should obtain them? Also, is there anything I'm misunderstanding above?

Many Thanks!

Hao

gengshan-y commented 1 year ago

Hi Hao, thanks for the kind words. dp_feats_rd is the HxWx16 DensePose feature. embedid serves the same purpose as frameid: both are the frame index within the whole dataset.

So your understanding is mostly correct, except that you want to use root_rts = self.nerf_root_rts(frameid) to get the object-to-view-space transformations. DensePose features are only used together with the pre-trained CNN to initialize root poses, and are never used during optimization.
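As a rough illustration of what "frame index within the whole dataset" can mean when several videos are concatenated, the sketch below converts a (video, local frame) pair into a single global index using cumulative offsets. The function name `global_frame_index` and the list of video lengths are hypothetical and exist only for this example; how BANMo actually stores its per-video offsets is not stated in this thread.

```python
import numpy as np

def global_frame_index(video_lengths, video_id, local_frame):
    """Map (video_id, local_frame) to an index over all frames in the dataset.

    video_lengths: number of frames in each video, in dataset order (assumed input).
    """
    lengths = np.asarray(video_lengths)
    if not 0 <= local_frame < lengths[video_id]:
        raise IndexError("local_frame is outside the selected video")
    # Offset of each video = total number of frames in all earlier videos.
    offsets = np.concatenate([[0], np.cumsum(lengths)[:-1]])
    return int(offsets[video_id] + local_frame)

# Three videos with 10, 20, and 5 frames.
lengths = [10, 20, 5]
idx_a = global_frame_index(lengths, video_id=1, local_frame=3)
idx_b = global_frame_index(lengths, video_id=2, local_frame=0)
```

An index of this kind is what would be passed as frameid (or embedid) to look up per-frame quantities such as the root pose.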

haoz19 commented 1 year ago

Hello Gengshan, thanks so much for the clear reply. It helps me a lot in understanding the code!