Closed haoz19 closed 1 year ago
Hi Hao, thanks for the kind words. dp_feats_rd is the HxWx16 DensePose feature. embedid serves the same purpose as frameid; both are the frame index within the whole dataset.
So your understanding is mostly correct, except that you want to use root_rts = self.nerf_root_rts(frameid) to get the object-to-view-space transformations. DensePose features are only used together with the pre-trained CNN to initialize root poses, and are never used during optimization.
Hello Gengshan, thanks so much for the clear reply. It helps me a lot in understanding the code!
Hello Gengshan,
Thanks for your contribution to video 3D reconstruction: a great approach, with detailed instructions and code! Not only BANMo but also LASR and ViSER have inspired me a lot.
I have spent days working through your code, which is already very clear, but a few things are still unclear to me.
In the LASR code, you use two functions, obj_to_cam() and pinhole_cam(), to project the vertices from 3D root (canonical) coordinates to 2D coordinates. The LBS operation is inside obj_to_cam(), which maps from root space (rest pose) to view space. pinhole_cam() maps from 3D camera coordinates to 2D coordinates.
And we want to do the same operation in banmo/nnutils/train_utils.py. We tried to follow the pipeline you provided: X(root space at rest pose) -> forward LBS -> X(root space at t') -> root pose -> X(view space at t').
We are trying to use the forward lbs(bones, rts_fw, skin, xyz_in, backward=False), then obj_to_cam(in_verts, Rmat, Tmat), and then pinhole_cam(in_verts, K). 'K' could be obtained using 'self.model.rtk' and 'self.model.kaug'. 'Rmat' and 'Tmat' could be obtained by: root_rts = self.nerf_root_rts(dp_feats_rd). 'rts_fw' could be obtained by: bone_rts = self.nerf_body_rts(embedid).
But I'm stuck on the terms 'dp_feats_rd' and 'embedid'. Could you give us some insight into what those two terms stand for and how we should obtain them? Also, is there anything I'm misunderstanding above?
Many Thanks!
Hao
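For readers following the thread, the pipeline discussed above (rest-pose points -> forward LBS -> root pose -> pinhole projection) can be sketched in plain NumPy. This is only an illustrative sketch: forward_lbs, obj_to_view and pinhole_project are hypothetical helpers written for this example and do not reproduce the signatures or internals of lbs(), obj_to_cam() or pinhole_cam() in the BANMo or LASR code. The bone transforms, skinning weights, root pose and intrinsics are made-up toy values; in BANMo, per the reply above, the root pose would come from self.nerf_root_rts(frameid).

```python
import numpy as np

def forward_lbs(xyz_rest, bone_R, bone_t, skin):
    """Generic linear blend skinning (hypothetical helper, not BANMo's lbs()).

    xyz_rest: (N, 3) points in root space at rest pose
    bone_R:   (B, 3, 3) per-bone rotations, bone_t: (B, 3) per-bone translations
    skin:     (N, B) skinning weights, each row summing to 1
    Returns (N, 3) points in root space at time t.
    """
    # Apply every bone transform to every point: (B, N, 3)
    per_bone = np.einsum("bij,nj->bni", bone_R, xyz_rest) + bone_t[:, None, :]
    # Blend the per-bone results with the skinning weights: (N, 3)
    return np.einsum("nb,bni->ni", skin, per_bone)

def obj_to_view(xyz, R, T):
    """Apply the root pose (object to view space): x_view = R x + T."""
    return xyz @ R.T + T

def pinhole_project(xyz_view, K):
    """Project view-space points to pixels with a 3x3 intrinsics matrix K."""
    uvw = xyz_view @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Toy inputs (all values are assumptions chosen for illustration)
xyz_rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
bone_R = np.stack([np.eye(3), np.eye(3)])           # two bones, no rotation
bone_t = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
skin = np.array([[1.0, 0.0], [0.5, 0.5]])           # second point shared by both bones
R_root, T_root = np.eye(3), np.array([0.0, 0.0, 2.0])
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])

xyz_t = forward_lbs(xyz_rest, bone_R, bone_t, skin)  # root space at time t
xyz_view = obj_to_view(xyz_t, R_root, T_root)        # view space at time t
uv = pinhole_project(xyz_view, K)                    # pixels: [[50, 50], [100, 75]]
```

Keeping the three steps as separate functions mirrors the separation described in the question, which makes it easier to swap in the real per-frame root pose and bone transforms once they are available.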