MoyGcc / vid2avatar

Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition (CVPR2023)
https://moygcc.github.io/vid2avatar/

About smpl_init.pth in assets #27

Closed MinJunKang closed 1 year ago

MinJunKang commented 1 year ago

Thanks for sharing the code!

I found that smpl_init.pth in the assets folder is used for better convergence in the early training stage. I want to change the MLP part of the implicit_network, so I cannot use this initial checkpoint directly.

I thought this was the unit-sphere initialization, but it seems to be something different because of the additional body-pose input.

What does smpl_init represent, and could you describe the training pipeline used to obtain it?

Thanks!

MoyGcc commented 1 year ago

Hi, thanks for your interest.

The smpl_init.pth is an initialization for the human shape network: a generic SMPL shape (a naked human body). If you want to change the network architecture, you need to either deactivate the SMPL initialization here https://github.com/MoyGcc/vid2avatar/blob/main/code/confs/model/model_w_bg.yaml#L4 or retrain one that matches your MLP design. Note that if you deactivate smpl_init, the performance will not be as expected (training is very likely to diverge). A simple trick is to use the bounding boxes obtained from the projected SMPL masks to supervise the ray opacities, pushing them towards 0 outside the box and towards 1 inside it. This helps with robust training and better human-scene decoupling when smpl_init is not activated.
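
A minimal sketch of that box-based opacity supervision, assuming you already have the per-ray accumulated opacities from volume rendering and a 2D bounding box from the projected SMPL mask; the function and argument names (`box_opacity_loss`, `ray_opacity`, `inside_box`) are illustrative, not part of the repo:

```python
import torch

def box_opacity_loss(ray_opacity, inside_box, w_out=1.0, w_in=0.1):
    """Hypothetical sketch: push accumulated ray opacity towards 0 outside
    the SMPL-projected bounding box and towards 1 inside it.

    ray_opacity: (N,) accumulated opacity per sampled ray, in [0, 1]
    inside_box:  (N,) bool, True if the ray's pixel lies inside the 2D box
                 obtained by projecting the SMPL mesh into the image.
    """
    zero = ray_opacity.new_zeros(())
    loss_out = (ray_opacity[~inside_box] ** 2).mean() if (~inside_box).any() else zero
    loss_in = ((1.0 - ray_opacity[inside_box]) ** 2).mean() if inside_box.any() else zero
    return w_out * loss_out + w_in * loss_in
```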

To train the initial shape network, we downloaded the AMASS PosePrior SMPL sequences and used direct SDF supervision for training (Eikonal loss included). I haven't cleaned up that part of the code yet, but I will try to update the repo with the SMPL initialization training code.
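
A rough sketch of that pretraining, assuming an implicit_network that takes query points plus SMPL body-pose conditioning and returns SDF values; the `sample_batch` helper and the ground-truth signed distances to the posed SMPL meshes are placeholders you would need to provide (e.g. from AMASS sequences):

```python
import torch

def pretrain_smpl_sdf(implicit_network, sample_batch, optimizer, n_steps=10000):
    """Hypothetical pretraining loop: fit the shape network to SDF values of
    posed SMPL meshes with direct supervision plus an Eikonal regularizer.

    sample_batch() is assumed to return:
      pts     (B, N, 3)  query points around the SMPL surface
      sdf_gt  (B, N, 1)  signed distance to the SMPL mesh at pts
      pose    (B, P)     SMPL body-pose conditioning for the network
    """
    for step in range(n_steps):
        pts, sdf_gt, pose = sample_batch()
        pts.requires_grad_(True)

        sdf_pred = implicit_network(pts, pose)                 # (B, N, 1)
        grad = torch.autograd.grad(sdf_pred.sum(), pts, create_graph=True)[0]

        loss_sdf = torch.nn.functional.l1_loss(sdf_pred, sdf_gt)
        loss_eik = ((grad.norm(dim=-1) - 1.0) ** 2).mean()     # Eikonal term
        loss = loss_sdf + 0.1 * loss_eik

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```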

MinJunKang commented 1 year ago

Hello, thanks for your kind answer! I am trying to replace the MLP part of implicit_network with Instant-NGP for fast training and inference.

I know that your team has another fast human NeRF method, InstantAvatar, which is also very cool. However, I found that InstantAvatar cannot optimize the foreground mask during training and fully depends on the initial mask.

Is it possible to integrate Instant NGP into vid2avatar so that we can decompose the background and foreground quickly? In my code, a naive integration of the two methods couldn't decompose foreground from background and easily diverged during training. I would really appreciate your advice.

Thanks !!

MoyGcc commented 1 year ago

Hi,

That's indeed a great question. Someone from our group is also working on integrating Instant NGP into the current V2A framework. As you said, a naive implementation would not work perfectly (likely because of SDF vs. NeRF). I noticed that the paper Neuralangelo achieves (much) better surface reconstruction with hash encoding, and it also explains a bit why naively plugging NGP into SDF-based volume rendering does not work that well. However, Neuralangelo's speed is no longer comparable to the original NGP paper.

MinJunKang commented 1 year ago

Thanks for your valuable comment! Integrating Neuralangelo with V2A looks promising. There is no official code for it yet, but SDFStudio has a numerical gradient implementation! Very nice work! Thanks
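
For reference, the numerical gradients mentioned above (used by Neuralangelo for hash-encoded SDFs instead of analytical gradients) amount to central finite differences. A small sketch, where `sdf_fn` is a placeholder for any SDF network mapping (M, 3) points to (M, 1) values:

```python
import torch

def numerical_sdf_gradient(sdf_fn, x, eps=1e-3):
    """Central finite-difference gradient of an SDF at points x of shape (N, 3).
    sdf_fn is assumed to map (M, 3) points to (M, 1) SDF values."""
    offsets = torch.eye(3, device=x.device) * eps      # (3, 3), one axis per row
    grads = []
    for i in range(3):
        d_plus = sdf_fn(x + offsets[i])                 # (N, 1)
        d_minus = sdf_fn(x - offsets[i])                # (N, 1)
        grads.append((d_plus - d_minus) / (2 * eps))
    return torch.cat(grads, dim=-1)                     # (N, 3)
```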