facebookresearch / banmo

BANMo: Building Animatable 3D Neural Models from Many Casual Videos

Questions about the synthetic datasets #36

Closed kts707 closed 1 year ago

kts707 commented 2 years ago

Hi Gengshan,

Thanks for the great work!

I have a couple of questions regarding the synthetic datasets (Eagle and Hands) and the other results on your website:

  1. The instructions for the synthetic datasets use the ground-truth camera poses during training. However, the paths to the rtk files are commented out in the config. If I use this config directly, the ground-truth camera poses won't be used during training, right?

  2. I followed the same instructions for the Eagle dataset preparation, but they do not save the rtk files to the locations specified in the config. Should I manually change the paths?

  3. Have you tried running BANMo optimization on Eagle and Hands without the ground-truth camera poses? If so, how are the results, visually and quantitatively (in terms of Chamfer distance and F-score)?

  4. I noticed that you have results for more objects, such as Penguins and Robot-Laikago, on your website. Do you know where I can get access to these datasets as well?

gengshan-y commented 2 years ago

Hi, for 1-2, the rtk_path entries are commented out on purpose. If no rtk_path is found in the config file, banmo will use the camera files in database/DAVIS/Cameras/Full-Resolution/$seqname/%05d.txt, which should be the path to the auto-generated camera files.
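For reference, a minimal sketch of reading one such per-frame camera file. The assumed layout (a 4x4 text matrix whose first three rows are [R | T] and whose last row holds the intrinsics fx, fy, px, py) is my reading of the rtk format, so treat it as an assumption:

```python
import numpy as np

def load_rtk(path):
    """Load one per-frame camera file (assumed 4x4 text layout:
    rows 0-2 are [R | T], assumed world-to-camera; row 3 holds
    the pinhole intrinsics fx, fy, px, py)."""
    rtk = np.loadtxt(path).reshape(4, 4)
    R = rtk[:3, :3]          # 3x3 rotation
    T = rtk[:3, 3]           # 3-vector translation
    fx, fy, px, py = rtk[3]  # intrinsics row
    K = np.array([[fx, 0.0, px],
                  [0.0, fy, py],
                  [0.0, 0.0, 1.0]])
    return R, T, K

# e.g. R, T, K = load_rtk("database/DAVIS/Cameras/Full-Resolution/a-eagle/00000.txt")
```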

For 3, from my observation, if the camera viewpoints are initialized as all-identity rotations, the final camera viewpoints will only cover 90-180 of the 360 degrees on a circle, even after optimization. I don't have numbers, but the results look very bad.
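As a rough way to quantify viewpoint coverage, one could measure the azimuthal spread of the optimized viewing directions. A minimal sketch under the same rtk conventions as above; it does not reproduce the 90-180 degree observation, which is empirical:

```python
import numpy as np

def azimuth_coverage_deg(rotations):
    """Rough azimuthal coverage of camera viewing directions.
    `rotations`: (N, 3, 3) world-to-camera rotations; assuming the camera
    looks along +z in its own frame, the world-frame viewing direction
    of camera n is rotations[n].T @ [0, 0, 1]."""
    z = np.array([0.0, 0.0, 1.0])
    dirs = np.einsum('nji,j->ni', rotations, z)  # R^T @ z for each camera
    az = np.degrees(np.arctan2(dirs[:, 1], dirs[:, 0]))  # angle in the xy-plane
    return az.max() - az.min()  # naive spread; ignores 360-degree wraparound
```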

For 4, I'm planning to release all of those in the future (possibly after this CVPR deadline). Please send me an email if you need them earlier.

kts707 commented 1 year ago

Thanks for the answer!

I used the latest main to run an optimization on the Eagle dataset without any modifications to the code base. However, the eagle's head is missing in the reconstruction. Do you know how I can get the same reconstruction results as yours? I followed the steps in your instructions for data processing, optimization, and evaluation... Is there anything I need to change in the code to achieve the same quality as your results?

My results are shown below:

https://user-images.githubusercontent.com/59402345/195504694-3f895d58-6fd3-4b92-b64c-28ac68d21421.mp4

Quantitative results:

| Metric | Value |
| --- | --- |
| ave. Chamfer distance | 17.7 cm |
| max. Chamfer distance | 20.8 cm |
| ave. F-score at d=1% | 15.0% |
| min. F-score at d=1% | 8.2% |
| ave. F-score at d=2% | 34.3% |
| min. F-score at d=2% | 20.2% |
| ave. F-score at d=5% | 68.1% |
| min. F-score at d=5% | 51.6% |
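For readers unfamiliar with these metrics, here is a minimal sketch of symmetric Chamfer distance and F-score on point clouds. The d=1%/2%/5% thresholds are, as I understand it, fractions of the object size, and BANMo's actual eval script (scripts/eval/run_eval.sh) may differ in sampling, alignment, and conventions:

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_and_fscore(pred, gt, tau):
    """Chamfer distance and F-score between (N, 3) and (M, 3) point sets.
    `tau` is the distance threshold for counting a point as matched."""
    d_pred = cKDTree(gt).query(pred)[0]   # pred -> gt nearest-neighbor distances
    d_gt = cKDTree(pred).query(gt)[0]     # gt -> pred nearest-neighbor distances
    # One common convention: average of the two directional means; variants differ.
    chamfer = (d_pred.mean() + d_gt.mean()) / 2
    precision = (d_pred < tau).mean()     # fraction of pred points near gt
    recall = (d_gt < tau).mean()          # fraction of gt points near pred
    fscore = 2 * precision * recall / max(precision + recall, 1e-8)
    return chamfer, fscore
```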
gengshan-y commented 1 year ago

Hi, the results look really strange. Both the camera poses and the deformation seem to be off by a lot.

Before I get the resources to reproduce it, it would help if you could verify a couple of things. Are you able to get reasonable results for the cat videos? Could you post the result of drawing the root-pose trajectory here?

kts707 commented 1 year ago

I can get reasonable results for the cat videos. This is the result I got after running `python scripts/visualize/render_root.py --testdir logdir/known-cam-a-eagle-e120-b256-init/ --first_idx 0 --last_idx 120`. Does it look reasonable?

https://user-images.githubusercontent.com/59402345/195515133-a6869889-940a-4f16-9432-eae4e56309c3.mp4
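For reference, a minimal sketch of how a root-pose (camera-center) trajectory could be plotted directly from the per-frame rtk files, assuming the 4x4 layout sketched earlier; `render_root.py` itself does mesh-based rendering, and the path below is illustrative:

```python
import glob
import numpy as np
import matplotlib.pyplot as plt

# Assuming world-to-camera [R | T], the camera center in world
# coordinates is -R^T @ T.
centers = []
for path in sorted(glob.glob("database/DAVIS/Cameras/Full-Resolution/a-eagle/*.txt")):
    rtk = np.loadtxt(path).reshape(4, 4)
    R, T = rtk[:3, :3], rtk[:3, 3]
    centers.append(-R.T @ T)
centers = np.array(centers)

ax = plt.figure().add_subplot(projection='3d')
ax.plot(centers[:, 0], centers[:, 1], centers[:, 2], marker='.')
plt.show()
```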

kts707 commented 1 year ago

I reverted to this version of the repository and ran the Eagle optimization again (basically, without your latest changes to the eikonal loss). The results improved, but the eagle's head is still missing and there is still a big gap between this and the results in the paper and on the website.

https://user-images.githubusercontent.com/59402345/195669037-8fddc91d-7138-46d5-bd4c-6d362e37f0ac.mp4

https://user-images.githubusercontent.com/59402345/195669069-0362610a-8945-4b63-bcb5-8ec859f5783d.mp4

| Metric | Value |
| --- | --- |
| ave. Chamfer distance | 10.0 cm |
| max. Chamfer distance | 15.9 cm |
| ave. F-score at d=1% | 19.6% |
| min. F-score at d=1% | 9.2% |
| ave. F-score at d=2% | 51.5% |
| min. F-score at d=2% | 28.0% |
| ave. F-score at d=5% | 87.2% |
| min. F-score at d=5% | 71.2% |
gengshan-y commented 1 year ago

Interesting that the eikonal loss made it worse for the eagle.

I also noticed an error in the doc:

```bash
bash scripts/render_mgpu.sh 0 $seqname logdir/known-cam-$seqname-e120-b256/params_latest.pth \
        "0" 256
```

It should query the model after all the training stages (`...-ft2`):

```bash
bash scripts/render_mgpu.sh 0 $seqname logdir/known-cam-$seqname-e120-b256-ft2/params_latest.pth \
        "0" 256
```

But assuming you're already querying the model from `known-cam-$seqname-e120-b256-ft2`, the only reason I can think of is the camera pose. Maybe you could try freezing the camera pose in the 1st stage by adding `--freeze_root \` before this line?
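For intuition, freezing the root pose amounts to excluding the root/camera parameters from optimization while the rest of the model trains. A minimal PyTorch sketch; the attribute name `root_pose_params` is hypothetical, not BANMo's actual API:

```python
def freeze_root_pose(model):
    # Hypothetical attribute holding per-frame root/camera pose parameters.
    for p in model.root_pose_params.parameters():
        p.requires_grad_(False)  # gradients no longer flow to root poses
```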

kts707 commented 1 year ago

Freezing the camera pose improved the result, and now the head is correctly reconstructed. The results are now close to what you had. By the way, is it possible to sometimes see this kind of artifact on the canonical shape (for example, the hole in the main body of the eagle)?

(image: eagle00 — canonical eagle shape with a hole in the main body)

gengshan-y commented 1 year ago

It happens in the original paper as well, when the surface is not well estimated. The motivation for adding the eikonal loss is to reduce such artifacts. Perhaps freezing the root in the first stage plus the eikonal loss will further improve it.
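For reference, the eikonal term regularizes a learned SDF toward unit-norm gradients. A minimal PyTorch sketch of such a loss, assuming `sdf_fn` maps (N, 3) points to signed distances; BANMo's actual sampling and weighting may differ:

```python
import torch

def eikonal_loss(sdf_fn, pts):
    """Penalize deviation of ||grad sdf|| from 1 at sample points `pts` (N, 3)."""
    pts = pts.requires_grad_(True)
    sdf = sdf_fn(pts)
    # Gradient of the SDF w.r.t. the input points, kept differentiable
    # so the penalty itself can be backpropagated through.
    grad = torch.autograd.grad(sdf.sum(), pts, create_graph=True)[0]
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```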

kts707 commented 1 year ago

I see. Thanks so much for your help!

Besides the minor typo you mentioned earlier, I think the evaluation command in your doc is also missing a `-` and `ft2`. It should be `bash scripts/eval/run_eval.sh 0 logdir/known-cam-$seqname-e120-b256-ft2/`.