facebookresearch / banmo

BANMo Building Animatable 3D Neural Models from Many Casual Videos

Question about results: a difference between the results in the paper and my results #51

Closed minsu1206 closed 1 year ago

minsu1206 commented 1 year ago

Thank you for the nice work.

I have a question about the results in the paper. I followed your instructions for data processing, training, rendering, and evaluation as described in scripts/README.md.

The desired results should be the same as in the attached image.

However, my models show poorer performance. I describe my results below.

AMA

  1. Training a model using T_swing and T_samba simultaneously --> evaluation: avg 11.4 chamfer distance on T_swing; avg 10.7 chamfer distance on T_samba
  2. Downloading the provided files with wget https://www.dropbox.com/sh/n9eebife5uovg2m/AAA1BsADDzCIsTSUnJyCTRp7a -O tmp.zip --> evaluation (same as your instructions): avg 9.1 chamfer distance on T_swing
  3. Training a model using only the T_swing videos --> evaluation (same process for T_samba): avg 8.6 chamfer distance on T_swing; avg 8.2 chamfer distance on T_samba

Synthetic (hand, eagle): avg 6.4 chamfer distance on eagle; avg 5.3 chamfer distance on hands (see the chamfer-distance sketch below)
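
For context, the numbers above are average symmetric chamfer distances between the predicted and ground-truth meshes. Below is a minimal sketch of that metric as I understand it; this is an assumed form, and the repo's evaluation script may sample points, scale units, or average differently.

```python
# Minimal sketch of an average symmetric chamfer distance between two point
# clouds sampled from predicted and ground-truth meshes (assumed form, not
# necessarily the exact evaluation code in this repo).
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pred_pts: np.ndarray, gt_pts: np.ndarray) -> float:
    """pred_pts: (N, 3) predicted points; gt_pts: (M, 3) ground-truth points."""
    # Distance from each predicted point to its nearest ground-truth point.
    d_pred_to_gt, _ = cKDTree(gt_pts).query(pred_pts)
    # Distance from each ground-truth point to its nearest predicted point.
    d_gt_to_pred, _ = cKDTree(pred_pts).query(gt_pts)
    # Symmetric average of the two directions.
    return float(d_pred_to_gt.mean() + d_gt_to_pred.mean())
```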

I used 2x A100 GPUs for training and the same conda environment as yours.

As you can see, all my results show poorer performance. My question is: how can I reproduce the results described in the paper?

Best regards.

gengshan-y commented 1 year ago

Hi, the results of this repo should be aligned with the camera-ready version of the paper, where Table 1 was updated for all entries.

However, I wouldn't expect the performance to drop for training setup 1 compared to setup 3. Does it look visually worse?

minsu1206 commented 1 year ago

Thank you for your answer. I had only downloaded the paper from arXiv, so I wasn't aware of the camera-ready version. I will refer to that version instead. With that, the result of AMA setup 2 makes sense.

AMA setup 1 looks worse than setup 3. Specifically, in the mesh visualizations, the details of the legs and skirt are more accurate in setup 3 than in setup 1.

I attach reference images below (meshes are visualized in MeshLab).

T_samba1-mesh-00000.obj from setup 1: image

T_samba1-mesh-00000.obj from setup 3: image

T_swing1-mesh-00000.obj from setup 1: image

T_swing1-mesh-00000.obj from setup 3: image

gengshan-y commented 1 year ago

Hi, the extracted surface looks suspicious in both setups. Can you confirm you are using the latest commit? If so, the recent eikonal loss update may be the culprit; the original paper did not use the eikonal loss. It seems the eikonal loss moved the 0 isosurface inside the actual surface.

There are two ways to fix it:

(1) Without re-training the model: remove --mc_threshold 0 in scripts/render_mgpu.sh and run that script again. This will fall back to the default marching-cubes threshold, --mc_threshold=-0.002 (see the sketch after these two options for how the threshold affects the extracted surface).

(2) Re-train the model without the eikonal loss. To do so, you can roll back to the commit before this one.
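
For illustration, here is a minimal sketch (not the repo's actual extraction code) of how the marching-cubes threshold selects which isosurface of the predicted SDF becomes the output mesh; query_sdf is a hypothetical stand-in for the trained network's SDF query.

```python
# Minimal sketch: extracting a mesh from a learned SDF with marching cubes.
# `query_sdf`, the grid resolution, and the bound are assumptions for illustration.
import numpy as np
from skimage import measure

def extract_mesh(query_sdf, grid_res=128, bound=1.0, mc_threshold=-0.002):
    # Sample the signed distance field on a dense grid inside [-bound, bound]^3.
    xs = np.linspace(-bound, bound, grid_res)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    sdf = query_sdf(grid.reshape(-1, 3)).reshape(grid_res, grid_res, grid_res)

    # `level` is the value whose isosurface marching cubes extracts.
    # With --mc_threshold 0 the 0-isosurface is used; if a regularizer such as
    # the eikonal loss shifts the zero level set inside the true surface, a
    # slightly negative level (the default -0.002 here) recovers a surface
    # closer to the intended one.
    verts, faces, _, _ = measure.marching_cubes(sdf, level=mc_threshold)

    # Map vertex indices back to world coordinates.
    verts = verts / (grid_res - 1) * 2 * bound - bound
    return verts, faces
```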

minsu1206 commented 1 year ago

Hi, I'm sorry for the late reply.

Option (1) gives better visualization results, which look plausible and similar to the paper's Figure 4. After re-rendering without --mc_threshold 0, I also ran the evaluation script again and got a better average chamfer distance (avg 8.2), which is even lower than the 9.2 reported in the paper's Table 1.

I confirmed that I had cloned your latest commit and trained the models without the eikonal loss, since the default eikonal loss weight is 0.

I have a couple of follow-up questions.

  1. Is there any additional loss function or technique that was not used in the paper's implementation? I wonder why I got a better score than the camera-ready paper. Is it possible that randomness such as ray sampling or data fetching affects the result?

  2. I would like to know the exact --mc_threshold you used when rendering the results for Table 1. The chamfer distance seems quite dependent on this hyperparameter.

I would like to hear your thoughts on these. Best regards.

gengshan-y commented 1 year ago

The experiments in the paper were run before this commit, so I think both fixing the bugs in the lbs implementation and replacing the feature matching loss with the feature rendering loss might have helped.

The --mc_threshold is the default value, -0.002, for the experiments in the paper.

minsu1206 commented 1 year ago

My questions have been resolved. Thank you for your answers and for sharing this nice work.