ashawkey / stable-dreamfusion

Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion.
Apache License 2.0

How to optimize geometry with DMTET? #315

Open buaacyw opened 1 year ago

buaacyw commented 1 year ago

Hi! Thanks a lot for this repo! I find that fine-tuning with DMTET works well for color but brings only tiny changes to the geometry. I would like DMTET to refine the geometry substantially as well. There are several scenarios where this might be useful:

  1. The geometry of the original NeRF may not be good enough, so we want to optimize the geometry as well.
  2. Editing a NeRF with DMTET and SDS. For example, starting from a rose NeRF, we try to refine it into other flowers.

I tried increasing the lr of the DMTET parameters (vertex SDF and deformation). With all other settings and lrs unchanged, I get a better result with a 10x larger lr for the DMTET parameters. The first image is the hamburger generated with the default DMTET setting from the README guide; the second is trained with a 10x DMTET lr. However, the model collapses with a 100x lr.

[Fig 1: default DMTET lr] [Fig 2: 10x DMTET lr]

I also tried to fine-tune a hamburger into a pineapple by changing the prompt from "a hamburger" to "a pineapple" when training with DMTET. With the first image as the initialization, the geometry of the result (second image) still doesn't change with the new prompt.

[Fig 3: hamburger initialization] [Fig 4: result with the prompt "a pineapple"]

I'm curious why this is happening. Shouldn't DMTET make geometry changes easier, since it provides a more efficient 3D representation? I have checked the gradients for the vertex SDF and deformation and found lots of NaNs in them. But this repo doesn't convert these NaNs to zeros. Will these NaNs cause problems?
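To make the NaN question concrete, here is a minimal sketch of how non-finite gradients could be counted and zeroed between `loss.backward()` and `optimizer.step()`. The helper name `sanitize_grads` is hypothetical and not part of this repo; it only assumes an iterable of `(name, parameter)` pairs such as `model.named_parameters()`.

```python
import torch

def sanitize_grads(named_params):
    """Zero non-finite gradient entries in place.

    Returns a dict mapping parameter name -> number of NaN/inf entries found,
    which also shows where the bad gradients come from.
    (Hypothetical helper, not an API of stable-dreamfusion.)
    """
    bad = {}
    for name, p in named_params:
        if p.grad is None:
            continue  # parameter did not take part in this backward pass
        n_bad = int((~torch.isfinite(p.grad)).sum())
        if n_bad:
            bad[name] = n_bad
            # Replace NaN, +inf and -inf with 0 so they contribute no update.
            p.grad.nan_to_num_(nan=0.0, posinf=0.0, neginf=0.0)
    return bad
```

If mixed-precision training with `torch.cuda.amp.GradScaler` is in use, `scaler.step()` already skips the update when it finds inf/NaN gradients, so a helper like this would matter mostly for full-precision runs or for diagnosing which parameters are affected.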

zz7379 commented 1 year ago

Could you please share the command line used to generate the results above? Besides, I find that, at least in my experiments, the NaNs are mainly caused by the grid encoder. DMTet actually can't model large shapes the way NeRF can.

buaacyw commented 1 year ago

The commands:

Fig 1:
    python main.py --text "a hamburger" --workspace trial -O
    python main.py -O --text "a hamburger" --workspace trial_dmtet --dmtet --iters 5000 --init_with trial/checkpoints/df.pth  # load the checkpoint from the first command

Fig 2: You need to add a scaler to the lr of the DMTET sdf and deform values. The commands are then the same as for Fig 1, apart from the lr scaler.

Fig 3: An intermediate result of the following command, trained for only about 500 steps:
    python main.py --text "a hamburger" --workspace trial -O

Fig 4: Load the checkpoint from Fig 3 and then change the prompt to pineapple:
    python main.py -O --text "a pineapple" --workspace trial_dmtet --dmtet --iters 5000 --init_with trial/checkpoints/df.pth
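One way to add the lr scaler described for Fig 2 is through optimizer parameter groups. The sketch below is an illustration, not the code actually used for the figures: the function name `build_param_groups` is hypothetical, and the parameter names `sdf` and `deform` are assumptions about how the DMTET vertex SDF and deformation tensors are named in the model.

```python
def build_param_groups(named_params, base_lr, dmtet_lr_scale=10.0,
                       dmtet_names=("sdf", "deform")):
    """Split parameters into a default group and a DMTet group with a scaled lr.

    named_params: iterable of (name, parameter), e.g. model.named_parameters().
    dmtet_names:  ASSUMED names of the vertex-SDF and deformation parameters.
    """
    dmtet, others = [], []
    for name, param in named_params:
        # Compare the last path component so "renderer.sdf" also matches "sdf".
        leaf = name.rsplit(".", 1)[-1]
        (dmtet if leaf in dmtet_names else others).append(param)

    groups = []
    if others:
        groups.append({"params": others, "lr": base_lr})
    if dmtet:
        groups.append({"params": dmtet, "lr": base_lr * dmtet_lr_scale})
    return groups
```

The returned list can be passed directly to a PyTorch optimizer, for example `torch.optim.Adam(build_param_groups(model.named_parameters(), 1e-3, 10.0))`, since PyTorch optimizers accept per-group `lr` values. Per the results above, a scale of 10 helped while 100 made the model collapse.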

buaacyw commented 1 year ago

> Could you please share the command line used to generate the results above? Besides, I find that, at least in my experiments, the NaNs are mainly caused by the grid encoder. DMTet actually can't model large shapes the way NeRF can.

Hi! How did you find that the NaNs are caused by the grid encoder? There are also some NaNs in the SDS loss. It seems that all we can do is ignore the NaNs? I agree that DMTET can't learn large shapes.