ashawkey / stable-dreamfusion

Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion.
Apache License 2.0

Implementation of weight w(t) #106

Open phymhan opened 1 year ago

phymhan commented 1 year ago

Thanks for the awesome implementation! I have a question about the weight $w(t)$.

In the code, $w(t)$ is implemented as $(1-\bar{\alpha}_t)$ (see https://github.com/ashawkey/stable-dreamfusion/blob/53d4cef2055234e242d0b9b247c6f41bee7bbf46/nerf/sd.py#L111). However, according to the actual StableDiffusion implementation, shouldn't it be $\sqrt{\bar{\alpha}_t}$, since $\partial z_t/\partial z = \sqrt{\bar{\alpha}_t}I$?

I tried both in my own experiments and found that your original implementation $(1-\bar{\alpha}_t)$ worked better. Do you have any explanation for this?

Thanks!
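For reference, the chain-rule step behind this question can be written out as follows. This is only a sketch: it assumes the standard DDPM forward process applied in latent space, with $z$ the clean latent and $\epsilon \sim \mathcal{N}(0, I)$. The symbols $\hat{\epsilon}_\phi$, $y$, and $\theta$ follow common SDS notation and are assumptions, not names taken from this repository's code.

$$
z_t = \sqrt{\bar{\alpha}_t}\, z + \sqrt{1-\bar{\alpha}_t}\,\epsilon
\quad\Longrightarrow\quad
\frac{\partial z_t}{\partial z} = \sqrt{\bar{\alpha}_t}\, I
$$

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
= \mathbb{E}_{t,\epsilon}\left[\, w(t)\,\big(\hat{\epsilon}_\phi(z_t; y, t) - \epsilon\big)\,\frac{\partial z}{\partial \theta} \,\right]
$$

Under this reading, the factor $\sqrt{\bar{\alpha}_t}$ can either appear explicitly or be folded into $w(t)$, which may be one reason different implementations end up with different weightings.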

ashawkey commented 1 year ago

@phymhan Hi, in fact the paper mentions weights in two places; you may check https://github.com/ashawkey/stable-dreamfusion/pull/9 and https://github.com/ashawkey/stable-dreamfusion/issues/29 for some older discussions.

phymhan commented 1 year ago

Hi @ashawkey, thanks for your swift reply!

FYI, I found Latent-Nerf implemented $w(t)$ as $\sqrt{\bar{\alpha}_t}(1-\bar{\alpha}_t)$ (see https://github.com/eladrich/latent-nerf/blob/f49ecefcd48972e69a28e3116fe95edf0fac4dc8/src/stable_diffusion.py#L144).

Now it makes sense. Thank you!

In my case, where I use SDS to finetune a StyleGAN2 model, I tried different variants of $w(t)$ and found that $(1-\bar{\alpha}_t)$ worked best : )
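To make the comparison concrete, the three weightings mentioned in this thread can be evaluated side by side. The snippet below is a minimal sketch, not code from either repository: the function names `alphas_cumprod` and `sds_weights` are hypothetical, and the "scaled linear" beta schedule with `beta_start=0.00085`, `beta_end=0.012`, and 1000 steps is an assumption about the Stable Diffusion scheduler defaults.

```python
import math


def alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Return the list of alpha_bar_t for t = 0 .. num_steps - 1.

    Assumption: a "scaled linear" schedule, i.e. sqrt(beta) is linearly
    spaced between sqrt(beta_start) and sqrt(beta_end).
    """
    sqrt_start, sqrt_end = math.sqrt(beta_start), math.sqrt(beta_end)
    values, running = [], 1.0
    for i in range(num_steps):
        frac = i / (num_steps - 1)
        beta = (sqrt_start + frac * (sqrt_end - sqrt_start)) ** 2
        running *= 1.0 - beta  # alpha_bar_t = prod_{s <= t} (1 - beta_s)
        values.append(running)
    return values


def sds_weights(alpha_bar):
    """Evaluate the three candidate w(t) discussed in the thread."""
    return {
        "one_minus_alpha_bar": 1.0 - alpha_bar,
        "sqrt_alpha_bar": math.sqrt(alpha_bar),
        "sqrt_alpha_bar_times_one_minus_alpha_bar":
            math.sqrt(alpha_bar) * (1.0 - alpha_bar),
    }


if __name__ == "__main__":
    abar = alphas_cumprod()
    for t in (20, 500, 980):
        weights = sds_weights(abar[t])
        summary = ", ".join(f"{name}={value:.4f}" for name, value in weights.items())
        print(f"t={t}: alpha_bar={abar[t]:.4f}, {summary}")
```

Qualitatively, $(1-\bar{\alpha}_t)$ puts most weight on high-noise timesteps, $\sqrt{\bar{\alpha}_t}$ puts most weight on low-noise timesteps, and the product $\sqrt{\bar{\alpha}_t}(1-\bar{\alpha}_t)$ peaks in between (at $\bar{\alpha}_t = 1/3$).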

thuanz123 commented 1 year ago

Hi @phymhan, would it be possible to share your SDS code for StyleGAN2? I'm really interested in it.

phymhan commented 1 year ago

Hi @thuanz123, thanks for your interest! The code can be found here: https://github.com/KunpengSong/styleganfusion