JunzheJosephZhu / HiFA

Apache License 2.0

About sds loss #8

Open OrangeSodahub opened 3 months ago

OrangeSodahub commented 3 months ago

Hi, after reading your paper, I just want to confirm: is using the SDS method equivalent to using the outputs of the diffusion model (e.g. latents, RGB images) to supervise a standard NeRF model? Is that correct?

HiFA-team commented 3 months ago

True. The diffusion model outputs noise, so you basically have to subtract the predicted noise from the noisy image to get the "outputs" you're talking about.
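
To make this concrete, here is a minimal sketch (illustrative, not the actual HiFA code) of recovering the predicted clean latent from a noise prediction, assuming the standard DDPM forward process with cumulative schedule value `alpha_bar_t`:

```python
import numpy as np

def predict_x0(z_t, eps_pred, alpha_bar_t):
    """Recover the predicted clean latent from a noise prediction.

    Inverts the DDPM forward process
    z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps.
    """
    return (z_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

# Sanity check: if eps_pred equals the true noise, we recover z_0 exactly.
rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 64, 64))
eps = rng.standard_normal((4, 64, 64))
alpha_bar = 0.7
zt = np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * eps
z0_hat = predict_x0(zt, eps, alpha_bar)
print(np.allclose(z0_hat, z0))  # True
```

This `z0_hat` (or its decoded RGB image) is the "output" that supervises the NeRF renders.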

OrangeSodahub commented 3 months ago

Yeah, I know. But I'm a bit confused: in this case, how is "SDS" different from training a standard NeRF model? The RGB images from the diffusion model at different steps for the same prompt are different and mutually inconsistent, so how can they fit a NeRF model and converge?

HiFA-team commented 3 months ago

It's the same training process. It probably converges since it's optimizing a KL objective; see the math in DreamFusion's appendix.
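
For reference, this is the SDS gradient as written in the DreamFusion paper (standard notation: $\hat{\epsilon}_\phi$ is the model's noise prediction, $\epsilon$ the injected noise, $w(t)$ a timestep weighting, and $x = g(\theta)$ the rendered image):

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\phi, x = g(\theta)) =
\mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
\big(\hat{\epsilon}_\phi(z_t; y, t) - \epsilon\big)\,
\frac{\partial x}{\partial \theta} \right]
```

Every render gets pulled toward whatever the diffusion model considers plausible at that timestep, and in expectation over $t$ and $\epsilon$ these pulls average into a consistent target.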

OrangeSodahub commented 3 months ago

@HiFA-team Thanks. I have another question: can I change the loss function to just an L1 loss here? And can I remove the w term for simplicity?

https://github.com/JunzheJosephZhu/HiFA/blob/d0213bcbdfcbe6181ed1960f6b0a8b72dd7909a4/nerf/sd.py#L293

HiFA-team commented 3 months ago

You can try it; removing w might cause an imbalance with the other loss terms. I tried L1, and it didn't really work that well.
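
For illustration, here is a hypothetical sketch of the two variants being discussed (names are illustrative, not the actual HiFA code): `z` is the rendered latent, `z0_hat` the denoised target, and `w` the timestep weight.

```python
import numpy as np

def sds_l2_loss(z, z0_hat, w):
    # Gradient w.r.t. z is 2*w*(z - z0_hat): the error magnitude is kept.
    return w * np.sum((z - z0_hat) ** 2)

def sds_l1_loss(z, z0_hat, w):
    # Gradient w.r.t. z is w*sign(z - z0_hat): the error magnitude is discarded.
    return w * np.sum(np.abs(z - z0_hat))

z = np.array([1.0, 2.0, 3.0])
z0_hat = np.array([0.5, 2.5, 3.0])
print(sds_l2_loss(z, z0_hat, w=1.0))  # 0.5
print(sds_l1_loss(z, z0_hat, w=1.0))  # 1.0
```

Dropping `w` rescales this term relative to the other losses in the total objective, which is the imbalance mentioned above.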

OrangeSodahub commented 3 months ago

And is the w term you used in your paper the constant one?

HiFA-team commented 3 months ago

I think it was. The default option should be constant.

OrangeSodahub commented 3 months ago

Thanks, and I noticed that you also write about perceptual loss here:

https://github.com/JunzheJosephZhu/HiFA/blob/d0213bcbdfcbe6181ed1960f6b0a8b72dd7909a4/nerf/utils.py#L506-L532

Did you show the results of this part in your paper? How is the result with "not use_sds"?

JunzheJosephZhu commented 3 months ago

That's for the image-guided generation part, assuming there's ground truth generated by SyncDreamer.

OrangeSodahub commented 3 months ago

Thanks. I found the result shown in Fig. 5 of your paper interesting. You said, "However, relying solely on the image-space loss LSDS-Image results in color bias issues, regardless of the guidance scale used." Do you have any opinions on why using only the image-space loss is bad?

[Screenshot of Fig. 5 from the paper]

HiFA-team commented 3 months ago

I think there's some sort of nonlinear relationship between the latent space and RGB space, and the diffusion model is trained on the latent space. I don't really have a good explanation though; CFG=100 pretty much breaks all the usual theoretical intuitions one would have about a diffusion model...

OrangeSodahub commented 3 months ago

Thanks, I'm also quite confused about this part:

https://github.com/JunzheJosephZhu/HiFA/blob/1bbe86135f960f4f99ab1f4c294bb5f4151da273/nerf/sd.py#L350-L355

Is there any reference, e.g. math equations, for this calculation? Why multiply by (256 * 256)?

OrangeSodahub commented 3 months ago

On the other hand, can I say that if I use an L1 loss instead of an L2 loss, then the SDS method is effectively not being used (and the knowledge of the diffusion model is not used)? Here is my reasoning:

# SDS
∂loss/∂θ = w * (ε - ε0) * ∂z/∂θ = ∂loss/∂z * ∂z/∂θ
# your method
∂loss/∂θ = w * (αt/σt) * (z - z0) * ∂z/∂θ

Validation:

# 1: L2 loss
loss = w * ||z - z0||^2
∂loss/∂θ = ∂(w * ||z - z0||^2)/∂θ = 2w * (z - z0) * ∂z/∂θ
# 2: L1 loss
loss = w * ||z - z0||
∂loss/∂θ = ∂(w * ||z - z0||)/∂θ = w * sign(z - z0) * ∂z/∂θ

JunzheJosephZhu commented 3 months ago

I would say not as much knowledge is used, or the gradient is quantized to {-1, 1}
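
The quantization can be seen numerically. An illustrative sketch (not the actual HiFA code) comparing the per-element gradients of the two variants:

```python
import numpy as np

def l2_grad(z, z0, w=1.0):
    # Gradient of w * ||z - z0||^2: magnitude of the error is kept.
    return 2.0 * w * (z - z0)

def l1_grad(z, z0, w=1.0):
    # Gradient of w * ||z - z0||_1: quantized to {-w, 0, +w}.
    return w * np.sign(z - z0)

z = np.array([0.9, 2.1, 3.0])
z0 = np.array([1.0, 2.0, 3.0])
print(l2_grad(z, z0))  # approximately [-0.2, 0.2, 0.0]
print(l1_grad(z, z0))  # [-1.0, 1.0, 0.0]
```

With L1, every latent element gets an equal-magnitude push regardless of how far it is from the denoised target, so the fine-grained information in the diffusion model's prediction is largely discarded.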