OrangeSodahub opened this issue 3 months ago

Hi, after reading your paper, I just want to confirm: is using the SDS method equivalent to using the outputs of a diffusion model (e.g., latents, RGB images) to supervise a standard NeRF model? Is that correct?
True. A diffusion model outputs noise, so you basically have to subtract the predicted noise from the noisy image to get the "outputs" you're talking about.
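For concreteness, here is a minimal sketch of that denoising step under the standard parameterization z_t = α_t·z₀ + σ_t·ε (the function and variable names are illustrative, not this repo's actual code):

```python
import torch

def predict_z0(z_t: torch.Tensor, eps_pred: torch.Tensor,
               alpha_t: float, sigma_t: float) -> torch.Tensor:
    """Recover the denoised latent from a noise prediction.

    Assumes the forward process z_t = alpha_t * z0 + sigma_t * eps,
    so the estimate is z0_hat = (z_t - sigma_t * eps_pred) / alpha_t.
    """
    return (z_t - sigma_t * eps_pred) / alpha_t
```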
Yeah, I know. But I'm a little confused: in that case, how is SDS different from training a standard NeRF model? The RGB images from the diffusion model at different steps for the same prompt are different; they aren't consistent, so how can they fit a NeRF model and converge?
It's the same training process. It probably converges because it's optimizing a KL objective; see the math in DreamFusion's appendix.
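For reference, the SDS gradient from the DreamFusion paper is

$$\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\left[\, w(t)\,\big(\hat{\epsilon}_\phi(z_t; y, t) - \epsilon\big)\,\frac{\partial z}{\partial \theta} \,\right]$$

Individual noisy targets can disagree across steps, but in expectation this is the gradient of a (reweighted) KL divergence between the distribution of renders and the diffusion prior, which is why the optimization can still converge toward a consistent mode.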
@HiFA-team Thanks. And I have another question: can I change the loss function here to just an L1 loss? And can I remove the w term for simplification?
You can try it; removing w might cause an imbalance with the other loss terms. I tried L1, and it didn't really work that well.
And is the w term you used in your paper the constant one?
I think it was. The default option should be constant.
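For context, a rough sketch of common w(t) choices in SDS-style code (the function and strategy names are illustrative assumptions, not this repo's actual config keys):

```python
def sds_weight(alphas_cumprod_t: float, strategy: str = "constant") -> float:
    """Timestep weighting w(t) for the SDS loss.

    alphas_cumprod_t is the cumulative alpha product at timestep t,
    so sigma_t**2 == 1 - alphas_cumprod_t in the DDPM parameterization.
    """
    if strategy == "constant":
        return 1.0                      # uniform over timesteps, as discussed above
    if strategy == "sigma_squared":
        return 1.0 - alphas_cumprod_t   # w(t) = sigma_t^2, common in DreamFusion-style code
    raise ValueError(f"unknown strategy: {strategy}")
```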
Thanks, and I noticed that you also write about a perceptual loss here:
Did you show the results for this part in your paper? How is the result of "not use_sds"?
That's for the image-guided generation part; it assumes there's ground truth generated by SyncDreamer.
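As a rough illustration of how such a perceptual loss could be wired up with the `lpips` package (a hedged sketch with placeholder tensors, not necessarily this repo's actual implementation):

```python
import torch
import lpips  # pip install lpips

perceptual = lpips.LPIPS(net="vgg")  # VGG-based LPIPS; inputs in [-1, 1], shape (N, 3, H, W)

rendered = torch.rand(1, 3, 256, 256) * 2 - 1   # NeRF render (placeholder)
reference = torch.rand(1, 3, 256, 256) * 2 - 1  # e.g., a SyncDreamer-generated reference view
loss = perceptual(rendered, reference).mean()
```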
Thanks, and I found the result shown in Fig. 5 of your paper interesting. You said, "However, relying solely on the image-space loss LSDS-Image results in color bias issues, regardless of the guidance scale used." I wonder, do you have any thoughts on why using only the image loss is bad?
I think there's some sort of nonlinear relationship between the latent space and RGB space, and the diffusion model is trained in the latent space. I don't really have a good explanation, though; CFG = 100 pretty much breaks all the usual theoretical intuitions one would have for a diffusion model...
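For reference, classifier-free guidance with scale s forms the noise estimate

$$\hat{\epsilon} = \epsilon_\phi(z_t; t) + s\,\big(\epsilon_\phi(z_t; y, t) - \epsilon_\phi(z_t; t)\big)$$

so at s = 100 the estimate is dominated by the extrapolated guidance direction rather than a calibrated score, which is presumably why the usual probabilistic intuitions stop applying.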
Thanks, I'm also quite confused about this part:
Is there any reference, e.g. the math equations, for this calculation? And why multiply by `(256 * 256)`?
On the other hand, can I say that if I use an L1 loss instead of an L2 loss, then the SDS method is not really being used (and the knowledge of the diffusion model is not used)? Here is my reasoning:
```
# SDS (ε: predicted noise, ε₀: sampled noise)
∂loss/∂θ = w · (ε - ε₀) · ∂z/∂θ = ∂loss/∂z · ∂z/∂θ

# your method
∂loss/∂θ = w · (α_t/σ_t) · (z - z₀) · ∂z/∂θ
```
Validation:
```
# 1: L2 loss
loss = w · ‖z - z₀‖²
∂loss/∂θ = ∂(w · ‖z - z₀‖²)/∂θ = 2w · (z - z₀) · ∂z/∂θ

# 2: L1 loss
loss = w · ‖z - z₀‖₁
∂loss/∂θ = ∂(w · ‖z - z₀‖₁)/∂θ = w · sign(z - z₀) · ∂z/∂θ
```
So I would say not as much knowledge is used: the per-element gradient is quantized to {-1, +1}, and only the sign of the diffusion model's correction survives.
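A quick autograd check of that claim (a minimal sketch with toy tensors; z0 stands in for the denoised target, detached just as SDS-style implementations treat their targets as constants):

```python
import torch

z = torch.randn(4, requires_grad=True)  # rendered latent (toy stand-in)
z0 = torch.randn(4)                     # denoised target, treated as a constant
w = 0.5

# L2: the gradient w.r.t. z is 2*w*(z - z0), so its magnitude carries information
loss_l2 = w * ((z - z0.detach()) ** 2).sum()
grad_l2 = torch.autograd.grad(loss_l2, z)[0]
assert torch.allclose(grad_l2, 2 * w * (z - z0))

# L1: the gradient w.r.t. z is w*sign(z - z0), so only the sign survives
loss_l1 = w * (z - z0.detach()).abs().sum()
grad_l1 = torch.autograd.grad(loss_l1, z)[0]
assert torch.allclose(grad_l1, w * torch.sign(z - z0))
```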