cswry / OSEDiff

[NeurIPS 2024] One-Step Effective Diffusion Network for Real-World Image Super-Resolution
Apache License 2.0
215 stars · 12 forks

Some confusion about VSD loss implementation #21

Open zzzzzuber opened 4 months ago

zzzzzuber commented 4 months ago

Hi, thanks for your wonderful work~ I'm a little confused about the implementation of the VSD loss. I followed your paper and read ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation. I thought the VSD loss is a pixel-wise gradient produced by the network on the input LQ image, i.e. a pixel-wise calculation between the pretrained regularizer's output and the fine-tuned regularizer's output. However, the LPIPS and MSE losses are scalars, so I'm really confused about how the VSD loss is implemented and how it is combined with the data loss. Hope for your reply~

P.S.: the picture below is from ProlificDreamer's implementation.

[screenshot: ProlificDreamer's VSD implementation]

xhuang0904 commented 4 months ago

I am also confused, it will be great if the author can release the training code

xhuang0904 commented 4 months ago

Have you reproduced the VSD loss now?

theEricMa commented 4 months ago

Thanks for your interest in our work. Although VSD produces a two-dimensional gradient, you still need to convert this gradient into a scalar for back-propagation. That's what the SpecifyGradient function does. This conversion makes the VSD loss compatible with LPIPS and MSE.
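For readers unfamiliar with this trick, here is a minimal sketch of a SpecifyGradient-style autograd wrapper in the style popularized by DreamFusion-derived codebases. This is an illustration, not the authors' released code: the forward pass returns a dummy scalar, and the backward pass injects the precomputed gradient.

```python
import torch

class SpecifyGradient(torch.autograd.Function):
    """Turn a precomputed gradient into a scalar loss for backward()."""

    @staticmethod
    def forward(ctx, input_tensor, gt_grad):
        ctx.save_for_backward(gt_grad)
        # The returned value is a dummy scalar; only its gradient matters.
        return torch.ones([1], device=input_tensor.device, dtype=input_tensor.dtype)

    @staticmethod
    def backward(ctx, grad_scale):
        (gt_grad,) = ctx.saved_tensors
        # Route the precomputed gradient to input_tensor; gt_grad gets no grad.
        return gt_grad * grad_scale, None

# Usage: x receives exactly the precomputed gradient g on backward().
x = torch.randn(2, 3, requires_grad=True)
g = torch.randn(2, 3)
loss = SpecifyGradient.apply(x, g)
loss.backward()
```

After `loss.backward()`, `x.grad` equals `g`, so the scalar `loss` can simply be summed with LPIPS and MSE terms before a single backward pass.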

zzzzzuber commented 4 months ago

> Thanks for your interest in our work. Although VSD produces a two-dimensional gradient, you still need to convert this gradient into a scalar for back-propagation. That's what the SpecifyGradient function does. This conversion makes the VSD loss compatible with LPIPS and MSE.

Thanks for your reply~ Now I know how to reproduce the VSD loss in my model~ By the way, out of curiosity, can I regard the VSD loss gradient as the derivative of a weighted MSE loss? Can I replace the VSD loss with a weighted MSE loss? Thanks for your kind help again!

zzzzzuber commented 4 months ago

> Have you reproduced the VSD loss now?

I will try it again, haha~

theEricMa commented 3 months ago

> Thanks for your reply~ Now I know how to reproduce the VSD loss in my model~ By the way, out of curiosity, can I regard the VSD loss gradient as the derivative of a weighted MSE loss? Can I replace the VSD loss with a weighted MSE loss? Thanks for your kind help again!

That's a great question. As discussed in HiFA, the SDS loss is a weighted sum of the MSE between the generated images and their denoised versions from the diffusion model. Likewise, you can find that VSD is a weighted sum of the MSE between the denoised images from the pre-trained diffusion model and those from the fine-tuned model.

zzzzzuber commented 3 months ago

> That's a great question. As discussed in HiFA, the SDS loss is a weighted sum of the MSE between the generated images and their denoised versions from the diffusion model. Likewise, you can find that VSD is a weighted sum of the MSE between the denoised images from the pre-trained diffusion model and those from the fine-tuned model.

I see~ But if the VSD loss can be seen as a weighted sum of the MSE between the denoised images from the pretrained and fine-tuned models, why not use an MSE loss directly? Developing customized gradient backpropagation is not simple (at least for me)😂, and using an MSE loss directly would be easier.

theEricMa commented 3 months ago

> I see~ But if the VSD loss can be seen as a weighted sum of the MSE between the denoised images from the pretrained and fine-tuned models, why not use an MSE loss directly? Developing customized gradient backpropagation is not simple (at least for me)😂, and using an MSE loss directly would be easier.

Following the conventional method, computing the gradient would require taking the derivative through the SD U-Net, which significantly increases GPU memory usage. This trick was proposed by DreamFusion for computing the SDS loss and has been adopted by all subsequent works.
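For completeness, the memory-saving trick can also be written without a custom autograd Function. The following is my own illustration under the assumptions discussed in this thread (not the released code): detach a constructed target so that a plain MSE loss reproduces the precomputed gradient exactly, with no backpropagation through the SD U-Net.

```python
import torch
import torch.nn.functional as F

# Illustration only: `grad` is a stand-in for the VSD gradient,
# w * (eps_pretrained - eps_finetuned), computed under torch.no_grad().
latents = torch.randn(1, 4, 8, 8, requires_grad=True)
with torch.no_grad():
    grad = torch.randn_like(latents)

# Choose a detached target so that d(loss)/d(latents) == grad exactly:
# 0.5 * sum((latents - target)^2) differentiates to (latents - target) = grad.
target = (latents - grad).detach()
loss = 0.5 * F.mse_loss(latents, target, reduction="sum")
loss.backward()
```

Because `target` is detached, the U-Net that produced `grad` is never part of the autograd graph, which is exactly why memory usage stays low.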

xhuang0904 commented 3 months ago

Hi, can you explain a bit more about the VSD loss?

First, the gradient term of the VSD loss in ProlificDreamer is:

```python
grad = w * (noise_pred - noise_pred_q)
```

From my understanding, in the OSEDiff case it is:

```python
grad = w * (noise_pred_pretrained_regularizer - noise_pred_finetuned_regularizer)
```

Is that right?

Second, did you just follow the w(t) from ProlificDreamer,

```python
w = 1 - self.alphas[t]
```

Thanks a lot!

zzzzzuber commented 3 months ago

> Following the conventional method, computing the gradient would require taking the derivative through the SD U-Net, which significantly increases GPU memory usage. This trick was proposed by DreamFusion for computing the SDS loss and has been adopted by all subsequent works.

OK, I see~~ Many thanks for your kind help~~~

zzzzzuber commented 3 months ago

> Hi, can you explain a bit about the VSD loss? First, the gradient term in ProlificDreamer is `grad = w * (noise_pred - noise_pred_q)`. From my understanding, in the OSEDiff case it is `grad = w * (noise_pred_pretrained_regularizer - noise_pred_finetuned_regularizer)`. Is that right? Second, did you just follow the w(t) from ProlificDreamer, `w = 1 - self.alphas[t]`?

I implemented the VSD loss the same way~
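Putting the thread's formulas together, a minimal sketch of the gradient term. The variable names (`noise_pred_pretrained`, `noise_pred_finetuned`, `alphas`, `t`) are my assumptions following ProlificDreamer's notation, not OSEDiff's released code, and `alphas` below is a stand-in schedule rather than a real `alphas_cumprod`.

```python
import torch

def vsd_grad(noise_pred_pretrained: torch.Tensor,
             noise_pred_finetuned: torch.Tensor,
             alphas: torch.Tensor,
             t: torch.Tensor) -> torch.Tensor:
    # w(t) = 1 - alphas[t], as in ProlificDreamer; broadcast over (B, C, H, W).
    w = (1 - alphas[t]).view(-1, 1, 1, 1)
    return w * (noise_pred_pretrained - noise_pred_finetuned)

# Example with dummy shapes: one 4-channel latent, 1000 timesteps.
alphas = torch.linspace(0.9999, 0.001, 1000)
t = torch.randint(0, 1000, (1,))
eps_pre = torch.randn(1, 4, 8, 8)
eps_ft = torch.randn(1, 4, 8, 8)
grad = vsd_grad(eps_pre, eps_ft, alphas, t)
```

The resulting `grad` would then be applied through a SpecifyGradient-style wrapper (or the detached-target MSE trick) rather than backpropagated through the U-Net.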

Yangkai-Wei commented 2 months ago

@zzzzzuber Have you implemented this training process? I implemented the VSD loss the same way as ProlificDreamer, but after thousands of steps the pseudo loss always grows into the tens of thousands, and then the image becomes NaN.

beyondbatman-master commented 2 months ago

> @zzzzzuber Have you implemented this training process? I implemented the VSD loss the same way as ProlificDreamer, but after thousands of steps the pseudo loss always grows into the tens of thousands, and then the image becomes NaN.

We met the same problem. Did you solve it?

zzzzzero commented 3 weeks ago

> @zzzzzuber Have you implemented this training process? I implemented the VSD loss the same way as ProlificDreamer, but after thousands of steps the pseudo loss always grows into the tens of thousands, and then the image becomes NaN.
>
> We met the same problem. Did you solve it?

I used the official training code and also met the same problem. Did you solve it?