Open kimk-ki opened 11 months ago
You should take a look at the asset folder; I had the same question as you until I found that.
```python
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

class SpecifyGradient(torch.autograd.Function):
    @staticmethod
    @custom_fwd
    def forward(ctx, input_tensor, gt_grad):
        ctx.save_for_backward(gt_grad)
        # we return a dummy value 1, which will be scaled by amp's scaler so we get the scale in backward.
        return torch.ones([1], device=input_tensor.device, dtype=input_tensor.dtype)

    @staticmethod
    @custom_bwd
    def backward(ctx, grad_scale):
        gt_grad, = ctx.saved_tensors
        gt_grad = gt_grad * grad_scale
        return gt_grad, None

grad = grad_scale * w[:, None, None, None] * (noise_pred - noise)
grad = torch.nan_to_num(grad)
loss = SpecifyGradient.apply(latents, grad)
```
I checked the code above. SpecifyGradient.apply(latents, grad) doesn't seem to correspond to grad(x, params), so I wonder where grad(x, params) is implemented.
If I misunderstood and SpecifyGradient.apply(latents, grad) does mean grad(x, params), please explain.
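To double-check the mechanics, I also ran a small toy check (the tensor names here are made up, not from the repo), reusing the SpecifyGradient class quoted above:

```python
import torch

# toy stand-in for the latents; requires_grad so backward() reaches it
x = torch.randn(1, 4, 8, 8, requires_grad=True)
# toy stand-in for w * (noise_pred - noise)
g = torch.randn_like(x)

loss = SpecifyGradient.apply(x, g)  # forward() returns a dummy one-element tensor
loss.backward()                     # backward() returns g * grad_scale (grad_scale is 1 without amp)

print(torch.allclose(x.grad, g))    # True: the input simply receives g as its gradient
```

So backward() just hands `grad` straight to `latents`; I still don't see where the multiplication by dx/dparams, i.e. grad(x, params), happens.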
Maybe you are right. I was confused about this for a long time. When I found the explanation in the asset folder, I felt stupid and frustrated, so I didn't think about it as carefully as you did. Now I guess I'm just a stupid scholar!! 😫
I think the term grad(x, params) is computed during the backward pass. You can see the pseudocode from the paper's appendix below, or refer to this discussion:
```
params = generator.init()
opt_state = optimizer.init(params)
diffusion_model = diffusion.load_model()
for nstep in iterations:
    t = random.uniform(0., 1.)
    alpha_t, sigma_t = diffusion_model.get_coeffs(t)
    eps = random.normal(img_shape)
    x = generator(params, <other arguments>...)   # Get an image observation.
    z_t = alpha_t * x + sigma_t * eps             # Diffuse observation.
    epshat_t = diffusion_model.epshat(z_t, y, t)  # Score function evaluation.
    g = grad(weight(t) * dot(stopgradient[epshat_t - eps], x), params)
    params, opt_state = optimizer.update(g, opt_state)  # Update params with optimizer.
return params
```
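Mapping this onto the PyTorch code: when loss.backward() is called, autograd multiplies the injected grad by d(latents)/d(params) automatically, and that product is exactly the grad(..., params) term in the pseudocode. Here is a minimal runnable sketch, with a toy linear "generator" standing in for the rendering pipeline (the names are not from the repo):

```python
import torch

# Toy stand-ins, NOT the repo's models: a linear "generator" x = W @ theta plays the
# role of rendering/encoding, and `grad` plays the role of weight(t) * (epshat_t - eps).
params = torch.randn(16, requires_grad=True)   # generator parameters theta
W = torch.randn(4, 16)
latents = W @ params                           # x = generator(params, ...), differentiable w.r.t. params

grad = torch.randn(4)                          # weight(t) * (epshat_t - eps); no gradient flows into it

loss = SpecifyGradient.apply(latents, grad)    # dummy scalar that carries `grad` into backward()
loss.backward()

# autograd has computed  d(latents)/d(params)^T @ grad  =  W^T @ grad,
# i.e.  g = grad(weight(t) * dot(stopgradient[epshat_t - eps], x), params)
print(torch.allclose(params.grad, W.t() @ grad))   # True
```

In the actual code the chain from params to latents goes through the NeRF rendering (and, for latent diffusion, the image encoder), but the mechanism is the same.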
But I don't know why `latents` appears in `loss_sds = (latents * grad.detach()).sum()`; refer to this link.
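That said, my tentative reading (a sketch using the same toy setup as above) is that `(latents * grad.detach()).sum()` is just another way to inject the same gradient: its derivative with respect to `latents` is exactly `grad.detach()`, so backward() hands the same tensor to `latents` and, via the chain rule, to `params`; the scalar loss value itself is meaningless.

```python
import torch

params = torch.randn(16, requires_grad=True)
W = torch.randn(4, 16)
latents = W @ params
grad = torch.randn(4)                                   # stands in for w * (noise_pred - noise)

# d/d(latents) of (latents * grad.detach()).sum() is grad.detach(),
# so this produces the same parameter gradient as SpecifyGradient above
loss_sds = (latents * grad.detach()).sum()
loss_sds.backward()

print(torch.allclose(params.grad, W.t() @ grad))        # True, identical to the SpecifyGradient result
```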
Hello, thanks for your great work!
I have one question about the SDS loss implementation.
According to the pseudocode presented in Figure 8 of the original paper, the SDS loss involves a dot product, g = matmul(weight(t) * (epshat_t - eps), grad(x, params)). I want to know where grad(x, params) is implemented in this code.
Please let me know if I misunderstood something :)
Thanks.