Open kimk-ki opened 11 months ago
You should take a look at the asset folder; I had the same question as you until I found that.
```python
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

class SpecifyGradient(torch.autograd.Function):
    @staticmethod
    @custom_fwd
    def forward(ctx, input_tensor, gt_grad):
        ctx.save_for_backward(gt_grad)
        # we return a dummy value 1, which will be scaled by amp's scaler so we get the scale in backward.
        return torch.ones([1], device=input_tensor.device, dtype=input_tensor.dtype)

    @staticmethod
    @custom_bwd
    def backward(ctx, grad_scale):
        gt_grad, = ctx.saved_tensors
        gt_grad = gt_grad * grad_scale
        return gt_grad, None

grad = grad_scale * w[:, None, None, None] * (noise_pred - noise)
grad = torch.nan_to_num(grad)
loss = SpecifyGradient.apply(latents, grad)
```
I checked the code above. SpecifyGradient.apply(latents, grad) doesn't seem to correspond to grad(x, params), so I wonder where grad(x, params) is implemented.
If I misunderstood and SpecifyGradient.apply(latents, grad) does mean grad(x, params), please explain.
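To double-check the mechanics, I also ran a small toy check (the tensor names here are made up, not from the repo), reusing the SpecifyGradient class quoted above:

```python
import torch

# toy stand-in for the latents; requires_grad so backward() reaches it
x = torch.randn(1, 4, 8, 8, requires_grad=True)
# toy stand-in for w * (noise_pred - noise)
g = torch.randn_like(x)

loss = SpecifyGradient.apply(x, g)  # forward() returns a dummy one-element tensor
loss.backward()                     # backward() returns g * grad_scale (grad_scale is 1 without amp)

print(torch.allclose(x.grad, g))    # True: the input simply receives g as its gradient
```

So backward() just hands `grad` straight to `latents`; I still don't see where the multiplication by dx/dparams, i.e. grad(x, params), happens.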
Maybe you are right. I was confused about this for a long time. When I found the explanation in the asset folder, I felt stupid and frustrated, so I didn't think about it as carefully as you did. Now I guess I'm just a stupid scholar!! 😫
I think the term grad(x, params) is computed during the backward pass. You can see the pseudocode from the paper's appendix below, or refer to this discussion:
```
params = generator.init()
opt_state = optimizer.init(params)
diffusion_model = diffusion.load_model()
for nstep in iterations:
    t = random.uniform(0., 1.)
    alpha_t, sigma_t = diffusion_model.get_coeffs(t)
    eps = random.normal(img_shape)
    x = generator(params, <other arguments>...)   # Get an image observation.
    z_t = alpha_t * x + sigma_t * eps             # Diffuse observation.
    epshat_t = diffusion_model.epshat(z_t, y, t)  # Score function evaluation.
    g = grad(weight(t) * dot(stopgradient[epshat_t - eps], x), params)
    params, opt_state = optimizer.update(g, opt_state)  # Update params with optimizer.
return params
```
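Mapping this onto the PyTorch code: when loss.backward() is called, autograd multiplies the injected grad by d(latents)/d(params) automatically, and that product is exactly the grad(..., params) term in the pseudocode. Here is a minimal runnable sketch, with a toy linear "generator" standing in for the rendering pipeline (the names are not from the repo):

```python
import torch

# Toy stand-ins, NOT the repo's models: a linear "generator" x = W @ theta plays the
# role of rendering/encoding, and `grad` plays the role of weight(t) * (epshat_t - eps).
params = torch.randn(16, requires_grad=True)   # generator parameters theta
W = torch.randn(4, 16)
latents = W @ params                           # x = generator(params, ...), differentiable w.r.t. params

grad = torch.randn(4)                          # weight(t) * (epshat_t - eps); no gradient flows into it

loss = SpecifyGradient.apply(latents, grad)    # dummy scalar that carries `grad` into backward()
loss.backward()

# autograd has computed  d(latents)/d(params)^T @ grad  =  W^T @ grad,
# i.e.  g = grad(weight(t) * dot(stopgradient[epshat_t - eps], x), params)
print(torch.allclose(params.grad, W.t() @ grad))   # True
```

In the actual code the chain from params to latents goes through the NeRF rendering (and, for latent diffusion, the image encoder), but the mechanism is the same.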
But I don't know why `latents` appears in `loss_sds = (latents * grad.detach()).sum()`; refer to this link.
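That said, my tentative reading (a sketch using the same toy setup as above) is that `(latents * grad.detach()).sum()` is just another way to inject the same gradient: its derivative with respect to `latents` is exactly `grad.detach()`, so backward() hands the same tensor to `latents` and, via the chain rule, to `params`; the scalar loss value itself is meaningless.

```python
import torch

params = torch.randn(16, requires_grad=True)
W = torch.randn(4, 16)
latents = W @ params
grad = torch.randn(4)                                   # stands in for w * (noise_pred - noise)

# d/d(latents) of (latents * grad.detach()).sum() is grad.detach(),
# so this produces the same parameter gradient as SpecifyGradient above
loss_sds = (latents * grad.detach()).sum()
loss_sds.backward()

print(torch.allclose(params.grad, W.t() @ grad))        # True, identical to the SpecifyGradient result
```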
Hello, thanks for your great work!
I have one question about the SDS loss implementation.
According to the pseudocode presented in Figure 8 of the original paper, the SDS loss involves a dot product, g = matmul(weight(t) * (epshat_t - eps), grad(x, params)). I want to know where grad(x, params) is implemented in this code.
Please let me know if I misunderstood something :)
Thanks.