eloimoliner / CQTdiff

Official repository of the paper "Solving Audio Inverse Problems with a Diffusion Model", submitted to ICASSP 23
MIT License

reconstruction guidance mismatch with the paper #3

Open XZWY opened 6 days ago

XZWY commented 6 days ago
norm=torch.linalg.norm(y-den_rec,dim=dim, ord=2)
rec_grads=torch.autograd.grad(outputs=norm, inputs=x)

rec_grads=rec_grads[0]

normguide=torch.linalg.norm(rec_grads)/x.shape[-1]**0.5

#normalize scaling
s=self.xi/(normguide*t_i+1e-6)

#optionally apply a threshold to the gradients
if self.treshold_on_grads>0:
    #apply thresholding to the gradients. It is a dirty trick but helps avoid bad artifacts
    rec_grads=torch.clip(rec_grads, min=-self.treshold_on_grads, max=self.treshold_on_grads)

score=(x_hat.detach()-x)/t_i**2

Inside the sampler, the error norm and the guidance norm are both plain (unsquared) L2 norms, whereas I would expect both to be powers, i.e. the squared norms of the error and of the gradient. This seems to mismatch the paper. Is this on purpose?

eloimoliner commented 6 days ago

Hi, I'm not sure I understand the issue. Do you mean that the norm should be squared? If so, note that once the gradients are normalized by the gradient norm, the squared and the unsquared norm give equivalent normalized gradients.
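
The reason is that the gradient of $$||e||_2^2$$ equals $$2||e||_2$$ times the gradient of $$||e||_2$$, so the two differ only by a positive scalar, which cancels on normalization. Here is a minimal numerical sketch of that, not the repository code: it assumes a toy linear reconstruction `A x` in place of `den_rec`, uses analytic gradients instead of autograd, and normalizes to unit norm, leaving out `xi`, `t_i` and the `x.shape[-1]**0.5` factor. The names `matvec`, `l2` and `normalized_guidance` are hypothetical helpers.

```python
import math

def matvec(A, x):
    # Plain matrix-vector product for lists of lists.
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def l2(v):
    # Euclidean (L2) norm of a vector.
    return math.sqrt(sum(c * c for c in v))

def normalized_guidance(y, A, x, squared):
    """Gradient of ||y - A x|| (or of its square) w.r.t. x, scaled to unit L2 norm."""
    e = [yi - ri for yi, ri in zip(y, matvec(A, x))]
    # A^T e, shared by both gradients
    At_e = [sum(A[i][j] * e[i] for i in range(len(A))) for j in range(len(x))]
    # d/dx ||e||^2 = -2 A^T e    and    d/dx ||e|| = -A^T e / ||e||
    scale = -2.0 if squared else -1.0 / l2(e)
    grad = [scale * g for g in At_e]
    n = l2(grad)
    return [g / n for g in grad]

# Toy data (assumed values, chosen so that the error is non-zero)
A = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]
x = [0.3, -0.7, 1.2]
y = [1.0, 2.0]

g_plain = normalized_guidance(y, A, x, squared=False)
g_squared = normalized_guidance(y, A, x, squared=True)
max_diff = max(abs(a - b) for a, b in zip(g_plain, g_squared))
print(max_diff)  # agreement up to floating-point rounding
```

The same argument applies to any differentiable reconstruction, since only the scalar factor in front of the gradient changes.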

XZWY commented 6 days ago

I see, thanks! Then I assume the paper should also normalize by the gradient norm $$||G||_2$$ instead of the squared norm $$||G||_2^2$$, right? As in the image below (Screenshot 2024-10-19 120721).

eloimoliner commented 5 days ago

Oh yes, you are right. That looks like a typo.