Hi @XavierCHEN34,

Thanks for your great work! I have been reading the published paper manuscript as well as the code implementation, and I ran into a question about the loss function that is used. It would be highly appreciated if you could explain how this works.
Here is how it goes. In the paper manuscript, specifically in Eq. (2), the overall training objective of AnyDoor is an MSE loss between the U-Net output and the ground-truth image latents.
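Written out, my reading of Eq. (2) is roughly the following (the notation is mine, with $z_0$ for the ground-truth latents, $z_t$ for the noised latent at timestep $t$, $c$ for the conditioning, and $\hat{z}_\theta$ for the U-Net output, so please correct me if I am misreading it):

$$
\mathcal{L} = \mathbb{E}_{z_0,\, c,\, t}\Big[\big\lVert \hat{z}_\theta(z_t, t, c) - z_0 \big\rVert_2^2\Big]
$$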
In the code implementation, the regression target of the loss is controlled by `self.parameterization`, which is set to `"eps"` by default and is not overridden in the configuration file (`configs/anydoor.yaml`).
Therefore, in the p_losses() function of ldm/models/diffusion/ddpm.py (line 367 to line 411), we can see:
```python
def get_loss(self, pred, target, mean=True):
    if self.loss_type == 'l1':
        loss = (target - pred).abs()
        if mean:
            loss = loss.mean()
    elif self.loss_type == 'l2':
        if mean:
            loss = torch.nn.functional.mse_loss(target, pred)
        else:
            loss = torch.nn.functional.mse_loss(target, pred, reduction='none')
    else:
        raise NotImplementedError("unknown loss type '{loss_type}'")
    return loss

def p_losses(self, x_start, t, noise=None):
    noise = default(noise, lambda: torch.randn_like(x_start))
    x_noisy = self.q_sample(x_start=x_start, t=t, noise=noise)
    model_out = self.model(x_noisy, t)

    loss_dict = {}
    if self.parameterization == "eps":
        target = noise
    elif self.parameterization == "x0":
        target = x_start
    elif self.parameterization == "v":
        target = self.get_v(x_start, noise, t)
    else:
        raise NotImplementedError(f"Parameterization {self.parameterization} not yet supported")

    loss = self.get_loss(model_out, target, mean=False).mean(dim=[1, 2, 3])

    log_prefix = 'train' if self.training else 'val'
    loss_dict.update({f'{log_prefix}/loss_simple': loss.mean()})
    loss_simple = loss.mean() * self.l_simple_weight

    loss_vlb = (self.lvlb_weights[t] * loss).mean()
    loss_dict.update({f'{log_prefix}/loss_vlb': loss_vlb})

    loss = loss_simple + self.original_elbo_weight * loss_vlb
    loss_dict.update({f'{log_prefix}/loss': loss})

    return loss, loss_dict
```
With `self.parameterization == "eps"`, `target` is the random Gaussian noise, so the loss becomes the MSE between the U-Net output and that noise. This conflicts with the objective shown in the paper manuscript.
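For context, my understanding is that `q_sample` implements the standard DDPM forward process, so under `"eps"` the U-Net sees the noised latent and is trained to regress the injected noise rather than the clean latent. A minimal standalone sketch of that setup (my own toy code with made-up shapes and schedule, not taken from this repo):

```python
import torch
import torch.nn.functional as F

# Toy noise schedule (values are illustrative only).
betas = torch.linspace(1e-4, 2e-2, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

# Toy stand-ins for the quantities in p_losses().
x_start = torch.randn(2, 4, 8, 8)        # ground-truth image latents z_0
noise = torch.randn_like(x_start)        # eps ~ N(0, I)
t = torch.randint(0, 1000, (2,))

# Standard DDPM forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps
a = alphas_cumprod[t].sqrt().view(-1, 1, 1, 1)
b = (1.0 - alphas_cumprod[t]).sqrt().view(-1, 1, 1, 1)
x_noisy = a * x_start + b * noise        # this is what the U-Net receives as input

# Under parameterization == "eps", the regression target is `noise`, not `x_start`.
model_out = torch.randn_like(x_start)    # pretend U-Net prediction
loss_eps = F.mse_loss(model_out, noise, reduction='none').mean(dim=[1, 2, 3])
```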
According to Eq. (2) in the paper manuscript, I suppose that `self.parameterization` should instead be set to `"x0"`, so that `target` becomes `x_start` and the code implementation aligns with the formula (see the sketch below for what I mean).
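To spell the suggestion out: with `parameterization == "x0"`, the branch in `p_losses()` picks `target = x_start`, and `get_loss()` (assuming the default `loss_type == 'l2'`) then computes an MSE against the ground-truth latents, which is how I read Eq. (2). Continuing the toy example above (again my own sketch, not repo code):

```python
# parameterization == "x0": the regression target is the clean latent itself.
target = x_start
loss_x0 = F.mse_loss(model_out, target, reduction='none').mean(dim=[1, 2, 3])
# -> per-sample MSE between the U-Net output and the ground-truth latents,
#    which matches my reading of Eq. (2) in the paper.
```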
Am I understanding this correctly? Please enlighten me if I have gotten anything wrong. Looking forward to your reply.

Best regards