ShoufaChen / DiffusionDet

[ICCV2023 Best Paper Finalist] PyTorch implementation of DiffusionDet (https://arxiv.org/abs/2211.09788)

About training loss #12

Open huilicici opened 1 year ago

huilicici commented 1 year ago

In DDIM or DDPM, training includes a loss (a KL divergence, or its simplified noise-prediction form) that constrains the diffused outputs during training to be Gaussian. I thought this was the basis for DDIM sampling (the reverse process). However, DiffusionDet uses only the set prediction loss. So how can DDIM sampling work without that training constraint?

ShoufaChen commented 1 year ago

Hi,

The set prediction loss contains one term, $\mathcal{L}_{L1}$, defined here, which measures the mean absolute error (L1 distance) between each ground-truth box and its matched predicted box.
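As an illustration only (not the repository's actual code), here is a minimal sketch of that matched L1 term, assuming boxes in normalized $(c_x, c_y, w, h)$ format; the full set prediction loss also includes classification and GIoU terms:

```python
import numpy as np

def l1_box_loss(pred_boxes, gt_boxes):
    """Mean absolute error between matched predicted and ground-truth boxes.

    A simplified stand-in for the matched L1 term of the set prediction
    loss; matching itself (Hungarian assignment) is assumed already done.
    """
    pred = np.asarray(pred_boxes, dtype=float)
    gt = np.asarray(gt_boxes, dtype=float)
    return float(np.abs(pred - gt).mean())

# Two boxes in normalized (cx, cy, w, h) format, already matched.
pred = [[0.50, 0.50, 0.20, 0.20], [0.30, 0.70, 0.10, 0.10]]
gt   = [[0.52, 0.48, 0.20, 0.22], [0.30, 0.70, 0.12, 0.10]]
loss = l1_box_loss(pred, gt)  # mean |diff| over the 8 coordinates
```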

huilicici commented 1 year ago

As presented in Algorithm 1 (Training) of DDPM (https://arxiv.org/pdf/2006.11239v2.pdf), step 5 takes a gradient descent step that constrains the diffused output to be Gaussian. Does DiffusionDet need this kind of loss to constrain the corrupted bboxes to be Gaussian?
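For reference, that training step can be sketched in toy NumPy form as follows. The `eps_model` argument is a hypothetical noise predictor; in the paper it is the network being trained by gradient descent on this loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear beta schedule over T steps, as in the DDPM paper.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)

def ddpm_training_loss(x0, eps_model):
    # Algorithm 1: sample t ~ Uniform({1..T}) and eps ~ N(0, I), form
    # x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps, then take a
    # gradient step on ||eps - eps_theta(x_t, t)||^2 (loss computed here).
    t = int(rng.integers(T))
    eps = rng.standard_normal(x0.shape)
    a_bar = alphas_cumprod[t]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return float(np.mean((eps - eps_model(x_t, t)) ** 2))

# Dummy predictor that always outputs zero noise: the loss then reduces
# to the mean of eps^2, which is close to 1 for standard Gaussian noise.
zero_model = lambda x_t, t: np.zeros_like(x_t)
loss = ddpm_training_loss(np.zeros(512), zero_model)
```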

gugite commented 1 year ago

Hi @ShoufaChen @huilicici , firstly, thanks to the authors for their good work.

Actually, I have the same confusion. During training, DDIM applies an MSE loss between the Gaussian noise and the output of the denoiser (a U-Net). However, in DiffusionDet, the denoiser (the cascade decoder) seems to be optimized directly to refine the noisy boxes into the ground-truth boxes, which works very differently from conventional DDIM.

I am not sure whether this can be seen as introducing a denoising task like DN-DETR. Under that interpretation, the number of sampling steps at inference should also have no observable influence on detection performance.
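For what it's worth (my own reading, not an official answer): DDIM sampling only needs an estimate of either the noise or the clean signal, because the forward equation $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ links the two. A network that regresses boxes (an $\hat{x}_0$ prediction) therefore yields an implied noise estimate $\hat\epsilon$, which is what the DDIM update consumes. A minimal NumPy sketch of that inversion:

```python
import numpy as np

rng = np.random.default_rng(0)

def q_sample(x0, a_bar, eps):
    # Forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps.
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

def eps_from_x0(x_t, x0_pred, a_bar):
    # Invert the forward equation: given x_t and a predicted x_0, recover
    # the implied noise estimate that a DDIM update step would use.
    return (x_t - np.sqrt(a_bar) * x0_pred) / np.sqrt(1.0 - a_bar)

x0 = np.array([0.5, 0.5, 0.2, 0.2])   # a clean box (cx, cy, w, h)
eps = rng.standard_normal(4)
x_t = q_sample(x0, a_bar=0.7, eps=eps)

# With a perfect x_0 prediction, the implied noise matches exactly.
eps_hat = eps_from_x0(x_t, x0, a_bar=0.7)
```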

xuliwalker commented 1 year ago

I also have the same confusion. Waiting for an answer.

kaustubholpadkar commented 1 year ago

I have the same question as @gugite. Can you please respond, @ShoufaChen? It would be very helpful.