Concerns about the Rapid Decrease of Loss in Denoising Part during Training

After incorporating your denoising method into other models, I have observed that the loss for the noised queries (the denoising part) drops very quickly during training: it starts at around 1.x and falls to around 0.0x within a few hundred iterations. Within one epoch, or even half an epoch, it is already close to zero, so it becomes almost negligible in the later stages of training.

I suspect that if the denoising loss drops to almost zero this early, the denoising part contributes far less for the rest of training. Could you please share your training log file? I would like to compare the tgt_loss part with the loss of the denoising part in your model; in your model, the difference between the two does not seem to be substantial. Thank you in advance for your help.
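To make the comparison concrete, the share of the total loss coming from one group of terms can be tracked per iteration. The snippet below is only a minimal sketch: the helper `split_losses` is hypothetical, it assumes the losses are available as a plain dict of floats, and it assumes the terms of interest can be selected by a key prefix such as `tgt_` (adjust the prefix to your own naming). The numbers in `example` are made up for illustration and do not come from any real log.

```python
from typing import Dict, Tuple


def split_losses(loss_dict: Dict[str, float], prefix: str = "tgt_") -> Tuple[float, float, float]:
    """Split a loss dict into prefixed and non-prefixed sums.

    Returns (prefixed_sum, other_sum, prefixed_share), where prefixed_share
    is the fraction of the total loss contributed by the prefixed terms.
    """
    prefixed = sum(v for k, v in loss_dict.items() if k.startswith(prefix))
    other = sum(v for k, v in loss_dict.items() if not k.startswith(prefix))
    total = prefixed + other
    # Guard against division by zero when every term is zero.
    share = prefixed / total if total > 0 else 0.0
    return prefixed, other, share


# Illustrative values only (hypothetical, not taken from a training run).
example = {
    "loss_ce": 0.5,
    "loss_bbox": 0.25,
    "tgt_loss_ce": 0.125,
    "tgt_loss_bbox": 0.125,
}
prefixed_sum, other_sum, prefixed_share = split_losses(example)
print(f"prefixed={prefixed_sum:.3f} other={other_sum:.3f} share={prefixed_share:.1%}")
```

Logging this share every few hundred iterations would show directly whether one part of the loss becomes negligible relative to the other, which is the comparison I would like to make against your log.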

IDEA-Research / DN-DETR

Concerns about the Rapid Decrease of Loss in Denoising Part during Training #61