facebookresearch / unbiased-teacher

PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection
https://arxiv.org/abs/2102.09480
MIT License

Loss_box_reg starts at zero and increases during training #37

Closed ogamache closed 2 years ago

ogamache commented 2 years ago

Hello, really nice work here!

I am trying to train your network, but I get weird behavior related to Loss_box_reg and Loss_box_reg_pseudo. The losses start around zero and then increase, instead of starting at around 0.4 as in your article. Here is an example of the results I obtain:

[Screenshot from 2021-08-07 12-39-30: training loss curves]

The problem appears at the beginning of the BURN_IN stage and again when the unsupervised learning starts.

Since I only have access to 1 GPU, I am using a smaller batch size (label: 4 images, unlabel: 4 images). I tried reducing the learning rate and also reducing UNSUP_LOSS_WEIGHT from 4 to 2, but those modifications didn't help.
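For reference, here is roughly how I apply those overrides for a single GPU, using Detectron2's config API. This is just a sketch: the config path and the exact key names (SOLVER.IMG_PER_BATCH_LABEL, SOLVER.IMG_PER_BATCH_UNLABEL, SEMISUPNET.UNSUP_LOSS_WEIGHT) are what I understood from the README and train_net.py, so please correct me if any of them are wrong:

```python
# Sketch of single-GPU config overrides (key names and config path assumed
# from the repo's configs and train_net.py; adjust if they differ).
from detectron2.config import get_cfg
from ubteacher import add_ubteacher_config  # registers the SEMISUPNET keys

cfg = get_cfg()
add_ubteacher_config(cfg)
cfg.merge_from_file("configs/coco_supervision/faster_rcnn_R_50_FPN_sup1_run1.yaml")

cfg.merge_from_list([
    "SOLVER.IMG_PER_BATCH_LABEL", 4,      # labeled images per iteration
    "SOLVER.IMG_PER_BATCH_UNLABEL", 4,    # unlabeled images per iteration
    "SOLVER.BASE_LR", 0.0025,             # scaled down roughly with the batch size
    "SEMISUPNET.UNSUP_LOSS_WEIGHT", 2.0,  # reduced from the default 4
])
```

I then pass this cfg to the trainer the same way train_net.py does.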

Do you have an idea why this weird behavior happens? Thanks a lot!

ycliu93 commented 2 years ago

There are two major reasons why this trend occurs.

  1. The regression loss is only computed on foreground boxes, so there are very few pseudo-boxes at the beginning of the mutual learning stage. As the number of pseudo-boxes grows, the regression loss may increase as well.

  2. We actually do not apply the unsupervised regression loss during training, since we found it degrades performance when applied (pseudo-boxes selected by the classification score do not necessarily have accurate box locations for the student's learning). See the sketch below this list.

I believe this is not related to the number of GPUs you use. One thing I am curious about: do you get better Box AP results even though the regression loss on unsupervised data increases?
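To make point 2 concrete, here is a simplified sketch of how the loss terms can be combined so that the pseudo-label regression losses contribute nothing. This is only an illustration of the idea, not a verbatim copy of the trainer code; the key names such as loss_box_reg_pseudo follow the log output shown above:

```python
# Simplified sketch: combine supervised and pseudo-label losses, zeroing the
# regression terms computed on pseudo-boxes.
def combine_losses(record_dict, unsup_loss_weight):
    loss_dict = {}
    for key, value in record_dict.items():
        if not key.startswith("loss"):
            continue  # skip non-loss metrics
        if key.endswith("_pseudo"):
            if "box_reg" in key or "rpn_loc" in key:
                # Do not backprop regression losses from pseudo-boxes: their
                # locations are selected by classification score only.
                loss_dict[key] = value * 0
            else:
                loss_dict[key] = value * unsup_loss_weight
        else:
            # Supervised losses keep weight 1.
            loss_dict[key] = value
    return loss_dict
```

In this sketch the total loss is simply `sum(loss_dict.values())`, so a rising regression value on pseudo-boxes does not by itself mean the model is being pushed in the wrong direction.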

Lydiagugugaga commented 2 years ago

@ycliu93 Thanks for your great work! I have the same problem: 1 GPU, with a smaller batch size (label: 6, unlabel: 6).
In the end I get worse Box AP results: AP = 10 (COCO-standard, 1% supervision). Loss_box_reg and Loss_box_reg_pseudo do not decline, and the total loss stays around 0.4 to 0.5 the whole time.