facebookresearch / unbiased-teacher

PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection
https://arxiv.org/abs/2102.09480
MIT License
409 stars 84 forks

loss Nan #46

Open Fly-dream12 opened 2 years ago

Fly-dream12 commented 2 years ago

When training the model on a custom dataset, with the learning rate set to 0.001 and ims_per_batch_label set to 4, this error occurs in ubteacher/modeling/proposal_generator/proposal_utils.py in find_top_rpn_proposals: FloatingPointError: Predicted boxes or scores contain Inf/Nan. Training has diverged.

Should any config be altered? Thanks for your reply! @ycliu93

ycliu93 commented 2 years ago

There are some tips for using Unbiased Teacher on custom datasets.

  1. The unsupervised loss weight affects the stability of semi-supervised training. The default loss weight is 4, which gives the best performance in the COCO 1% case, but I would suggest reducing it to 1 to make sure training doesn't diverge. Once training is stable, you can try increasing it again for a better result.

  2. The threshold is another important hyperparameter, and you should check how many pseudo-boxes are generated in the pseudo-labeled set. Make sure the number of pseudo-boxes is similar to or fewer than the number of boxes in the ground-truth labels, and that it doesn't gradually grow too large.

  3. You could also fix the teacher model first and check whether using a fixed teacher model can lead to a better student. If it helps, then you could use EMA to evolve the Teacher for a better result.
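As a concrete starting point, these knobs live in the YAML config. UNSUP_LOSS_WEIGHT and EMA_KEEP_RATE are mentioned elsewhere in this thread; the SEMISUPNET section name and the BBOX_THRESHOLD key are my recollection of this repo's config layout, so check them against the actual config files. Values here are illustrative, not recommendations:

```yaml
# Sketch of the relevant semi-supervised knobs -- verify keys against the repo's configs.
SEMISUPNET:
  UNSUP_LOSS_WEIGHT: 1.0   # tip 1: start low, raise once training is stable
  BBOX_THRESHOLD: 0.7      # tip 2: confidence threshold for keeping pseudo-boxes
  EMA_KEEP_RATE: 0.9996    # tip 3: set to 1.0 to freeze the Teacher entirely
```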

Hope these tricks help your experiment. Let me know if you have other questions. :)

Fly-dream12 commented 2 years ago

1. As you mentioned in other issues, the learning rate should not be too small. I trained an Unbiased Teacher model with BASE_LR: 0.0001; however, the results are even worse than the results under 1% supervision of my data. So is the learning rate wrongly set?

2. I tried BASE_LR: 0.001 and reduced UNSUP_LOSS_WEIGHT to 1.0; however, training still diverges.

3. How can I visualize or check the pseudo-boxes? Where can I print their number and adjust the threshold?

4. How do I fix the teacher model, by just commenting out the EMA part?

Thanks for your help!


ycliu93 commented 2 years ago
  1. I used a learning rate of 0.01 and it works in the COCO 1% case, though I am not sure whether it is the best setting for your custom dataset. As long as it is within a reasonable range, it should not lead to divergence.

  2. I am not sure why your model diverges even under a low unsupervised loss weight. Could you try lowering it to 0.5 or less? Also, if possible, could you provide a brief description of your dataset? I might have some ideas once I understand your setup.

  3. Detectron2 provides a Visualizer (https://detectron2.readthedocs.io/en/latest/modules/utils.html#detectron2.utils.visualizer.Visualizer).

You could add it at the following line, where the thresholding function is used: https://github.com/facebookresearch/unbiased-teacher/blob/6977c6f77c812fae4064dc1b3865658c2ed247b1/ubteacher/engine/trainer.py#L537 As for the number of pseudo-labels, you could check the elements of pesudo_proposals_roih_unsup_k; each element of the list holds the boxes for one image. For example, you could get the number of pseudo-boxes for the first image of a batch by printing len(pesudo_proposals_roih_unsup_k[0]). You could also add this count to record_dict, and it will show up in the TensorBoard log for easier tracking.
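The counting idea can be sketched in plain Python. The variable name pesudo_proposals_roih_unsup_k follows the trainer code quoted above; the helper function and the record_dict key below are hypothetical, and plain lists stand in for detectron2 Instances (anything supporting len() works):

```python
def pseudo_box_stats(pseudo_proposals, record_dict=None):
    """Count pseudo-boxes per image and the batch average.

    `pseudo_proposals` is a list with one entry per image; each entry
    only needs to support len() (detectron2 Instances objects do).
    """
    counts = [len(p) for p in pseudo_proposals]
    avg = sum(counts) / max(len(counts), 1)
    if record_dict is not None:
        # Scalars added to record_dict end up in the TensorBoard log,
        # so the pseudo-box count can be tracked alongside the losses.
        record_dict["num_pseudo_box_per_img"] = avg
    return counts, avg

# Toy usage with plain lists standing in for Instances:
counts, avg = pseudo_box_stats([[1, 2, 3], [1], []])
# counts == [3, 1, 0], avg == 4/3
```

Comparing this running average against the average number of ground-truth boxes per image is one way to apply tip 2 above: if the pseudo-box count keeps climbing past the ground-truth count, the threshold is likely too low.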

  4. There is an EMA_KEEP_RATE in the config file. Just set it to 1.0 and the Teacher model will be fixed. Conversely, if you set it to 0.0, the Teacher and Student will have identical weights. https://github.com/facebookresearch/unbiased-teacher/blob/6977c6f77c812fae4064dc1b3865658c2ed247b1/configs/coco_supervision/faster_rcnn_R_50_FPN_sup1_run1.yaml#L35
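The behavior of EMA_KEEP_RATE follows from the standard exponential-moving-average update. A minimal sketch with plain dicts of floats standing in for model state_dicts (the function name is mine, not the repo's):

```python
def ema_update(teacher, student, keep_rate):
    """teacher <- keep_rate * teacher + (1 - keep_rate) * student, per parameter.

    keep_rate = 1.0 leaves the teacher untouched (fixed Teacher);
    keep_rate = 0.0 copies the student into the teacher (identical weights).
    """
    return {k: keep_rate * v + (1.0 - keep_rate) * student[k]
            for k, v in teacher.items()}

teacher = {"w": 1.0}
student = {"w": 3.0}
print(ema_update(teacher, student, 1.0))     # {'w': 1.0}  -> Teacher fixed
print(ema_update(teacher, student, 0.0))     # {'w': 3.0}  -> identical to Student
print(ema_update(teacher, student, 0.9996))  # slow drift toward the Student
```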

Let me know whether these help you. Thanks!

Fly-dream12 commented 2 years ago

I have used the self.visualize_training method; however, nothing appears in TensorBoard. @ycliu93