Nan appears when training

facebookresearch / unbiased-teacher

PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection

https://arxiv.org/abs/2102.09480

MIT License

Nan appears when training #7

Closed happyxuwork closed 3 years ago

happyxuwork commented 3 years ago

I used the default settings from the GitHub repository and ran

python train_net.py --num-gpus 8 --config configs/coco_supervision/faster_rcnn_R_50_FPN_sup10_run1.yaml SOLVER.IMG_PER_BATCH_LABEL 16 SOLVER.IMG_PER_BATCH_UNLABEL 16

but from step 19 onward the following information appears (screenshot: 20210326-092159(WeLinkPC)). Is this normal? Should any config option be changed? Looking forward to your reply!

ycliu93 commented 3 years ago

It is not normal to get a NaN loss during training. Could you try reducing the unsupervised loss weight to 2 and see whether the model still fails to converge?
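For readers unfamiliar with what the unsupervised loss weight does, here is a minimal sketch, not the repository's actual code. It assumes the total loss is the sum of the supervised terms plus the pseudo-label terms multiplied by one weight; the function name `total_loss` and the `_pseudo` suffix convention are hypothetical and chosen only for illustration. The finite check shows one way to stop at the first bad term instead of training on NaN.

```python
import math


def total_loss(loss_dict, unsup_weight):
    """Sum loss terms, scaling the ones computed on pseudo-labels.

    Hypothetical convention: terms from unlabeled data end in "_pseudo".
    """
    total = 0.0
    for name, value in loss_dict.items():
        # Fail early on NaN/inf so the offending term is visible.
        if not math.isfinite(value):
            raise FloatingPointError(f"non-finite loss term {name!r}: {value}")
        scale = unsup_weight if name.endswith("_pseudo") else 1.0
        total += scale * value
    return total


# Illustrative values only, not taken from a real training run.
losses = {"loss_cls": 0.8, "loss_cls_pseudo": 0.5}
print(total_loss(losses, unsup_weight=4.0))
print(total_loss(losses, unsup_weight=2.0))
```

Under this assumption, halving the weight halves only the contribution of the unlabeled data, which is why it is a natural first thing to try when the loss diverges early.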

happyxuwork commented 3 years ago

After changing the learning rate from 0.01 to 0.001, NaN no longer appears. So does the default learning rate setting in https://github.com/facebookresearch/unbiased-teacher/blob/05dad84c8e1bb44c6fd14706571ab0769143e48d/configs/coco_supervision/faster_rcnn_R_50_FPN_sup05_run1.yaml#L22 need to be changed to 0.001?

ycliu93 commented 3 years ago

I would suggest first reducing the unsupervised loss weight to 2 and checking whether the NaN loss still appears. Reducing the learning rate might affect the final performance.
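The difference between the two fixes can be sketched with a toy calculation. This is an illustration under stated assumptions, not the project's optimizer: it assumes a plain SGD update of the form lr * (grad_sup + weight * grad_unsup), unit gradients, and a starting weight of 4.0 that is assumed here purely for the example. The helper `sgd_step_sizes` is hypothetical.

```python
def sgd_step_sizes(lr, unsup_weight, grad_sup=1.0, grad_unsup=1.0):
    """Return the (supervised, unsupervised) parts of one plain-SGD update."""
    return lr * grad_sup, lr * unsup_weight * grad_unsup


# Assumed starting point: lr 0.01 (from the thread), weight 4.0 (assumed).
baseline = sgd_step_sizes(lr=0.01, unsup_weight=4.0)
# Fix A: lower the learning rate tenfold, as the reporter did.
lower_lr = sgd_step_sizes(lr=0.001, unsup_weight=4.0)
# Fix B: lower only the unsupervised loss weight to 2.
lower_weight = sgd_step_sizes(lr=0.01, unsup_weight=2.0)

for label, (sup, unsup) in [("baseline", baseline),
                            ("lower lr", lower_lr),
                            ("lower weight", lower_weight)]:
    print(f"{label:>12}: supervised step {sup:.4f}, unsupervised step {unsup:.4f}")
```

In this toy model, lowering the learning rate shrinks the supervised step as well, while lowering the weight leaves it untouched. That is consistent with the concern above that a smaller learning rate might affect the final performance.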