Closed happyxuwork closed 3 years ago
It is not normal for having Nan loss in the training. Could you try to reduce the unsupervised loss weight to 2 and see if the model still cannot converge?
By altering the learning rate from 0.01 to 0.001, no NaN appears. So does the default learning-rate setting in https://github.com/facebookresearch/unbiased-teacher/blob/05dad84c8e1bb44c6fd14706571ab0769143e48d/configs/coco_supervision/faster_rcnn_R_50_FPN_sup05_run1.yaml#L22 need to be modified to 0.001?
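For anyone hitting the same issue, this is roughly the change I made in the config (a sketch, assuming the standard Detectron2 `SOLVER` layout; only `BASE_LR` is modified):

```yaml
# configs/coco_supervision/faster_rcnn_R_50_FPN_sup05_run1.yaml (excerpt)
SOLVER:
  BASE_LR: 0.001   # lowered from the default 0.01 to avoid NaN loss
```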
I would suggest reducing the unsupervised loss weight to 2 first and see if nan loss won't appear. Reducing the learning rate might affect the final performance.
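If I read the repo's config correctly, the unsupervised loss weight should be overridable from the command line without editing any YAML (assuming the key is `SEMISUPNET.UNSUP_LOSS_WEIGHT`; please check the repo's `config.py` for the exact name):

```shell
# Sketch: lower the unsupervised loss weight to 2.0 via a command-line override
python train_net.py --num-gpus 8 \
  --config configs/coco_supervision/faster_rcnn_R_50_FPN_sup10_run1.yaml \
  SEMISUPNET.UNSUP_LOSS_WEIGHT 2.0
```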
I used the default settings from the GitHub repo and ran `python train_net.py --num-gpus 8 --config configs/coco_supervision/faster_rcnn_R_50_FPN_sup10_run1.yaml SOLVER.IMG_PER_BATCH_LABEL 16 SOLVER.IMG_PER_BATCH_UNLABEL 16`, but from step 19 onward the following information appears.
Is this normal? Should any config be altered? Looking forward to your reply!