facebookresearch / adaptive_teacher

This repo provides the source code for "Cross-Domain Adaptive Teacher for Object Detection".

loss nan #14

Closed: Pandaxia8 closed this issue 2 years ago

Pandaxia8 commented 2 years ago

When I set Dis_loss_weight=0.1, the model collapses and the losses become nan. I see the same problem in https://github.com/facebookresearch/detectron2/issues/1128. According to your solution there, setting a smaller dis_loss_weight alleviates this issue, but then I get a poor mAP. How did you train your model with Dis_loss_weight=0.1?

[05/30 11:21:11] d2.utils.events INFO: eta: 8:40:21 iter: 9999 total_loss: nan loss_cls: nan loss_box_reg: nan loss_rpn_cls: 0.4926 loss_rpn_loc: 0.2313 loss_D_img_s: nan loss_D_img_t: nan time: 0.6949 data_time: 0.0415 lr: 0.01 max_mem: 5007M
[05/30 11:21:25] d2.utils.events INFO: eta: 8:39:59 iter: 10019 total_loss: nan loss_cls: nan loss_box_reg: nan loss_rpn_cls: 0.4825 loss_rpn_loc: 0.2396 loss_D_img_s: nan loss_D_img_t: nan time: 0.6948 data_time: 0.0418 lr: 0.01 max_mem: 5007M
[05/30 11:21:38] d2.utils.events INFO: eta: 8:39:40 iter: 10039 total_loss: nan loss_cls: nan loss_box_reg: nan loss_rpn_cls: 0.4763 loss_rpn_loc: 0.2427 loss_D_img_s: nan loss_D_img_t: nan time: 0.6948 data_time: 0.0349 lr: 0.01 max_mem: 5007M
[05/30 11:21:52] d2.utils.events INFO: eta: 8:38:53 iter: 10059 total_loss: nan loss_cls: nan loss_box_reg: nan loss_rpn_cls: 0.4791 loss_rpn_loc: 0.232 loss_D_img_s: nan loss_D_img_t: nan time: 0.6947 data_time: 0.0333 lr: 0.01 max_mem: 5007M
[05/30 11:22:05] d2.utils.events INFO: eta: 8:38:33 iter: 10079 total_loss: nan loss_cls: nan loss_box_reg: nan loss_rpn_cls: 0.493 loss_rpn_loc: 0.2346 loss_D_img_s: nan loss_D_img_t: nan time: 0.6947 data_time: 0.0344 lr: 0.01 max_mem: 5007M
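In the log above, loss_rpn_cls and loss_rpn_loc stay finite while the ROI-head losses and both discriminator losses are nan, so total_loss is nan too. The sketch below is a small, hypothetical diagnostic (not part of adaptive_teacher or detectron2) that parses one such log line and lists the non-finite terms. It also shows, under the assumption that the discriminator terms are simply scaled by dis_loss_weight before being summed, why a single nan term makes the whole total nan no matter how small the weight is. The function names and the regex are illustrative assumptions.

```python
import math
import re

# Matches "name: value" pairs for loss terms in a detectron2 events line,
# e.g. "loss_cls: nan". Assumption: loss keys are "total_loss" or start with "loss_".
PAIR = re.compile(r"\b(total_loss|loss_\w+):\s*(\S+)")


def parse_losses(line):
    """Return {loss_name: float} for every loss term found in one log line."""
    return {name: float(value) for name, value in PAIR.findall(line)}


def non_finite_terms(losses):
    """Names of loss terms that are nan or infinite, in log order."""
    return [name for name, value in losses.items() if not math.isfinite(value)]


def weighted_total(losses, dis_loss_weight):
    """Hypothetical recombination: detection losses plus dis_loss_weight times
    the discriminator losses (keys starting with "loss_D_").
    A nan in any term propagates to the sum, whatever the weight."""
    total = 0.0
    for name, value in losses.items():
        if name == "total_loss":
            continue  # skip the already-summed value reported by the logger
        weight = dis_loss_weight if name.startswith("loss_D_") else 1.0
        total += weight * value
    return total


line = ("iter: 9999 total_loss: nan loss_cls: nan loss_box_reg: nan "
        "loss_rpn_cls: 0.4926 loss_rpn_loc: 0.2313 loss_D_img_s: nan "
        "loss_D_img_t: nan time: 0.6949")
losses = parse_losses(line)
print(non_finite_terms(losses))
print(weighted_total(losses, dis_loss_weight=0.1))
```

Running it on the first log line prints the five nan terms and a nan total. Scanning earlier lines of a full log the same way is one way to find the first iteration where a term stops being finite, which narrows down whether the detector head or the discriminator diverged first.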

yujheli commented 2 years ago

I train on 8 V100 GPUs with batch size = 16, dis_loss_weight=0.1, and learning rate = 0.02, which works fine for me for 100k iterations (including 10k burn-in iterations of initialization on the source domain).
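If your hardware cannot fit 16 images per batch, one common heuristic is the linear scaling rule (Goyal et al., "Accurate, Large Minibatch SGD"): keep the ratio of learning rate to batch size fixed. The sketch below applies that rule to the recipe in the reply above and computes how many iterations remain after burn-in. The rule is a general assumption, not something the maintainer recommended, and it does not guarantee the adversarial losses stay finite; the constant and function names are hypothetical.

```python
# Reference recipe from the reply above: batch size 16, lr 0.02,
# 100k total iterations including 10k source-only burn-in iterations.
REFERENCE_BATCH = 16
REFERENCE_LR = 0.02
TOTAL_ITERS = 100_000
BURN_IN_ITERS = 10_000


def scaled_lr(batch_size, ref_batch=REFERENCE_BATCH, ref_lr=REFERENCE_LR):
    """Linear scaling heuristic (an assumption): keep lr / batch_size constant."""
    return ref_lr * batch_size / ref_batch


def adaptation_iters(total=TOTAL_ITERS, burn_in=BURN_IN_ITERS):
    """Iterations left for cross-domain adaptation after the burn-in stage."""
    return total - burn_in


print(scaled_lr(8))        # 0.01
print(adaptation_iters())  # 90000
```

With 8 images per batch, for example, the heuristic gives 0.01; with the reference batch of 16 it returns the reported 0.02 unchanged.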