facebookresearch / adaptive_teacher

This repo provides the source code for "Cross-Domain Adaptive Teacher for Object Detection".

Loss becomes NaN #25

Open miurakenichi opened 2 years ago

miurakenichi commented 2 years ago

I ran a reproduction experiment of domain adaptation from PASCAL VOC to Clipart1k, but the loss became NaN during training.

I am using 4 RTX 2080 Ti GPUs and changed the parameters as follows:

IMG_PER_BATCH_LABEL: 16 -> 4
IMG_PER_BATCH_UNLABEL: 16 -> 4
BASE_LR: 0.04 -> 0.01
MAX_ITER: 100000 -> 400000
BURN_UP_STEP: 20000 -> 80000
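For reference, these overrides would look roughly like the following detectron2-style config fragment (a sketch; the exact key paths, e.g. whether BURN_UP_STEP lives under SEMISUPNET, depend on the repo's config schema):

```yaml
SOLVER:
  IMG_PER_BATCH_LABEL: 4
  IMG_PER_BATCH_UNLABEL: 4
  BASE_LR: 0.01
  MAX_ITER: 400000
SEMISUPNET:
  BURN_UP_STEP: 80000
```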

Is there a good solution?

yujheli commented 2 years ago

Does the loss become NaN during burn-up or during mutual learning? I ask since you are using 80000 as the number of burn-up iterations.

miurakenichi commented 2 years ago

Thank you for your response. It became NaN during mutual learning. The loss was NaN in the log output at iteration 188599.

[06/23 16:58:09] d2.utils.events INFO:  eta: 1 day, 18:11:30  iter: 188579  total_loss: 2.404  loss_cls: 0.07097  loss_box_reg: 0.1129  loss_rpn_cls: 0.03918  loss_rpn_loc: 0.1054  loss_D_img_s: 0.6432  loss_cls_pseudo: 0.08861  loss_box_reg_pseudo: 0.2588  loss_rpn_cls_pseudo: 0.03184  loss_rpn_loc_pseudo: 0.171  loss_D_img_s_pseudo: 0.0008817  loss_D_img_t: 0.5658  time: 0.5490  data_time: 0.0108  lr: 0.01  max_mem: 6318M
[06/23 16:58:24] d2.utils.events INFO:  eta: 1 day, 18:12:00  iter: 188599  total_loss: nan  loss_cls: nan  loss_box_reg: nan  loss_rpn_cls: 0.4502  loss_rpn_loc: 0.1369  loss_D_img_s: nan  loss_cls_pseudo: nan  loss_box_reg_pseudo: 0  loss_rpn_cls_pseudo: 0.5258  loss_rpn_loc_pseudo: 0  loss_D_img_s_pseudo: nan  loss_D_img_t: nan  time: 0.5491  data_time: 0.0097  lr: 0.01  max_mem: 6318M
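Independent of the root cause, a common safeguard is to scan the per-iteration loss dict for NaN and halt (or skip the step) before it propagates into the weights. A minimal stdlib sketch (check_losses is a hypothetical helper, not part of this repo; the values mirror the log at iter 188599 above):

```python
import math

def check_losses(loss_dict):
    """Return the names of losses whose values are NaN.

    loss_dict maps loss names to float values, like the per-iteration
    losses the trainer logs. If the result is non-empty, the caller can
    raise, skip the optimizer step, or dump a checkpoint for debugging.
    """
    return [name for name, value in loss_dict.items() if math.isnan(value)]

bad = check_losses({
    "loss_cls": float("nan"),
    "loss_rpn_cls": 0.4502,
    "loss_D_img_t": float("nan"),
})
print(bad)  # -> ['loss_cls', 'loss_D_img_t']
```

In a real PyTorch training loop the same check can be done on tensors with `torch.isnan(loss).any()` before calling `loss.backward()`.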
yujheli commented 2 years ago

@miurakenichi Could you pull or re-clone the code? I remember I recently updated the meta_arch to fix part of the NaN issue; loss_D_img_s_pseudo is not supposed to be printed.

I'll get back to you on the other issues; I'm still working on it.