Thinklab-SJTU / R3Det_Tensorflow

Code for AAAI 2021 paper: R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object
Apache License 2.0

loss nan problem #142

Open pi1ing opened 2 years ago

pi1ing commented 2 years ago

Hello, thank you for your great work. I tried to train on the DOTA dataset with the default cfgs (backbone: resnet_50), but got the following training output:

************************************************************
2021-11-05 09:18:35: global_step:20  current_step:20
per_cost_time:4.518s
refine_cls_loss_stage3:0.000
cls_loss:1364.121
refine_reg_loss:0.000
refine_reg_loss_stage3:0.000
reg_loss:2.277
refine_cls_loss:741079.375
total_losses:742445.750
************************************************************
2021-11-05 09:18:44: global_step:40  current_step:40
per_cost_time:0.234s
refine_cls_loss_stage3:0.000
cls_loss:nan
refine_reg_loss:0.000
refine_reg_loss_stage3:0.000
reg_loss:nan
refine_cls_loss:nan
total_losses:nan

By the way, I have one GeForce RTX 3080 Ti, and the development environment uses the recommended Docker image, but the first _, global_stepnp, summary_str = sess.run([train_op, global_step, summary_op]) call took 10 minutes to run. Is that normal?
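
To narrow down where the nan first appears in the log above, a guard like the following could be run on the loss values fetched at each step. This is only a minimal sketch and not part of the repository: `first_non_finite` is a hypothetical helper, and the two dictionaries simply restate values from the step-20 and step-40 log entries. How the per-loss values would be fetched from the session is left as an assumption.

```python
import math


def first_non_finite(losses):
    """Return the name of the first loss that is NaN or infinite, or None.

    `losses` is a dict mapping a loss name to a Python float.
    (Hypothetical helper for illustration; not an R3Det_Tensorflow function.)
    """
    for name, value in losses.items():
        if not math.isfinite(value):
            return name
    return None


# Values restated from the step-20 log entry: large, but still finite.
step_20 = {
    "cls_loss": 1364.121,
    "reg_loss": 2.277,
    "refine_cls_loss": 741079.375,
    "total_losses": 742445.750,
}

# Values restated from the step-40 log entry: several losses are nan.
step_40 = {
    "cls_loss": float("nan"),
    "reg_loss": float("nan"),
    "refine_cls_loss": float("nan"),
    "total_losses": float("nan"),
}

for step, losses in ((20, step_20), (40, step_40)):
    bad = first_non_finite(losses)
    if bad is None:
        print("step", step, "all losses finite")
    else:
        print("step", step, "first non-finite loss:", bad)
```

Calling a check like this after every step, rather than every 20, would show which loss goes non-finite first. Note that refine_cls_loss is already very large at step 20, before any value becomes nan.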