Eniac-Xie / faster-rcnn-resnet

ResNet Implementation for Faster-rcnn
MIT License
206 stars 117 forks

Got a nan loss. #23

Open ghost opened 6 years ago

ghost commented 6 years ago

I followed the steps on your website and installed the project successfully. However, I get a NaN loss as follows (with both the OHEM and non-OHEM models). How can I fix it?

```
I0503 21:38:45.559895  5802 solver.cpp:229] Iteration 0, loss = 4.34748
I0503 21:38:45.559942  5802 solver.cpp:245]     Train net output #0: loss_bbox = 0.0506645 (* 1 = 0.0506645 loss)
I0503 21:38:45.559960  5802 solver.cpp:245]     Train net output #1: loss_cls = 3.57055 (* 1 = 3.57055 loss)
I0503 21:38:45.559969  5802 solver.cpp:245]     Train net output #2: rpn_cls_loss = 0.693147 (* 1 = 0.693147 loss)
I0503 21:38:45.559978  5802 solver.cpp:245]     Train net output #3: rpn_loss_bbox = 0.111002 (* 1 = 0.111002 loss)
I0503 21:38:45.559988  5802 sgd_solver.cpp:106] Iteration 0, lr = 0.001
I0503 21:38:53.354439  5802 solver.cpp:229] Iteration 20, loss = nan
I0503 21:38:53.354485  5802 solver.cpp:245]     Train net output #0: loss_bbox = nan (* 1 = nan loss)
I0503 21:38:53.354496  5802 solver.cpp:245]     Train net output #1: loss_cls = 87.3365 (* 1 = 87.3365 loss)
I0503 21:38:53.354506  5802 solver.cpp:245]     Train net output #2: rpn_cls_loss = 0.683885 (* 1 = 0.683885 loss)
I0503 21:38:53.354516  5802 solver.cpp:245]     Train net output #3: rpn_loss_bbox = 0.261735 (* 1 = 0.261735 loss)
I0503 21:38:53.354523  5802 sgd_solver.cpp:106] Iteration 20, lr = 0.001
```

I then added clip_gradients to solver.pt and got this output:

```
I0503 21:54:19.212813  6123 solver.cpp:229] Iteration 0, loss = 4.34748
I0503 21:54:19.212860  6123 solver.cpp:245]     Train net output #0: loss_bbox = 0.0506645 (* 1 = 0.0506645 loss)
I0503 21:54:19.212872  6123 solver.cpp:245]     Train net output #1: loss_cls = 3.57055 (* 1 = 3.57055 loss)
I0503 21:54:19.212880  6123 solver.cpp:245]     Train net output #2: rpn_cls_loss = 0.693147 (* 1 = 0.693147 loss)
I0503 21:54:19.212888  6123 solver.cpp:245]     Train net output #3: rpn_loss_bbox = 0.111002 (* 1 = 0.111002 loss)
I0503 21:54:19.212903  6123 sgd_solver.cpp:106] Iteration 0, lr = 0.001
I0503 21:54:19.242952  6123 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 142.39 > 2) by scale factor 0.014046
I0503 21:54:19.705423  6123 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 193806 > 2) by scale factor 1.03196e-05
I0503 21:54:20.113083  6123 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 6.85844e+12 > 2) by scale factor 2.91611e-13
I0503 21:54:20.508262  6123 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm inf > 2) by scale factor 0
I0503 21:54:20.883802  6123 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm inf > 2) by scale factor 0
I0503 21:54:21.278784  6123 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm inf > 2) by scale factor 0
I0503 21:54:21.695289  6123 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm inf > 2) by scale factor 0
I0503 21:54:22.107420  6123 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm inf > 2) by scale factor 0
I0503 21:54:22.509076  6123 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm inf > 2) by scale factor 0
I0503 21:54:22.893065  6123 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm inf > 2) by scale factor 0
I0503 21:54:23.297940  6123 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm inf > 2) by scale factor 0
I0503 21:54:23.699441  6123 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm inf > 2) by scale factor 0
I0503 21:54:24.086650  6123 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm inf > 2) by scale factor 0
```
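For reference, gradient clipping in a Caffe solver file is a single scalar field. A minimal sketch (the threshold 2 matches the `L2 norm ... > 2` in the log above; the other values are illustrative, not this repo's actual settings):

```
# solver.pt (Caffe SolverParameter, text format) -- illustrative sketch
base_lr: 0.001
clip_gradients: 2   # scale gradients down whenever their L2 norm exceeds 2
```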

Using a smaller learning rate did not work either.
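For context on why clipping does not rescue the run: Caffe's SGD solver scales gradients by `threshold / l2_norm` whenever the norm exceeds `clip_gradients`, so once the norm overflows to `inf` the scale factor collapses to 0 and training is effectively frozen while the NaNs remain. A minimal sketch of that logic (`clip_scale` is a hypothetical helper mirroring `sgd_solver.cpp`, not part of this repo):

```python
import math

def clip_scale(l2_norm, threshold=2.0):
    """Scale factor applied to gradients when their L2 norm exceeds
    the clip_gradients threshold (mirrors Caffe's ClipGradients)."""
    if l2_norm > threshold:
        return threshold / l2_norm
    return 1.0

# Values taken from the log above:
print(clip_scale(142.39))    # ~0.014046, as logged at iteration 0
print(clip_scale(193806.0))  # ~1.03196e-05
print(clip_scale(math.inf))  # 0.0 -- clipping cannot recover once the norm is inf
```

This is why the log degenerates to `scale factor 0`: the blow-up (loss_cls jumping from 3.57 to 87.3 within 20 iterations) happens before clipping can contain it, which usually points at a data or configuration problem rather than the learning rate alone.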