hjjlovecyy opened 5 years ago
I find that if I change total_loss.backward()
in train.py to another loss, such as total_reg_loss.backward(),
the "an illegal memory access was encountered" error no longer happens,
and the result no longer becomes NaN as described in my question.
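In case it helps anyone debugging this: a minimal sketch (not the repo's actual code) of checking each loss term before calling backward(), so the first non-finite term can be identified directly instead of only switching which loss backward() is called on. The names total_loss, cls_loss, and reg_loss match the training log later in this thread; backward_if_finite is a hypothetical helper.

```python
# Minimal sketch, assuming loss terms named total_loss, cls_loss, reg_loss.
import torch

def backward_if_finite(total_loss, cls_loss, reg_loss, step):
    """Hypothetical helper: report non-finite loss terms and skip bad updates."""
    for name, value in (("total_loss", total_loss),
                        ("cls_loss", cls_loss),
                        ("reg_loss", reg_loss)):
        if not torch.isfinite(value).all():
            print(f"step {step}: {name} is not finite ({value})")
    if torch.isfinite(total_loss).all():
        total_loss.backward()
        return True
    return False  # caller can skip optimizer.step() for this batch
```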
NaN still appears, and the model does not converge. This also seems related to the batch_size.
Maybe you can check your pretrained model.
Hello @hjjlovecyy, I also met this problem (NaN appears in the loss). Have you solved it? Thanks a lot!
@hjjlovecyy How did you solve the NaN problem? I met this problem too.
@hjjlovecyy How did you solve the NaN problem? I met it too. Thanks a lot!
I met the same problem. How can it be solved?
Hello, thank you for your great work! After modifying some code, train.py runs successfully, but the loss is very strange, as follows:

Epoch0 Iter0 --- total_loss: nan, cls_loss: nan, reg_loss: 0.6299
Epoch0 Iter1 --- total_loss: nan, cls_loss: nan, reg_loss: 1.7844
Epoch0 Iter2 --- total_loss: nan, cls_loss: nan, reg_loss: 34.9781
Epoch0 Iter3 --- total_loss: nan, cls_loss: nan, reg_loss: 238.0343
Epoch0 Iter4 --- total_loss: nan, cls_loss: nan, reg_loss: 236.5256
Epoch0 Iter5 --- total_loss: nan, cls_loss: nan, reg_loss: 70.5485
Epoch0 Iter6 --- total_loss: nan, cls_loss: nan, reg_loss: 113.4333
Epoch0 Iter7 --- total_loss: nan, cls_loss: nan, reg_loss: 56.8303
100%|██████████| 8/8 [00:10<00:00, 1.25s/it] Saving model...
Have you met this before? Thanks.
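Since cls_loss is NaN from the very first iteration while reg_loss explodes into the hundreds, it may help to turn on autograd anomaly detection and clip gradients. Below is a minimal, self-contained sketch of that pattern; the synthetic data, tiny nn.Linear model, and MSE/L1 stand-in losses are assumptions for illustration and do not come from this repo.

```python
# Minimal sketch: anomaly detection traces NaN gradients back to the op that
# produced them; clip_grad_norm_ contains exploding gradients like the large
# reg_loss values in the log above.
import torch
import torch.nn as nn

torch.autograd.set_detect_anomaly(True)  # raise with a traceback on NaN grads

model = nn.Linear(16, 4)                      # stand-in for the real detector
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for step in range(8):
    x = torch.randn(32, 16)                   # synthetic batch
    target = torch.randn(32, 4)
    optimizer.zero_grad()
    out = model(x)
    cls_loss = nn.functional.mse_loss(out, target)  # stand-in loss terms
    reg_loss = nn.functional.l1_loss(out, target)
    total_loss = cls_loss + reg_loss
    total_loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    optimizer.step()
    print(f"Iter{step} --- total_loss: {total_loss.item():.4f}")
```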