I ran this experiment following the wiki step by step. I did not change any hyperparameters, except that I set gpu=0 to train on the CPU. However, the loss became NaN just after 1K steps of training.
...
step 1655 - loss nan - moving ave loss nan
step 1656 - loss nan - moving ave loss nan
Finish 92 epoch(es)
step 1657 - loss nan - moving ave loss nan
step 1658 - loss nan - moving ave loss nan
...
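In case it helps narrow this down, here is a minimal sketch (not part of the repository) for finding the first step in a log like the one above whose loss is not finite. It assumes every training line follows the "step N - loss X - moving ave loss Y" format shown here; `LOG_LINE` and `first_nonfinite_step` are hypothetical names made up for illustration.

```python
import math
import re

# Assumed format, taken from the log above:
#   "step 1655 - loss nan - moving ave loss nan"
LOG_LINE = re.compile(r"^step (\d+) - loss (\S+) - moving ave loss (\S+)$")


def first_nonfinite_step(lines):
    """Return the first step whose loss is NaN or infinite, or None if all are finite."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines such as "Finish 92 epoch(es)" or "..."
        try:
            loss = float(match.group(2))  # float() accepts "nan" and "inf"
        except ValueError:
            continue  # loss field was not a number; ignore the line
        if not math.isfinite(loss):
            return int(match.group(1))
    return None
```

Running this over the full training log would show the exact step at which the loss first diverged, which may be more informative than the excerpt above, where the loss is already NaN.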
Do you have any idea what might cause this, or have you observed this kind of behavior before? Thank you in advance for your help!