THUNLP-AIPoet / WMPoetry

Source codes of Chinese Poetry Generation with a Working Memory Model (IJCAI 2018)

Why I got a NaN loss in training #1

Closed Xuwanjun9 closed 4 years ago

Xuwanjun9 commented 4 years ago

I removed the .bool() calls and used dtype=torch.uint8 instead because of my PyTorch version, but I got NaN for the loss. Has anyone else met this problem?
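For reference, a minimal version-safe sketch (tensor names here are illustrative, not from the repo): on PyTorch 1.1 comparison operators return torch.uint8, while on 1.2+ they return torch.bool, and masked_fill accepts the comparison's native dtype on both, so the .bool() call can often be dropped rather than replaced:

```python
import torch

pad_id = 0
inputs = torch.tensor([[5, 7, 0, 0]])  # hypothetical padded batch

# uint8 on PyTorch 1.1, bool on >= 1.2 -- no explicit .bool() needed
mask = inputs == pad_id

# masked_fill works with the comparison result on either version
scores = torch.zeros(1, 4).masked_fill(mask, float('-inf'))
```
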

XiaoyuanYi commented 4 years ago

@Xuwanjun9 Hi, can you provide more details, like the screen capture of your error information and the code lines you have changed, so that I can help you handle your problem?

Xuwanjun9 commented 4 years ago

@Xuwanjun9 Hi, can you provide more details, like the screen capture of your error information and the code lines you have changed, so that I can help you handle your problem?

Thank you so much for your reply. The error information when I run the complete original code is this: [screenshot]

My torch version is 1.1.0, and there is no .bool() method in this version, so I changed the code in graphs.py at lines 168 and 360: [screenshot] [screenshot], where the original code was [screenshot] and [screenshot]. Except for this, I changed nothing.

Everything seems fine during pre-training; the last pre-training epoch looks like this: [screenshot]. But with the changed code, the training process looks like this: on the first iteration the output is empty but the loss is 9.658, and after several iterations the output is always empty and the loss stays NaN. [screenshot]

I've tried reducing the learning rate, but it doesn't help. Thank you again for your help!
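One common cause of exactly this symptom, offered as a guess rather than a confirmed diagnosis of this repo: if the dtype change inverts the mask's meaning, or a sequence ends up fully masked, softmax over a row of all -inf scores yields NaN, which then propagates into the loss. A minimal sketch of the failure mode:

```python
import torch

# Hypothetical attention scores for two sequences; the second has
# length 0, so every position in its row gets masked.
lengths = torch.tensor([3, 0])
positions = torch.arange(4).unsqueeze(0)      # (1, max_len)
pad_mask = positions >= lengths.unsqueeze(1)  # True at padded positions

scores = torch.zeros(2, 4).masked_fill(pad_mask, float('-inf'))
attn = torch.softmax(scores, dim=-1)
# Row 0 is a valid distribution; row 1 is all NaN because every score
# was -inf. Once this feeds into the loss, the loss becomes NaN too.
```

Printing intermediate tensors (or checking torch.isnan on the attention weights) right after the masking at graphs.py lines 168 and 360 should show whether this is what is happening.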