THUNLP-AIPoet / WMPoetry

Source codes of Chinese Poetry Generation with a Working Memory Model (IJCAI 2018)

Why I got a NaN loss in training #1

Closed Xuwanjun9 closed 4 years ago

Xuwanjun9 commented 4 years ago

I removed the .bool() calls and used dtype=torch.uint8 instead because of my PyTorch version, but I got NaN for the loss. Has anyone else met this problem?
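For reference, a minimal version-safe sketch (tensor names here are illustrative, not from the repo): on PyTorch 1.1 comparison operators return torch.uint8, while on 1.2+ they return torch.bool, and masked_fill accepts the comparison's native dtype on both, so the .bool() call can often be dropped rather than replaced:

```python
import torch

pad_id = 0
inputs = torch.tensor([[5, 7, 0, 0]])  # hypothetical padded batch

# uint8 on PyTorch 1.1, bool on >= 1.2 -- no explicit .bool() needed
mask = inputs == pad_id

# masked_fill works with the comparison result on either version
scores = torch.zeros(1, 4).masked_fill(mask, float('-inf'))
```
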

XiaoyuanYi commented 4 years ago

@Xuwanjun9 Hi, can you provide more details, like the screen capture of your error information and the code lines you have changed, so that I can help you handle your problem?

Xuwanjun9 commented 4 years ago

@Xuwanjun9 Hi, can you provide more details, like the screen capture of your error information and the code lines you have changed, so that I can help you handle your problem?

Thank you so much for your reply. The error information when I run the complete original code is this: [screenshot]

My torch version is 1.1.0, and there is no .bool() method in this version, so I changed the code in graphs.py at lines 168 and 360: [screenshot] [screenshot], where the original code was [screenshot] and [screenshot]. Except for this, I changed nothing.

Everything seems fine during pre-training; the last pre-training epoch looks like this: [screenshot]. But with the changed code, the training process looks like this: on the first iteration the output is empty but the loss is 9.658, and after several iterations the output is always empty and the loss stays NaN. [screenshot]

I've tried reducing the learning rate, but it doesn't help. Thank you again for your help!
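One common cause of exactly this symptom, offered as a guess rather than a confirmed diagnosis of this repo: if the dtype change inverts the mask's meaning, or a sequence ends up fully masked, softmax over a row of all -inf scores yields NaN, which then propagates into the loss. A minimal sketch of the failure mode:

```python
import torch

# Hypothetical attention scores for two sequences; the second has
# length 0, so every position in its row gets masked.
lengths = torch.tensor([3, 0])
positions = torch.arange(4).unsqueeze(0)      # (1, max_len)
pad_mask = positions >= lengths.unsqueeze(1)  # True at padded positions

scores = torch.zeros(2, 4).masked_fill(pad_mask, float('-inf'))
attn = torch.softmax(scores, dim=-1)
# Row 0 is a valid distribution; row 1 is all NaN because every score
# was -inf. Once this feeds into the loss, the loss becomes NaN too.
```

Printing intermediate tensors (or checking torch.isnan on the attention weights) right after the masking at graphs.py lines 168 and 360 should show whether this is what is happening.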