Why does the softmax output become NaN after 4 or 5 epochs of training?

InsaneLife / dssm

DSSM and Multi-View DSSM


Why does the softmax output become NaN after 4 or 5 epochs of training? #6

Closed: chongqingwei closed this issue 5 years ago

chongqingwei commented 6 years ago

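After about 5 epochs of training the loss is still decreasing, but every softmax output has become NaN, and as a result the AUC has dropped to 0.5. Why does this happen?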

InsaneLife commented 5 years ago

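There could be many causes, and they most likely depend on your specific data. PS: the API used by the earlier code is outdated, so the latest code has been updated in dssm_rnn.py.

The data-processing code data_input.py and the data in data have also been updated. Because the model now uses an RNN, the input is no longer in bag-of-words form.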

chongqingwei commented 5 years ago

Solved, thanks.


InsaneLife commented 5 years ago

Could you share the cause and the fix, for everyone's reference?

InsaneLife commented 5 years ago

NaN usually appears because a denominator becomes 0 or because log receives a non-positive argument. Clipping the value (tf.clip_by_value) avoids both.
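The two failure modes above can be sketched in plain Python. This is only an illustration and not the repository's code: `EPS`, `cosine`, `softmax`, and `positive_log_loss` are hypothetical names, and `clip_by_value` here is a scalar stand-in for `tf.clip_by_value(t, clip_value_min, clip_value_max)`. Plain Python raises an error on `0/0` and `log(0)` where TensorFlow would silently produce NaN or inf, but the guards are the same.

```python
import math

EPS = 1e-8  # assumed floor; choose a value suited to your dtype


def clip_by_value(x, lo, hi):
    """Scalar stand-in for tf.clip_by_value: force x into [lo, hi]."""
    return max(lo, min(x, hi))


def cosine(a, b):
    """Cosine similarity with the denominator floored at EPS."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, EPS)  # an all-zero vector no longer gives 0/0


def softmax(scores):
    """Softmax with the max subtracted first so exp() cannot overflow."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def positive_log_loss(probs, positive_index=0):
    """Negative log-probability of the positive document, clipped away from 0."""
    p = clip_by_value(probs[positive_index], EPS, 1.0)
    return -math.log(p)
```

With these guards, a softmax probability that underflows to exactly 0 yields a large but finite loss instead of an undefined one. In TensorFlow the same idea would be to wrap the argument of the log in `tf.clip_by_value` before taking the log.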

bingoohe commented 5 years ago

Hi, could you explain how exactly you solved the NaN loss problem? I have run into it too. Thank you.

chongqingwei commented 5 years ago

First, confirm that the samples are correct. Second, do not use Adam; use Adagrad instead.
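The thread does not say what was wrong with the samples, so the following is only a guess at what "confirming the samples are correct" might involve. It assumes each sample is a list of query token ids, a list of document token ids, and a 0/1 label; `check_sample` and `vocab_size` are hypothetical names, not part of the repository.

```python
def check_sample(query_ids, doc_ids, label, vocab_size):
    """Return a list of problems found in one training sample (empty if clean)."""
    problems = []
    # An empty side produces an all-zero representation, which can lead to 0/0
    # in a cosine similarity.
    if not query_ids:
        problems.append("empty query")
    if not doc_ids:
        problems.append("empty doc")
    # Ids outside the vocabulary would index past the embedding table.
    for name, ids in (("query", query_ids), ("doc", doc_ids)):
        if any(i < 0 or i >= vocab_size for i in ids):
            problems.append(name + " has out-of-vocabulary id")
    if label not in (0, 1):
        problems.append("label is not 0 or 1")
    return problems
```

Running a check like this over the whole training set before training, and logging every sample whose result is non-empty, makes it easier to tell a data problem apart from an optimizer problem.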
