Closed — jibrilfrej closed this issue 5 years ago
Solved by adding an epsilon to the log in the cross-entropy loss:
(/losses/rank_cross_entropy_loss.py line 51)
Original: return -K.mean(K.sum(labels * K.log(K.softmax(logits)), axis=-1))
New: return -K.mean(K.sum(labels * K.log(K.softmax(logits) + np.finfo(float).eps), axis=-1))
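The failure mode is easy to demonstrate outside Keras: when the score gap between documents is large enough, softmax underflows to exactly 0.0 for the losing document, and log(0) produces -inf, which then poisons the loss. Here is a minimal NumPy sketch (standing in for the K.* backend calls, not the library's actual code) of the original expression versus the epsilon-patched one:

```python
import numpy as np

def softmax(x):
    # shift by the max so the exponentials themselves don't overflow
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([800.0, 0.0])  # huge score gap: exp(-800) underflows to exactly 0.0
labels = np.array([0.0, 1.0])    # the relevant document is the one whose probability hit 0

probs = softmax(logits)
with np.errstate(divide="ignore"):
    naive = -np.sum(labels * np.log(probs))                    # log(0) -> -inf, loss blows up
fixed = -np.sum(labels * np.log(probs + np.finfo(float).eps))  # finite: log(eps) ~ -36

print(naive, fixed)
```

The epsilon keeps the argument of the log strictly positive, so the loss stays finite even when a probability underflows to zero. A more robust alternative would be to compute the log-softmax directly via the log-sum-exp trick instead of log(softmax(x)), but the epsilon is the smaller change.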
Seems like a reasonable fix. Would you mind opening a PR with this change?
I just did it (pull request #776)
Great job! I've approved it and it will be merged soon.
Describe the bug
RankCrossEntropyLoss produces NaN on some models (at least DUET and DRMMTKS).
It does not happen every time: sometimes the model trains without problems, and other times the loss becomes NaN after a few epochs.
I do not have this problem with RankHingeLoss, but the models' performance is poor compared to RankCrossEntropyLoss (when the latter does not fail).
The problem appears MORE frequently on larger training sets (unfortunately I cannot share them).
The problem appears LESS frequently when I set model.params['embedding_trainable'] to False.
The problem appears on BOTH CPU and GPU
I tried reducing the learning rate, but it did not solve the problem.
Here is a piece of code to reproduce the bug. As I said before, it does not happen every time: I ran the code below 5 times and it produced NaN 4 out of 5 times (around epoch 120).
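For context, the two losses being compared can be sketched in NumPy roughly as follows. This is only an illustrative sketch, not MatchZoo's actual implementation; it assumes the listwise score vector puts the positive document's score at index 0, followed by the sampled negatives:

```python
import numpy as np

def rank_hinge_loss(pos_score, neg_score, margin=1.0):
    # pairwise hinge: penalize when the positive does not beat the negative by `margin`
    return np.maximum(0.0, margin - pos_score + neg_score)

def rank_cross_entropy_loss(scores, eps=np.finfo(float).eps):
    # listwise softmax cross-entropy over one positive (index 0) and its negatives,
    # with the epsilon guard from the fix above
    e = np.exp(scores - np.max(scores))
    probs = e / e.sum()
    return -np.log(probs[0] + eps)

print(rank_hinge_loss(2.0, 0.5))   # -> 0.0: the positive beats the negative by more than the margin
print(rank_cross_entropy_loss(np.array([2.0, 0.5])))
```

The hinge loss goes exactly to zero once the margin is satisfied, which gives no further gradient; the cross-entropy loss keeps pushing the positive's probability toward 1, which may explain the better ranking performance observed here when it does not diverge.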
Context