EdGENetworks / attention-networks-for-classification

Hierarchical Attention Networks for Document Classification in PyTorch

Loss can start as NaN #7

Closed JoaoLages closed 6 years ago

JoaoLages commented 7 years ago

Any solution for this, or any idea why it happens (there should be a division by zero somewhere)? It only happens sometimes.

Sandeep42 commented 7 years ago

Can you give me some more context, is it starting as NaN or is it converging to NaN?

JoaoLages commented 7 years ago

It is starting as NaN, and then it can no longer converge. If it starts as anything other than NaN, I have never seen it converge to NaN.

Sandeep42 commented 7 years ago

This problem didn't occur for me when I tested it. Which dataset were you using?

JoaoLages commented 7 years ago

I have been using a different dataset, which unfortunately I cannot share. I was wondering if you had any idea why it could happen and how to avoid it, though.

JasonMengcp commented 5 years ago

@JoaoLages I encountered the same NaN problem with some parameter settings (it usually happens when the hidden dimension is small). After debugging, I found it is because two parameters (bias_word and bias_sent) are never initialized and may therefore contain NaN. Add self.bias_word.data.uniform_(-0.1, 0.1) to __init__() of AttentionWordRNN, and add self.bias_sent.data.uniform_(-0.1, 0.1) to __init__() of AttentionSentRNN.

It solved my problem. Hope this can help yours!
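
For context, here is a minimal sketch of where that fix goes, assuming a word-level attention module laid out roughly like the one in this repo; the constructor arguments and the weight parameter names other than bias_word are illustrative, not the exact code from the repository:

```python
import torch
import torch.nn as nn

class AttentionWordRNN(nn.Module):
    def __init__(self, vocab_size, embed_size, word_gru_hidden):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, embed_size)
        self.word_gru = nn.GRU(embed_size, word_gru_hidden, bidirectional=True)

        # Attention parameters built from uninitialized storage
        # (torch.Tensor(...) does not zero or randomize its memory).
        self.weight_W_word = nn.Parameter(torch.Tensor(2 * word_gru_hidden, 2 * word_gru_hidden))
        self.bias_word = nn.Parameter(torch.Tensor(2 * word_gru_hidden, 1))
        self.weight_proj_word = nn.Parameter(torch.Tensor(2 * word_gru_hidden, 1))

        # Weight matrices are explicitly initialized...
        self.weight_W_word.data.uniform_(-0.1, 0.1)
        self.weight_proj_word.data.uniform_(-0.1, 0.1)

        # ...but without this line the bias keeps whatever garbage (possibly
        # NaN) was in memory, which then propagates into the loss.
        self.bias_word.data.uniform_(-0.1, 0.1)
```

The same one-line fix applies to bias_sent in AttentionSentRNN's __init__().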