Closed crystal-xu closed 4 years ago
Thanks for your great work.
I have been trying your model on a new task with 9 entity types and 97 relation types. I have done the following modifications:
Is there anything else I need to modify? I am encountering a "nan" issue in the graph layer, but I am not sure whether it is because I forgot something important.
Thanks very much!
Hi, thank you for your interest!
The steps you describe seem good to me. Is the dataset concept-level or mention-level? Would you also mind providing some additional information regarding the error so I can figure out what is causing it?
Hi, thanks for your quick reply.
The dataset has both concept-level and mention-level annotations. I have converted my dataset to the same format as yours and intend to do concept-level RE.
I have done some troubleshooting for the issue. I notice that if I remove the attention layer, the issue disappears, but as soon as I keep the attention layer, there are "nan" values in the "m_cntx" matrix, which seem to cause the problem. So maybe the root cause is the attention layer (the kind of check I ran is sketched below).
Have you ever encountered a similar issue before? I really appreciate your help.
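For reference, this is roughly the nan check I used while debugging (illustrative only; m_cntx here is a stand-in tensor, not the variable taken from the actual model):

import torch

# Stand-in tensor with a nan value, just to show the check itself.
m_cntx = torch.tensor([[0.2, float('nan')], [0.1, 0.3]])

if torch.isnan(m_cntx).any():
    print("nan detected at indices:", torch.isnan(m_cntx).nonzero())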
Hi, thanks for the additional info!
Not really, it was never an issue for me.
Actually, this error can arise when calculating the weights in the attention layer. In this layer, padded words and the mentions that are used as queries are masked. If the sentence has only 2 mentions and no other words, the vector is filled with -inf and softmax returns only zeros.
Could you check the example sentence and mentions where this error first appears and see what it looks like?
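To illustrate the masking I am referring to, here is a simplified standalone sketch (made-up tensor sizes, not the actual layer code):

import torch

# Pad positions and the query mention itself are masked with -inf before
# softmax. With only 2 mentions and no other words, every position of a
# query's row can end up masked.
scores = torch.randn(1, 2, 5)                              # (batch, queries, keys), made-up sizes
mask = torch.tensor([[[False, True, True, True, True],     # row 0: a real word remains
                      [True,  True, True, True, True]]])   # row 1: no valid position left
scores = scores.masked_fill(mask, float('-inf'))
weights = torch.softmax(scores, dim=2)                     # row 1 is the degenerate case
print(weights)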
Hi, thanks for your reply.
I have printed out the matrices where this error first appears, and it does seem to happen because of what you mentioned: all words in the sentence are masked and the vector is filled with -inf. However, the softmax returns nan rather than zeros (a minimal reproduction is below). Would you mind telling me whether the attention layer should return all zeros in this case? If so, maybe manually replacing such a nan vector with a zero vector would work?
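For reference, this is the behaviour I see with a standalone toy example (not your code):

import torch

row = torch.full((1, 4), float('-inf'))   # a fully masked score row, like the ones I printed
print(torch.softmax(row, dim=1))          # prints tensor([[nan, nan, nan, nan]]), not zeros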
Thanks very much!
Hi,
The reason that softmax returns nan is that the computation becomes invalid when the whole input vector is -inf (there is nothing valid left to normalise over). In this case, we can apply a naive fix: when a vector is full of -inf, replace it with ones, compute the softmax, and then replace the resulting weights with zeros.
Could you replace line 64 here with the following? Also, please check that your resulting weight vector is full of zeros after this process. If that works for you, I'll make sure to update the code with this fix.
alpha = torch.where(torch.isinf(alpha).all(dim=2, keepdim=True), torch.full_like(alpha, 1.0), alpha)
alpha = self.softmax(alpha)
alpha = torch.where(torch.eq(alpha, 1/alpha.shape[2]).all(dim=2, keepdim=True), torch.zeros_like(alpha), alpha)
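If it helps, here is a quick standalone sanity check of this fix on a toy tensor (using torch.softmax over the last dimension to stand in for self.softmax):

import torch

# Toy scores: 1 batch, 2 queries, 3 keys; the second query's row is fully
# masked with -inf, which is the case that currently produces nan.
alpha = torch.tensor([[[0.5, float('-inf'), 1.0],
                       [float('-inf'), float('-inf'), float('-inf')]]])
print(torch.softmax(alpha, dim=2))        # second row: nan, nan, nan

# The three replacement lines from above, with torch.softmax in place of self.softmax.
alpha = torch.where(torch.isinf(alpha).all(dim=2, keepdim=True), torch.full_like(alpha, 1.0), alpha)
alpha = torch.softmax(alpha, dim=2)
alpha = torch.where(torch.eq(alpha, 1/alpha.shape[2]).all(dim=2, keepdim=True), torch.zeros_like(alpha), alpha)

print(alpha)                              # second row is now 0., 0., 0.; first row is unchanged
assert not torch.isnan(alpha).any()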
Hi,
I have tried your update and dumped the results. It works! Great job! Thank you so much for your help!
By the way, I think it would be very useful if you could adapt your model to the multi-GPU scenario. I have been doing this in my own way in the meantime (roughly the generic pattern sketched below). Thanks for your efforts.
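For what it's worth, this is the generic kind of wrapper I have been using, with placeholder model, optimizer, and data rather than your actual classes:

import torch
from torch import nn

# Generic nn.DataParallel pattern; everything below is a stand-in, not the
# names used in this repository.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(10, 2)                      # stand-in for the real model
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)            # replicates the module and splits each batch across GPUs
model = model.to(device)

optimizer = torch.optim.Adam(model.parameters())
for batch in [torch.randn(8, 10)]:            # stand-in for the real data loader
    batch = batch.to(device)
    loss = model(batch).sum()                 # stand-in loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()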
No problem, happy to help! Sure, I will add this to my todo list and hopefully I will be able to do it soon enough :)
I am closing this issue, since it seems resolved. Good luck!