Closed: XuemingQiu closed this issue 3 years ago
If you hit this error, guard against the division by zero in the decoder's attention computation; change the code as follows:
import sys  # moved to the top of the file, out of the computation

# Sum the attention scores over the source positions.
normalization_factor = attn_dist_.sum(1, keepdim=True)

# Add a tiny epsilon to the denominator so an all-zero attention row
# cannot trigger a division by zero; multiplying torch.ones_like(...) by
# the epsilon, as in the original snippet, broadcasts the same scalar value.
attn_dist = attn_dist_ / (normalization_factor.view(-1, 1) + sys.float_info.epsilon)
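For context, here is a minimal self-contained sketch of why the guard matters. The tensor values are made up, and `attn_dist_` stands in for the decoder's attention scores, which can sum to zero when padding masks zero out every source position:

```python
import sys

import torch

# Hypothetical attention scores for a batch of two decoder steps;
# the second row is all zeros (e.g., every source position was masked).
attn_dist_ = torch.tensor([[0.2, 0.3, 0.5],
                           [0.0, 0.0, 0.0]])

normalization_factor = attn_dist_.sum(1, keepdim=True)

# Unguarded renormalization: the all-zero row yields 0/0 -> NaN.
print(attn_dist_ / normalization_factor)

# Epsilon-guarded renormalization keeps that row at 0.0 instead of NaN.
print(attn_dist_ / (normalization_factor + sys.float_info.epsilon))
```

An alternative guard with the same effect is to clamp the denominator, e.g. `normalization_factor.clamp(min=sys.float_info.epsilon)`, which also keeps the sum strictly positive before dividing.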
How should it be solved? Thank you!