Closed: XuemingQiu closed this issue 3 years ago
If you hit this error, guard against the division by zero in the decoder's attention computation; change the code as follows:
import sys  # moved to the top of the file, out of the computation

# Sum the attention scores over the source positions.
normalization_factor = attn_dist_.sum(1, keepdim=True)

# Add a tiny epsilon to the denominator so an all-zero attention row
# cannot trigger a division by zero; multiplying torch.ones_like(...) by
# the epsilon, as in the original snippet, broadcasts the same scalar value.
attn_dist = attn_dist_ / (normalization_factor.view(-1, 1) + sys.float_info.epsilon)
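For context, here is a minimal self-contained sketch of why the guard matters. The tensor values are made up, and `attn_dist_` stands in for the decoder's attention scores, which can sum to zero when padding masks zero out every source position:

```python
import sys

import torch

# Hypothetical attention scores for a batch of two decoder steps;
# the second row is all zeros (e.g., every source position was masked).
attn_dist_ = torch.tensor([[0.2, 0.3, 0.5],
                           [0.0, 0.0, 0.0]])

normalization_factor = attn_dist_.sum(1, keepdim=True)

# Unguarded renormalization: the all-zero row yields 0/0 -> NaN.
print(attn_dist_ / normalization_factor)

# Epsilon-guarded renormalization keeps that row at 0.0 instead of NaN.
print(attn_dist_ / (normalization_factor + sys.float_info.epsilon))
```

An alternative guard with the same effect is to clamp the denominator, e.g. `normalization_factor.clamp(min=sys.float_info.epsilon)`, which also keeps the sum strictly positive before dividing.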
How should it be solved? Thank you!