Closed robinsongh381 closed 4 years ago
Hi,

You have applied `F.log_softmax` on the output of the projection layer in [line 232](https://github.com/HLTCHKUST/PAML/blob/3c1fe4e55956b74fe0682b431726e5396f8db490/model/transformer.py#L232).

If we use `nn.CrossEntropyLoss` for the loss function, the result of `F.log_softmax` enters the loss function as in [line 333](https://github.com/HLTCHKUST/PAML/blob/3c1fe4e55956b74fe0682b431726e5396f8db490/model/transformer.py#L333). So the output of the projection layer goes through `F.log_softmax` and then `nn.CrossEntropyLoss`.

However, `nn.CrossEntropyLoss` already applies `F.log_softmax` internally, so I think you should exclude line 232 and instead just return the logits from [line 218](https://github.com/HLTCHKUST/PAML/blob/3c1fe4e55956b74fe0682b431726e5396f8db490/model/transformer.py#L218).

What do you think?
Indeed, we use `self.criterion = nn.NLLLoss(ignore_index=config.PAD_idx)` instead of `nn.CrossEntropyLoss`, so the `F.log_softmax` at line 232 is required: `nn.NLLLoss` expects log-probabilities, not raw logits.

You can also use raw logits + `nn.CrossEntropyLoss`; the two paths are equivalent.
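For reference, a minimal sketch of the equivalence being discussed (with hypothetical random logits and targets, not tensors from this repo): `F.log_softmax` followed by `nll_loss` gives the same value as `cross_entropy` applied directly to the logits, which is why applying `log_softmax` before `CrossEntropyLoss` would double-apply it.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # hypothetical projection-layer outputs
targets = torch.randint(0, 10, (4,))  # hypothetical target token indices

# Path used in this repo: log_softmax, then NLLLoss
loss_nll = F.nll_loss(F.log_softmax(logits, dim=-1), targets)

# Equivalent path: raw logits straight into cross-entropy
loss_ce = F.cross_entropy(logits, targets)

print(torch.allclose(loss_nll, loss_ce))  # → True
```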