dpfried / incoder

Generative model for code infilling and synthesis

cross-entropy loss #15

Closed kingnobro closed 1 year ago

kingnobro commented 1 year ago

Hi, I have a question about cross-entropy loss.

In section 2.1, it says the probability of the sequence is computed auto-regressively, and the model is trained with cross-entropy loss on all tokens except the mask sentinel tokens `<Mask:k>`.

In the code base you provided to me two days ago (link), I found that the criterion weights of these sentinel tokens are set to zero: `self.criterion_weights[self.sentinel_tokens[i]] = 0.0`.

Now, I want to implement this in my own code. I only have one sentinel token, `MASK`, and my code is:

loss = F.cross_entropy(logits, target, ignore_index=MASK)

Is this implementation correct? And how does it ensure that `MASK` will not be generated at inference time?
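For reference, the two formulations can be compared numerically. Below is a minimal numpy sketch (the toy logits, targets, and vocabulary size are made up for illustration): it computes per-token negative log-likelihood, then reduces it once with a zeroed per-class weight (as in the `criterion_weights` line above, using PyTorch's convention of normalizing the mean by the sum of weights) and once by dropping sentinel-target positions (the effect of `ignore_index`). When the weights are 0/1, the two reductions coincide.

```python
import numpy as np

# Toy setup: vocabulary of 4 token ids, with MASK = 3 as the sentinel.
MASK = 3
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 4))    # 5 positions, 4-token vocab
target = np.array([0, 3, 2, 3, 1])  # positions 1 and 3 have a sentinel target

def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

# Per-position negative log-likelihood of the target token.
nll = -log_softmax(logits)[np.arange(len(target)), target]

# (a) Per-class weights, like self.criterion_weights[...] = 0.0:
# each position's loss is scaled by weight[target], and the mean is
# normalized by the sum of those weights (PyTorch's 'mean' reduction).
weight = np.ones(4)
weight[MASK] = 0.0
w = weight[target]
loss_weighted = (w * nll).sum() / w.sum()

# (b) ignore_index=MASK: drop positions whose target is the sentinel.
keep = target != MASK
loss_ignored = nll[keep].mean()

assert np.isclose(loss_weighted, loss_ignored)
```

Note that both variants only stop gradient from flowing through positions whose *target* is the sentinel; neither forbids the model from assigning probability to `MASK` at other positions.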

Thanks.