Closed wlpdlut closed 5 years ago
@wlpdlut It's because I used NLLLoss as the criterion.
NLLLoss expects input of shape (batch_size, N_class, seq_len); please check the docs at the link below.
https://pytorch.org/docs/stable/nn.html?highlight=nll#torch.nn.NLLLoss
Please let me know if you have any other questions, thanks 👍
Thank you so much, I understand it now.
There is a line like the one below in pretrain.py:
mask_loss = self.criterion(mask_lm_output.transpose(1, 2), data["bert_label"])
I ran it and found that mask_lm_output has shape (batch_size, input_length, vocab_size) and data["bert_label"] has shape (batch_size, input_length). Does the transpose above make sense? I am confused.
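The transpose can be checked with a minimal sketch (the tensor sizes below are illustrative, not from the repo): `transpose(1, 2)` turns the (batch_size, input_length, vocab_size) output into (batch_size, vocab_size, input_length), which matches the (N, C, d1) layout NLLLoss expects, with the label tensor staying (batch_size, input_length).

```python
import torch
import torch.nn as nn

batch_size, input_length, vocab_size = 2, 8, 100  # illustrative sizes

# Model output: log-probabilities of shape (batch_size, input_length, vocab_size)
mask_lm_output = torch.log_softmax(
    torch.randn(batch_size, input_length, vocab_size), dim=-1
)
# Target token ids of shape (batch_size, input_length)
bert_label = torch.randint(0, vocab_size, (batch_size, input_length))

criterion = nn.NLLLoss()
# transpose(1, 2) -> (batch_size, vocab_size, input_length),
# i.e. the (N, C, d1) layout NLLLoss requires for sequence targets
loss = criterion(mask_lm_output.transpose(1, 2), bert_label)
print(loss)  # scalar loss averaged over all positions
```

Without the transpose the class dimension would sit last instead of in position 1, and NLLLoss would raise a shape error.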