Closed: MarkWuNLP closed this issue 5 years ago
@MarkWuNLP good question! I think computing the masked LM loss on negative samples makes sense.
Even if the negative sample (sentence B) is not the actual next sentence of sentence A, sentence B is still a natural-language sentence. As far as I know, the training objective is to make the model understand natural language well, and next-sentence prediction and masked LM are entirely different tasks.
For these reasons, the masked LM loss can be computed on negative samples the same way it is on positive samples.
In addition to @codertimo's reply, it seems reasonable to apply the masked LM objective to negative samples, since the authors provide a masked negative-pair example in their paper:
Input = [CLS] the man went to [MASK] store [SEP]
he bought a gallon [MASK] milk [SEP]
Label = IsNext
Input = [CLS] the man [MASK] to the store [SEP]
penguin [MASK] are flight ##less birds [SEP]
Label = NotNext
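To make the point concrete, here is a minimal sketch (in plain Python, with made-up toy logits and hypothetical helper names, not the repo's actual code) of how the two objectives combine: the masked LM cross-entropy is computed over masked positions for every pair, while only the next-sentence term looks at the IsNext/NotNext label.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class."""
    m = max(logits)  # subtract max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def masked_lm_loss(seq_logits, target_ids, ignore_index=0):
    """Average cross-entropy over masked positions only.
    Positions whose target is ignore_index (i.e. tokens that were
    not masked) contribute nothing to the loss."""
    losses = [cross_entropy(l, t)
              for l, t in zip(seq_logits, target_ids)
              if t != ignore_index]
    return sum(losses) / max(len(losses), 1)

def total_loss(seq_logits, mlm_targets, nsp_logits, is_next_label):
    # The masked LM term is applied to every pair, positive or
    # negative; only the NSP term depends on is_next_label.
    return (masked_lm_loss(seq_logits, mlm_targets)
            + cross_entropy(nsp_logits, is_next_label))

# Toy example: a 2-token "sentence" over a 3-word vocabulary, where
# only the second position was masked (target 0 means "not masked").
seq_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
mlm_targets = [0, 1]
nsp_logits = [1.0, -1.0]
loss_pos = total_loss(seq_logits, mlm_targets, nsp_logits, 1)  # IsNext
loss_neg = total_loss(seq_logits, mlm_targets, nsp_logits, 0)  # NotNext
```

The two totals differ only in their NSP component; subtracting the NSP term from each recovers the same masked LM loss, which is the behavior being discussed here.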
I see. Thank you for your answer!
Hi, thank you for your clean BERT code. I have a question about the masked LM loss after reading your code: your program computes the masked language model loss on both positive sentence pairs and negative pairs.
Does it make sense to compute the masked LM loss on negative sentence pairs? I am not sure how Google computes this loss.