codertimo / BERT-pytorch

Google AI 2018 BERT pytorch implementation
Apache License 2.0

Mask language model loss #11

Closed · MarkWuNLP closed 5 years ago

MarkWuNLP commented 5 years ago

Hi, thank you for your clean BERT code. I have a question about the masked LM loss after reading your code: your program computes the masked language model loss on both positive sentence pairs and negative pairs.

Does it make sense to compute the masked LM loss on negative sentence pairs? I am not sure how Google computes this loss.
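
For reference, here is a minimal sketch of the kind of pretraining step the question is about, with hypothetical tensor names and shapes rather than the repo's exact code. Both losses are computed for every pair, whether its NSP label is IsNext or NotNext:

```python
import torch
import torch.nn as nn

nsp_criterion = nn.NLLLoss()                 # next-sentence prediction loss
mlm_criterion = nn.NLLLoss(ignore_index=0)   # 0 marks padding/unmasked positions

def pretrain_step(nsp_log_probs, mlm_log_probs, is_next, mlm_labels):
    # nsp_log_probs: (batch, 2) log-probabilities for IsNext/NotNext
    # mlm_log_probs: (batch, seq_len, vocab) log-probabilities per position
    # is_next:       (batch,) 0/1 NSP labels
    # mlm_labels:    (batch, seq_len) original token ids at masked positions,
    #                0 everywhere else, so unmasked tokens are ignored
    next_loss = nsp_criterion(nsp_log_probs, is_next)
    mask_loss = mlm_criterion(mlm_log_probs.transpose(1, 2), mlm_labels)
    return next_loss + mask_loss  # MLM loss is taken on negative pairs too
```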

codertimo commented 5 years ago

@MarkWuNLP Good question! I think computing the masked LM loss on negative samples makes sense.

Even if the negative sample (sentence B) is not the actual next sentence of sentence A, sentence B is still a natural language sentence. As far as I know, the objective is to make the model understand natural language well, and next sentence prediction and masked LM are two entirely different tasks.

For these reasons, the masked LM loss can be computed on negative samples exactly as on positive samples.
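
To make the distinction concrete, here is a self-contained sketch (random hypothetical tensors, ignore index 0) contrasting the loss taken over all pairs, as this repo does per the question above, with the positive-only alternative the question raises:

```python
import torch
import torch.nn as nn

# Hypothetical batch: 2 pairs, seq_len 4, vocab 10; pair 0 is IsNext,
# pair 1 is NotNext. Label id 0 marks unmasked positions (ignored).
mlm_log_probs = torch.randn(2, 4, 10).log_softmax(dim=-1)
mlm_labels = torch.tensor([[0, 5, 0, 0], [0, 0, 7, 0]])
is_next = torch.tensor([1, 0])

mlm_criterion = nn.NLLLoss(ignore_index=0)

# MLM loss on all pairs, positive and negative alike:
loss_all = mlm_criterion(mlm_log_probs.transpose(1, 2), mlm_labels)

# The alternative raised in the question: positive (IsNext) pairs only.
pos = is_next == 1
loss_pos_only = mlm_criterion(mlm_log_probs[pos].transpose(1, 2), mlm_labels[pos])
```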

shuuki4 commented 5 years ago

In addition to @codertimo's reply, it seems reasonable to apply the masked LM objective to negative samples, since the authors provide a masked negative-pair example in their paper:

```
Input = [CLS] the man went to [MASK] store [SEP]
        he bought a gallon [MASK] milk [SEP]
Label = IsNext

Input = [CLS] the man [MASK] to the store [SEP]
        penguin [MASK] are flight ##less birds [SEP]
Label = NotNext
```
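
To spell out what this means for the loss, here is a rough sketch of how the NotNext pair above could be turned into masked LM training targets. All token ids below are invented for illustration; this is not the repo's actual preprocessing:

```python
import torch

# Hypothetical ids for the NotNext pair above: 101/102 stand in for
# [CLS]/[SEP], 103 for [MASK], the rest are made-up word ids.
input_ids = torch.tensor(
    [[101, 11, 12, 103, 13, 11, 14, 102,    # "the man [MASK] to the store"
      15, 103, 16, 17, 18, 19, 102]])       # "penguin [MASK] are flight ##less birds"

# MLM labels: the original token id at each masked position, 0 elsewhere
# so that unmasked positions are ignored by the loss.
mlm_labels = torch.zeros_like(input_ids)
mlm_labels[0, 3] = 42   # hypothetical id of the masked "went"
mlm_labels[0, 9] = 43   # hypothetical id of the subword masked in sentence B

is_next = torch.tensor([0])  # NotNext, yet the MLM labels are still populated
```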

MarkWuNLP commented 5 years ago

I see. Thank you for your answer!