codertimo / BERT-pytorch

Google AI 2018 BERT pytorch implementation

How does Next-Sentence-Prediction benefit both QA and NLI? #27

Closed: guotong1988 closed this issue 5 years ago

guotong1988 commented 5 years ago

The input to the single bidirectional Transformer during pretraining is the concatenation of two sentences, so I think that one Transformer is 'storing' the information of both sentences. But in QA and NLI we have two Transformers, and each Transformer's input is a single sentence.
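To make the pretraining input concrete, here is a minimal sketch (the sentences, ids, and label are illustrative, not taken from this repo's code): both sentences are packed into one sequence, so a single bidirectional Transformer attends across the whole pair while predicting the next-sentence label.

```python
# Illustrative sketch of BERT's pretraining input, not this repo's exact API.
sent_a = ["the", "man", "went", "to", "the", "store"]
sent_b = ["he", "bought", "a", "gallon", "of", "milk"]

# One sequence: [CLS] sentence A [SEP] sentence B [SEP]
tokens = ["[CLS]"] + sent_a + ["[SEP]"] + sent_b + ["[SEP]"]

# Segment ids mark which sentence each token belongs to.
segment_ids = [0] * (len(sent_a) + 2) + [1] * (len(sent_b) + 1)

# NSP label: 1 if sent_b actually follows sent_a in the corpus, else 0.
is_next = 1

print(tokens)
print(segment_ids)
```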

guotong1988 commented 5 years ago

[image attachment]

Oh, I see. In the paper's fine-tuning setup, QA and NLI also concatenate the two sentences and feed them into the same single Transformer; there is no second encoder.
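A minimal sketch of that fine-tuning setup, assuming a generic BERT-style encoder that returns hidden states of shape (batch, seq_len, hidden); the class names and the dummy encoder are illustrative, not this repo's API. Pair tasks such as NLI run the concatenated "[CLS] A [SEP] B [SEP]" sequence through the one shared encoder and classify from the [CLS] vector:

```python
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    """One shared encoder reads '[CLS] A [SEP] B [SEP]'; no second encoder."""
    def __init__(self, encoder, hidden=768, num_labels=3):
        super().__init__()
        self.encoder = encoder                 # the single pretrained Transformer
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, token_ids, segment_ids):
        hidden_states = self.encoder(token_ids, segment_ids)
        cls_vec = hidden_states[:, 0]          # [CLS] vector summarizes the pair
        return self.head(cls_vec)

# Stand-in encoder only to make the sketch runnable; a real BERT replaces it.
class DummyEncoder(nn.Module):
    def __init__(self, vocab=30000, hidden=768):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
    def forward(self, token_ids, segment_ids):
        return self.emb(token_ids)

model = PairClassifier(DummyEncoder())
logits = model(torch.randint(0, 30000, (2, 15)),
               torch.zeros(2, 15, dtype=torch.long))
print(logits.shape)  # torch.Size([2, 3])
```

This is also why NSP pretraining transfers: the pretraining input format (two concatenated sentences through one bidirectional Transformer) matches the fine-tuning input format for pair tasks.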