The input to the single bidirectional transformer during pretraining is the concatenation of two sentences. I think this one bidirectional transformer is 'storing' the information of both sentences. But in QA and NLI, we seem to have two transformers, and each transformer's input is a single sentence.
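For context on what "two sentences' concatenation" means here: in BERT-style pretraining, both sentences are packed into one token sequence for one transformer, separated by special tokens and distinguished by segment IDs. Below is a minimal sketch of that packing; the special token names follow the BERT paper, but the whitespace tokenization is a simplification:

```python
def build_pair_input(sentence_a, sentence_b):
    # BERT-style packing: [CLS] A [SEP] B [SEP] -- one sequence, one transformer.
    a_tokens = sentence_a.split()
    b_tokens = sentence_b.split()
    tokens = ["[CLS]"] + a_tokens + ["[SEP]"] + b_tokens + ["[SEP]"]
    # Segment IDs tell the single model which sentence each token came from.
    segment_ids = [0] * (len(a_tokens) + 2) + [1] * (len(b_tokens) + 1)
    return tokens, segment_ids

tokens, segments = build_pair_input("the cat sat", "it was tired")
# tokens   -> ['[CLS]', 'the', 'cat', 'sat', '[SEP]', 'it', 'was', 'tired', '[SEP]']
# segments -> [0, 0, 0, 0, 0, 1, 1, 1, 1]
```

So during pretraining there is no separate model per sentence; the segment IDs (and the [SEP] tokens) are what let the one model keep the two sentences apart.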