SpanBERT: Improving Pre-training by Representing and Predicting Spans
TL;DR
Whereas BERT masks and predicts individual tokens (e.g., a single word), SpanBERT masks contiguous spans of tokens (e.g., an idiom) and predicts them. It also drops the Next Sentence Prediction objective, feeding the model a single contiguous segment instead of a sentence pair. It outperforms BERT on many tasks.
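A minimal sketch of the span-masking idea, assuming span lengths are drawn from a clipped geometric distribution as described in the paper; the function names and the simple "keep masking until a budget is met" loop here are illustrative, not the authors' implementation:

```python
import random

def sample_span_length(p=0.2, max_len=10):
    # Geometric(p) span length, clipped at max_len (SpanBERT uses p=0.2, max 10).
    length = 1
    while random.random() > p and length < max_len:
        length += 1
    return length

def mask_spans(tokens, mask_ratio=0.15, mask_token="[MASK]"):
    # Mask contiguous spans until roughly mask_ratio of tokens are covered.
    # The last span may overshoot the budget slightly.
    tokens = list(tokens)
    budget = max(1, int(len(tokens) * mask_ratio))
    masked = set()
    while len(masked) < budget:
        length = sample_span_length()
        start = random.randrange(0, max(1, len(tokens) - length + 1))
        for i in range(start, min(start + length, len(tokens))):
            masked.add(i)
    out = [mask_token if i in masked else t for i, t in enumerate(tokens)]
    return out, sorted(masked)

sentence = "the quick brown fox jumps over the lazy dog".split()
masked, idx = mask_spans(sentence)
print(masked)  # adjacent tokens are masked together, unlike BERT's scattered masks
```

Note that the masked positions come out contiguous in runs, which is the point: the model must recover whole phrases rather than single words, and SpanBERT additionally trains a Span Boundary Objective that predicts each masked token from the span's boundary representations.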
Why it matters:
Paper URL
https://arxiv.org/abs/1907.10529
Submission Date (yyyy/mm/dd)
2019/07/24
Authors and institutions
Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy (University of Washington, Princeton University, Allen Institute for AI, Facebook AI Research)
Methods
Results
Comments