Summary
This short paper is a follow-up to the T5 paper (Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer) that probes the memorization capability of T5 models. The authors adapted three existing question-answering datasets to the closed-book setting and compared their performance with the SOTA techniques for each dataset. They distinguish open-book from closed-book answering as follows: in the open-book scheme, the model can consult external resources (often provided along with the question) to answer a question, whereas in the closed-book scheme, the model has to rely only on information implicitly stored in its parameters. The T5 1.1 model (11 billion parameters, pre-trained on unlabeled data only) with salient span masking (SSM) set new state-of-the-art results on the WebQuestions and TriviaQA datasets.
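To make the setup concrete, here is a minimal sketch of closed-book QA as a text-to-text task, using the Hugging Face transformers API rather than the paper's original T5 codebase; the checkpoint name and prompt prefix below are illustrative stand-ins, not the paper's exact configuration.

```python
# Minimal sketch of closed-book QA with T5: the input is the question
# alone (no retrieved passages), so the answer must come from whatever
# knowledge is stored in the model's parameters.
# NOTE: "t5-small" is an illustrative stand-in; the paper's best results
# come from an 11B-parameter T5.1.1 model fine-tuned on QA pairs.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

question = "trivia question: who wrote the play Hamlet?"  # hypothetical prompt format
inputs = tokenizer(question, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```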
Contributions of The Paper
Defines closed-book question answering
Shows that memorization capacity increases with the number of parameters (who didn't know this?)
Reveals limitations of existing QA benchmarks, such as:
phrasing mismatches (e.g., “April 15” vs. “April 15th”; see the sketch after this list)
incomplete annotation (not all true answers are provided)
some questions requiring external context (e.g., the USD-to-BDT conversion rate, which changes over time)
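To illustrate the phrasing-mismatch limitation, here is a sketch of the SQuAD-style answer normalization such benchmarks typically apply before exact-match scoring (the function below is a paraphrase of that convention, not code from the paper): even after lowercasing and stripping punctuation and articles, “April 15” and “April 15th” still fail to match, so a correct prediction is scored as wrong.

```python
import re
import string

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> bool:
    return normalize(prediction) == normalize(gold)

# "15th" vs. "15" survives normalization, so this correct answer scores 0.
print(exact_match("April 15th", "April 15"))  # False
```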
Publisher
EMNLP 2020 - Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing
Link to The Paper
https://aclanthology.org/2020.emnlp-main.437/
Name of The Authors
Roberts, Adam; Raffel, Colin; Shazeer, Noam
Year of Publication
2020
Comments
No response