eg-nlp-community / nlp-reading-group


[14/09/2020] Monday 7:00 PM GMT+2 How Can We Know What Language Models Know? #25

Closed ibrahimsharaf closed 3 years ago

ibrahimsharaf commented 3 years ago

How Can We Know What Language Models Know?

Authors: Zhengbao Jiang, Frank F. Xu, Jun Araki, Graham Neubig (CMU)

URL: https://arxiv.org/abs/1911.12543

Abstract:

Recent work has presented intriguing results examining the knowledge contained in language models (LM) by having the LM fill in the blanks of prompts such as "Obama is a _ by profession". These prompts are usually manually created, and quite possibly sub-optimal; another prompt such as "Obama worked as a _" may result in more accurately predicting the correct profession. Because of this, given an inappropriate prompt, we might fail to retrieve facts that the LM does know, and thus any given prompt only provides a lower bound estimate of the knowledge contained in an LM. In this paper, we attempt to more accurately estimate the knowledge contained in LMs by automatically discovering better prompts to use in this querying process. Specifically, we propose mining-based and paraphrasing-based methods to automatically generate high-quality and diverse prompts and ensemble methods to combine answers from different prompts. Extensive experiments on the LAMA benchmark for extracting relational knowledge from LMs demonstrate that our methods can improve accuracy from 31.1% to 38.1%, providing a tighter lower bound on what LMs know. We have released the code and the resulting LM Prompt And Query Archive (LPAQA).
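
For a concrete picture of the cloze-style querying the abstract describes, here is a minimal sketch using the Hugging Face `transformers` fill-mask pipeline; the model choice and prompts are illustrative assumptions, not the authors' released LPAQA code:

```python
# Minimal sketch: cloze-style querying of a masked LM with two prompts for the
# same fact. Uses the Hugging Face fill-mask pipeline; not the authors' code.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

prompts = [
    "Obama is a [MASK] by profession.",
    "Obama worked as a [MASK].",
]

for prompt in prompts:
    # top_k predictions for the masked slot, with softmax probabilities
    preds = fill_mask(prompt, top_k=3)
    top = ", ".join(f"{p['token_str']} ({p['score']:.2f})" for p in preds)
    print(f"{prompt} -> {top}")
```

Comparing the rankings the two prompts produce for the same fact is exactly the kind of gap the paper tries to close by searching for better prompts.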

ibrahimsharaf commented 3 years ago

An interesting paper on analysis and probing of pre-trained language models' factual knowledge, which builds upon a previously discussed paper https://github.com/eg-nlp-community/nlp-reading-group/issues/7 (Language Models as Knowledge Bases, Petroni et al., 2019).

TLDR: How can we generate more diverse prompts, similar to "Barack Obama was born in _", to query language models more accurately?

The main claim is that the manually written prompts (the LAMA dataset) used to query LMs are suboptimal: rephrasing them differently can surface facts the LM actually knows, producing better results for the same relations.

To validate this claim, the authors propose two ways to automatically create more versatile prompts: 1) Mining-based: mine sentences that contain the subject and object of each Wikidata relation, then derive prompts either from the words between them or from the dependency path connecting them (using a dependency parser). 2) Paraphrasing-based: use back-translation (English → German → English) to generate new, lexically diverse prompts.
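
A rough sketch of the paraphrasing-based route via round-trip back-translation, using MarianMT checkpoints from Hugging Face as a stand-in (the model names and placeholder handling are assumptions; the paper additionally keeps several beam candidates per direction to obtain multiple diverse paraphrases):

```python
# Rough sketch of paraphrasing via round-trip back-translation (EN -> DE -> EN).
# Model names are assumptions, not necessarily the ones used in the paper.
from transformers import pipeline

en_to_de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
de_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

def back_translate(prompt: str) -> str:
    german = en_to_de(prompt)[0]["translation_text"]
    return de_to_en(german)[0]["translation_text"]

# [X]/[Y] mark the subject/object slots of a relation template.
seed = "[X] is a [Y] by profession."
print(back_translate(seed))  # ideally a lexically different phrasing of the seed
```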

In the experiments, the hypothesis - that LMs know more factual information than previously reported - was validated when querying with the newly generated prompts; they evaluated BERT base & large, ERNIE, and KnowBert.

The results were consistently better across all models with the generated prompts than with the manual ones. They also show that LMs augmented with external factual knowledge (such as ERNIE) outperform BERT on this task.
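
Since the abstract also mentions ensemble methods that combine answers from different prompts, here is a minimal sketch of one simple variant (equal-weight probability averaging over a hypothetical candidate set); the paper's actual ensembles are weighted and more elaborate:

```python
# Minimal sketch of combining answers from several prompts: average each
# candidate's probability across prompts and take the argmax. Equal weights
# are a simplification of the paper's weighted ensembles.
from collections import defaultdict
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

prompts = ["Obama is a [MASK] by profession.", "Obama worked as a [MASK]."]
candidates = ["lawyer", "politician", "teacher"]  # hypothetical answer set

scores = defaultdict(float)
for prompt in prompts:
    # restrict scoring to the candidate answers and accumulate their probabilities
    for pred in fill_mask(prompt, targets=candidates):
        scores[pred["token_str"]] += pred["score"] / len(prompts)

best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```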

Proposed potential future directions: 1) more robust LMs that can be queried in different ways yet still return similar results; 2) better methods to incorporate factual knowledge into LMs; 3) further improvements to methods for querying LMs for knowledge.