@inproceedings{hu-etal-2019-diachronic,
title = "Diachronic Sense Modeling with Deep Contextualized Word Embeddings: An Ecological View",
author = "Hu, Renfen and
Li, Shen and
Liang, Shichen",
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P19-1379",
doi = "10.18653/v1/P19-1379",
pages = "3899--3908",
abstract = "Diachronic word embeddings have been widely used in detecting temporal changes. However, existing methods face the meaning conflation deficiency by representing a word as a single vector at each time period. To address this issue, this paper proposes a sense representation and tracking framework based on deep contextualized embeddings, aiming at answering not only what and when, but also how the word meaning changes. The experiments show that our framework is effective in representing fine-grained word senses, and it brings a significant improvement in word change detection task. Furthermore, we model the word change from an ecological viewpoint, and sketch two interesting sense behaviors in the process of language evolution, i.e. sense competition and sense cooperation.",
}
1. What is it?
They apply contextualized word embeddings (e.g., BERT) to historical semantic change detection.
2. What is amazing compared to previous works?
They use contextualized word embeddings (BERT) to obtain multiple sense embeddings per word, rather than a single vector per time period.
Their model can also explain how a word's meaning changes, not only whether and when it changes.
3. Where is the key to technologies and techniques?
3.1 Obtaining multiple sense representations per word
They use the Oxford dictionary to obtain word senses and their example sentences.
For each sense s_j (j = 1, ..., J) of a word w, they take n = 10 example sentences sent(s_j)_1, ..., sent(s_j)_n.
From these sentences they obtain contextual representations e(s_j)_1, ..., e(s_j)_n for sense s_j of word w.
The sense representation e(s_j) is defined as the average of e(s_j)_1, ..., e(s_j)_n.
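The averaging step above can be sketched as follows. The `contextual_embedding` stub is a hypothetical stand-in for a real BERT encoder (which the paper uses); it only exists so the sketch runs without a model.

```python
import numpy as np

def contextual_embedding(sentence: str, target: str) -> np.ndarray:
    """Hypothetical stand-in for a contextualized encoder such as BERT:
    returns the target word's token vector for one sentence.
    Here: a deterministic toy vector so the sketch runs without a model."""
    rng = np.random.default_rng(abs(hash((sentence, target))) % (2**32))
    return rng.standard_normal(8)

def sense_representation(example_sentences, target):
    """e(s_j): average of the target word's contextual embeddings
    over the sense's example sentences (n = 10 in the paper)."""
    vecs = [contextual_embedding(s, target) for s in example_sentences]
    return np.mean(vecs, axis=0)
```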
3.2 Sense identification in test data (COHA)
They compute contextual representations for target words in the test data (COHA, the Corpus of Historical American English).
To identify the sense of a word in context, they compute cosine similarity with the sense representations e(s_1), ..., e(s_j), ..., e(s_J).
Each usage is assigned the sense whose representation is nearest.
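A minimal sketch of the nearest-sense assignment by cosine similarity (the vectors here are toy inputs, not real embeddings):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_sense(usage_vec, sense_vecs):
    """Assign a usage to the sense s_j whose representation e(s_j)
    is nearest by cosine similarity; returns (index, similarity)."""
    sims = [cosine(usage_vec, e) for e in sense_vecs]
    best = int(np.argmax(sims))
    return best, sims[best]
```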
4. How did they evaluate it?
4.1 Word sense identification
They take held-out example sentences from the Oxford dictionary.
The model predicts the sense of the target word in each sentence.
Their BERT-based model misclassified some instances, which fall into two types:
not real mistakes (sentences 1 and 2)
real errors: the meanings of two senses overlap (sentence 3), or the text is too short to provide sufficient information (sentence 4)
4.2 Word meaning change
They use the test set from Gulordava and Baroni (2011).
Each word is annotated on a 4-point scale (0 = no change, 3 = significant change).
They compute the correlation between a novelty score and the human scores.
The novelty score is based on the proportion of usages of each sense s_j in the focus corpus versus the reference corpus.
Their method achieves a higher correlation than previous work.
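One way to read the novelty score described above is as the share of a sense's usages that occur in the focus (later) corpus rather than the reference (earlier) corpus. The exact formula in the paper may differ; this sketch follows only the note's wording, with toy sense labels as input.

```python
def novelty_score(focus_labels, reference_labels, sense_j):
    """Sketch: fraction of sense s_j's usages that fall in the focus
    corpus, out of its usages in both corpora. A sense used mostly in
    the focus corpus (score near 1) looks novel; one used mostly in
    the reference corpus (score near 0) does not."""
    f = sum(1 for s in focus_labels if s == sense_j)
    r = sum(1 for s in reference_labels if s == sense_j)
    total = f + r
    return f / total if total else 0.0
```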
5. Is there a discussion?
6. Which paper should I read next?