a1da4 / paper-survey

Summary of machine learning papers

Reading: Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology #5

a1da4 opened 5 years ago

a1da4 commented 5 years ago

0. Paper

@inproceedings{xu-etal-2019-treat,
    title = "Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology",
    author = "Xu, Yang and Zhang, Jiasheng and Reitter, David",
    booktitle = "Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/W19-4717",
    doi = "10.18653/v1/W19-4717",
    pages = "136--145",
}

1. What is it?

In this paper, the authors use word embeddings that incorporate subword information to measure how much of a word's meaning is carried by its subword units. They experiment with six languages: one East Asian (Chinese) and five European (English, French, German, Italian, Spanish).

2. What is amazing compared to previous studies?

The authors proposed new models that characterize the semantic weights of subword units. Previous work already incorporates subword information into word embeddings, for example fastText-style models that build a word's vector from the vectors of its character n-grams.

However, in these models the composition treats the word and its subwords equally: no parameter expresses how much meaning the word carries as a whole.
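As a concrete illustration, here is a minimal sketch of this equal-weight composition (numpy; the function and variable names are mine, not the paper's; fastText, for instance, represents a word as the sum of the vectors of its n-grams and of the word itself):

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    # Character n-grams with boundary markers, as in fastText: "<where>" -> "<wh", "whe", ...
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(w) - n + 1)]

def compose_equal(word_vec, subword_vecs):
    # Equal-weight composition: the word vector and every subword vector
    # contribute with the same weight; nothing controls the word/subword balance.
    return word_vec + subword_vecs.sum(axis=0)

dim = 100
rng = np.random.default_rng(0)
grams = char_ngrams("where")
x = compose_equal(rng.normal(size=dim), rng.normal(size=(len(grams), dim)))
```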

3. Where is the key to technologies and techniques?

They proposed the Dynamic Subword-incorporated Embedding (DSE) model.

(figure: overview of the DSE model)

This model uses a learnable semantic weight parameter h for the word itself; accordingly, 1 - h is the weight assigned to its subwords. They used two variants of this weighting scheme, and the target vector x is calculated from the word and subword vectors according to these weights (see the sketch after the figure below).

(figure: equations for the target vector x)
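As a rough sketch of such a weighted combination, assuming a target vector of the form x = h · z_word + (1 - h) · mean(z_subwords) with a learnable scalar h per word (the class and parameter names below are mine, not the paper's):

```python
import torch
import torch.nn as nn

class SemanticWeightCompose(nn.Module):
    # Learnable semantic weight h per word: h weights the word vector and
    # (1 - h) weights the averaged subword vectors. h is stored as an
    # unconstrained logit and squashed with a sigmoid so that 0 < h < 1.
    def __init__(self, vocab_size):
        super().__init__()
        self.h_logit = nn.Parameter(torch.zeros(vocab_size))  # sigmoid(0) = 0.5

    def forward(self, word_id, word_vec, subword_vecs):
        h = torch.sigmoid(self.h_logit[word_id])
        return h * word_vec + (1 - h) * subword_vecs.mean(dim=0)

compose = SemanticWeightCompose(vocab_size=10_000)
x = compose(42, torch.randn(100), torch.randn(7, 100))  # target vector for one word
```

Trained inside a skip-gram-style objective, h would be learned jointly with the vectors, so after training it can be read off per word as that word's semantic weight.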

4. How did they validate it?

They examined how the learned semantic weight parameter h varies with a word's year of first appearance in the corpus.
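A hypothetical sketch of this kind of analysis (the words and values below are toy examples, not numbers from the paper): bin words by their decade of first appearance and average the learned h in each bin.

```python
from collections import defaultdict

# word -> (first-appearance year, learned semantic weight h); toy values only
words = {"火車": (1840, 0.38), "電話": (1880, 0.42), "手機": (1990, 0.71)}

h_by_decade = defaultdict(list)
for first_year, h in words.values():
    h_by_decade[first_year // 10 * 10].append(h)

for decade in sorted(h_by_decade):
    hs = h_by_decade[decade]
    print(decade, sum(hs) / len(hs))  # mean h for words first seen in this decade
```

A low mean h in the early decades would mean the subwords (characters) carry most of the weight for old words, which is the pattern reported for Chinese below.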

5. Is there a discussion?

In Chinese, characters (the subword units) carry more semantic weight in older words than in newer words.

(figure: semantic weight results for Chinese)

6. Which paper should I read next?

A problem with this approach is that the learned word vectors are subject to random noise whose magnitude depends on corpus size. The paper below addresses this with a probabilistic variant of the word2vec model: Dynamic word embeddings.

a1da4 commented 5 years ago

#11 Dynamic word embeddings