0. Paper
1. What is it?
They analyze isotropy in multilingual language models.
2. What is amazing compared to previous works?
They reveal that multilingual language models are also anisotropic (with a word-frequency geometry opposite to monolingual models), and that a more isotropic vector space improves performance in the multilingual setting as well.
3. Where is the key to technologies and techniques?
Principal-component-based (#51): measures how strongly a given set of vectors depends on each dimension (closer to 1 = more isotropic)
Cosine-based (https://aclanthology.org/D19-1006/): average cosine similarity between word vectors (lower = more isotropic)
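Both measures can be sketched in a few lines. This is a minimal sketch with hypothetical function names: the principal-component score follows the Mu & Viswanath-style partition-function ratio over principal directions, and the cosine score is the plain average pairwise cosine similarity; the exact formulations in the paper may differ in detail.

```python
import numpy as np

def cosine_isotropy(W):
    """Cosine-based measure: average cosine similarity over all
    off-diagonal word pairs. Near 0 -> isotropic, near 1 -> anisotropic."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    sims = Wn @ Wn.T
    n = len(W)
    return (sims.sum() - n) / (n * (n - 1))  # drop the diagonal (self-similarity)

def pc_isotropy(W):
    """Principal-component-based measure (Mu & Viswanath style):
    I(W) = min_c Z(c) / max_c Z(c), where c ranges over the principal
    directions of W and Z(c) = sum_w exp(c . w).
    Close to 1 -> isotropic, close to 0 -> anisotropic."""
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    Z = np.exp(W @ Vt.T).sum(axis=0)  # partition function per direction
    return Z.min() / Z.max()
```

On a spherical Gaussian cloud the PC score sits near 1 and the cosine score near 0; shifting every vector along one common direction drives the PC score toward 0 and the cosine score up, which is the anisotropic regime the paper describes.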
4. How did they evaluate it?
4.1 Multilingual language models are also anisotropic
Table 1 shows the isotropy scores (closer to 1 = more isotropic); the scores of all languages are far from 1, i.e., anisotropic.
Labeling words with frequencies obtained from wordfreq (https://pypi.org/project/wordfreq/), they found that high-frequency words (yellow) are distributed near the center and low-frequency words (black) are distributed on the outside. → This is the opposite of monolingual language models (high freq.: outside, low freq.: center).
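The frequency-vs-centrality observation can be checked with a simple distance-to-centroid comparison. This is a hypothetical helper, not the paper's analysis code; in practice the Zipf scores would come from wordfreq's real `zipf_frequency(word, lang)` API, but synthetic scores are used here to keep the sketch self-contained.

```python
import numpy as np

def centrality_by_frequency(vectors, zipf_scores, thresh=4.0):
    """Compare mean distance to the embedding centroid for high- vs
    low-frequency words (Zipf scale: higher = more frequent).
    In the paper's multilingual setting, high-frequency words sit
    closer to the center; monolingual models show the flipped pattern."""
    center = vectors.mean(axis=0)
    dist = np.linalg.norm(vectors - center, axis=1)
    high = dist[zipf_scores >= thresh].mean()
    low = dist[zipf_scores < thresh].mean()
    return high, low
```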
4.2 An isotropic vector space also improves performance in multilingual language models
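The notes do not name the isotropy-enhancing transform, so as an assumption here is a sketch of the common "All-but-the-Top"-style post-processing (Mu & Viswanath): center the embeddings and subtract their projections onto the top principal components, which carry most of the anisotropy.

```python
import numpy as np

def remove_top_components(W, k=2):
    """Assumed post-processing (All-but-the-Top style, not necessarily
    the paper's exact method): center the embeddings, then remove the
    top-k principal directions, which dominate the anisotropic cloud."""
    Wc = W - W.mean(axis=0)
    _, _, Vt = np.linalg.svd(Wc, full_matrices=False)
    top = Vt[:k]                       # top-k principal directions
    return Wc - (Wc @ top.T) @ top     # subtract projections onto them
```

After this transform the average pairwise cosine similarity of an anisotropic cloud drops toward 0, i.e., the space becomes more isotropic, which is the property section 4.2 links to better downstream performance.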
5. Is there a discussion?
6. Which paper should we read next?