0. Paper
1. What is it?
They analyze the effect of post-processing methods on contextualized word embeddings.
2. What is amazing compared to previous works?
They reveal that applying z-score (a simple post-processing method from feature-based machine learning) consistently improves performance, owing to the huge variance of contextualized word embeddings.
![Screenshot 2023-02-08 11 27 35](https://user-images.githubusercontent.com/45454055/217413122-8d90b72a-6964-480a-a4a1-6c2b3e0f934b.png)
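As a quick sanity check of the variance observation, one could inspect the per-dimension variance of one layer's hidden states; this is a minimal sketch assuming the Hugging Face `transformers` API, with a placeholder model name and example sentences.

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

sentences = ["The bank raised rates.", "She sat by the river bank."]
with torch.no_grad():
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    hidden = model(**batch).hidden_states[-1]  # (batch, seq_len, dim), last layer

# Keep only real tokens (drop padding), then measure per-dimension variance.
vectors = hidden[batch["attention_mask"].bool()].numpy()
var = np.var(vectors, axis=0)
print(f"per-dimension variance: mean={var.mean():.3f}, max={var.max():.3f}")
```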
3. Where is the key to technologies and techniques?
They applied two feature-based normalization methods (z-score and min-max) and post-processing methods originally proposed for static word embeddings (a code sketch follows the list):
- z-score: normalize each layer/dimension to zero mean and unit variance
- min-max: for each word $w$ in a sentence $sent$ at layer $l$, the contextualized word vector $z_{w,sent,l}$ is post-processed as
$$\frac{z_{w,sent,l} - \min(Z)}{\max(Z) - \min(Z)}$$
where $Z$ is the set of all token vectors from all layers
- all-but-the-top (abtt): subtract the mean, then remove the dominant principal components (#51)
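A minimal numpy sketch of the three methods, assuming `embeddings` is a `(num_tokens, dim)` array of contextualized vectors from a single layer; the function names, the epsilon guard, and the default number of removed components are illustrative choices, not taken from the paper.

```python
import numpy as np

def z_score(embeddings: np.ndarray) -> np.ndarray:
    """Normalize each dimension to zero mean and unit variance."""
    mean = embeddings.mean(axis=0)
    std = embeddings.std(axis=0) + 1e-8  # epsilon guard, added for numerical safety
    return (embeddings - mean) / std

def min_max(embeddings: np.ndarray, all_vectors: np.ndarray) -> np.ndarray:
    """Scale into [0, 1] using the global min/max over Z (all token vectors, all layers)."""
    z_min, z_max = all_vectors.min(), all_vectors.max()
    return (embeddings - z_min) / (z_max - z_min)

def all_but_the_top(embeddings: np.ndarray, d: int = 2) -> np.ndarray:
    """Subtract the mean, then project out the top-d principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of `components` are the principal directions of the centered matrix.
    _, _, components = np.linalg.svd(centered, full_matrices=False)
    top = components[:d]                      # (d, dim)
    return centered - centered @ top.T @ top  # remove projections onto the top directions
```

Following the recommendation in section 4, the methods can be chained, e.g. `all_but_the_top(z_score(layer_vectors))`.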
4. How did they evaluate it?
The figures show that applying z-score (and abtt) consistently improves performance; from these findings, applying z-score (and abtt) is recommended when using contextualized word embeddings.
5. Is there a discussion?
6. Which paper should we read next?