0. Paper
1. What is it?
They analyze the effect of post-processing methods on contextualized word embeddings.
2. What is amazing compared to previous works?
They reveal that applying z-score (a simple post-processing method from feature-based machine learning) consistently improves performance, owing to the huge variance of contextualized word embeddings.
![Screenshot 2023-02-08 11 27 35](https://user-images.githubusercontent.com/45454055/217413122-8d90b72a-6964-480a-a4a1-6c2b3e0f934b.png)
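As a quick sanity check of the variance observation, one could inspect the per-dimension variance of one layer's hidden states; this is a minimal sketch assuming the Hugging Face `transformers` API, with a placeholder model name and example sentences.

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

sentences = ["The bank raised rates.", "She sat by the river bank."]
with torch.no_grad():
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    hidden = model(**batch).hidden_states[-1]  # (batch, seq_len, dim), last layer

# Keep only real tokens (drop padding), then measure per-dimension variance.
vectors = hidden[batch["attention_mask"].bool()].numpy()
var = np.var(vectors, axis=0)
print(f"per-dimension variance: mean={var.mean():.3f}, max={var.max():.3f}")
```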
3. Where is the key to technologies and techniques?
They applied two feature-based normalization methods (z-score and min-max) and post-processing methods originally proposed for static word embeddings (a code sketch follows the list):
- z-score: normalize each layer/dimension to zero mean and unit variance
- min-max: for each word $w$ in a sentence $sent$ at layer $l$, the contextualized word vector $z_{w,sent,l}$ is post-processed as
$$\frac{z_{w,sent,l} - \min(Z)}{\max(Z) - \min(Z)}$$
where $Z$ is the set of all token vectors from all layers
- all-but-the-top (abtt): subtract the mean, then remove the dominant principal components (#51)
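A minimal numpy sketch of the three methods, assuming `embeddings` is a `(num_tokens, dim)` array of contextualized vectors from a single layer; the function names, the epsilon guard, and the default number of removed components are illustrative choices, not taken from the paper.

```python
import numpy as np

def z_score(embeddings: np.ndarray) -> np.ndarray:
    """Normalize each dimension to zero mean and unit variance."""
    mean = embeddings.mean(axis=0)
    std = embeddings.std(axis=0) + 1e-8  # epsilon guard, added for numerical safety
    return (embeddings - mean) / std

def min_max(embeddings: np.ndarray, all_vectors: np.ndarray) -> np.ndarray:
    """Scale into [0, 1] using the global min/max over Z (all token vectors, all layers)."""
    z_min, z_max = all_vectors.min(), all_vectors.max()
    return (embeddings - z_min) / (z_max - z_min)

def all_but_the_top(embeddings: np.ndarray, d: int = 2) -> np.ndarray:
    """Subtract the mean, then project out the top-d principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of `components` are the principal directions of the centered matrix.
    _, _, components = np.linalg.svd(centered, full_matrices=False)
    top = components[:d]                      # (d, dim)
    return centered - centered @ top.T @ top  # remove projections onto the top directions
```

Following the recommendation in section 4, the methods can be chained, e.g. `all_but_the_top(z_score(layer_vectors))`.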
4. How did they evaluate it?
The figures show that applying z-score (and abtt) consistently improves performance; from these findings, applying z-score (and abtt) is recommended when using contextualized word embeddings.
5. Is there a discussion?
6. Which paper should we read next?