Reading: On the Sentence Embeddings from Pre-trained Language Models

0. Paper

paper: link
EMNLP2020

1. What is it?

They propose BERT-flow, a method that maps BERT sentence embeddings onto an isotropic Gaussian distribution to improve sentence representations.

2. What is amazing compared to previous works?

Their method improves performance on semantic textual similarity (STS) tasks.

3. What is the key to the technology or technique?

Convert the embedding space (anisotropic, biased) into a Gaussian space (isotropic, unbiased).

During training, they maximize the likelihood of the BERT sentence embeddings under a flow-based generative model.
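
For reference, the objective can be sketched with the usual change-of-variables formula for normalizing flows. The notation here is an assumption of this note and may differ from the paper's: u is the BERT sentence embedding of a sentence s, z is the Gaussian latent variable, f_φ is the invertible mapping with u = f_φ(z), and D is the (unlabeled) sentence collection.

```latex
\max_{\phi}\; \mathbb{E}_{u = \mathrm{BERT}(s),\; s \sim \mathcal{D}}
\left[ \log p_{\mathcal{Z}}\!\left(f_{\phi}^{-1}(u)\right)
+ \log \left| \det \frac{\partial f_{\phi}^{-1}(u)}{\partial u} \right| \right]
```

The first term pushes the mapped embeddings toward the standard Gaussian prior; the second term accounts for the change in volume under the mapping.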

The transformation is performed by an invertible neural network (the flow). Only the parameters of this network are updated during training; the parameters of BERT are never changed, which keeps the computational cost low.
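
A minimal sketch of the idea, not the paper's implementation: a single invertible affine map stands in for the flow network, and random toy vectors stand in for the frozen BERT outputs. The names `log_likelihood` and `fit_affine_flow`, the toy data, and the hyperparameters are all hypothetical.

```python
import numpy as np

def log_likelihood(W, b, U):
    """Average log-likelihood of embeddings U under z = W u + b, z ~ N(0, I)."""
    Z = U @ W.T + b
    d = U.shape[1]
    log_pz = -0.5 * np.sum(Z ** 2, axis=1) - 0.5 * d * np.log(2 * np.pi)
    _, logdet = np.linalg.slogdet(W)  # log |det W| (change of variables)
    return float(np.mean(log_pz) + logdet)

def fit_affine_flow(U, steps=3000, lr=0.05):
    """Gradient ascent on the log-likelihood; only W and b are updated."""
    n, d = U.shape
    W, b = np.eye(d), np.zeros(d)
    for _ in range(steps):
        Z = U @ W.T + b
        grad_W = -(Z.T @ U) / n + np.linalg.inv(W).T  # Gaussian term + log|det W| term
        grad_b = -Z.mean(axis=0)
        W = W + lr * grad_W
        b = b + lr * grad_b
    return W, b

rng = np.random.default_rng(0)
# Toy "frozen embeddings": anisotropic (unequal scales) and off-centre.
U = rng.normal(size=(1000, 3)) * np.array([2.0, 1.0, 0.5]) + 1.0

ll_before = log_likelihood(np.eye(3), np.zeros(3), U)
W, b = fit_affine_flow(U)
ll_after = log_likelihood(W, b, U)
Z = U @ W.T + b  # transformed embeddings

print("log-likelihood before/after:", ll_before, ll_after)
print("covariance of Z:\n", np.cov(Z, rowvar=False, bias=True))
```

The embeddings `U` are never modified; only the mapping is trained, which mirrors the point that BERT itself stays fixed. Because the map is invertible, `U` can be recovered from `Z`.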

4. How did they evaluate it?

From Table 2, their method consistently outperforms the baselines.

5. Is there a discussion?

vs. Standard Normalization and all-but-the-top (NATSV in this paper) #51

Table 5 shows that their method outperforms these previous works. They mention that:

We argue that NATSV can help eliminate anisotropy but it may also discard some useful information contained in the nulled vectors. On the contrary, our method directly learns an invertible mapping to isotropic latent space without discarding any information.

(→ What is "useful information"? )
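
To make the comparison concrete, here is a rough sketch of the two baselines on toy vectors. It assumes standard normalization means per-dimension z-scoring and that NATSV follows the all-but-the-top recipe (center, then project out the top-k singular directions); the function names and the toy data are hypothetical.

```python
import numpy as np

def standard_normalization(U):
    """Z-score each embedding dimension."""
    return (U - U.mean(axis=0)) / U.std(axis=0)

def null_top_directions(U, k=1):
    """Center U, then remove its components along the top-k singular directions."""
    centered = U - U.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    top = Vt[:k]  # (k, d) dominant directions
    nulled = centered - (centered @ top.T) @ top
    return nulled, top

rng = np.random.default_rng(0)
U = rng.normal(size=(200, 5))
U[:, 0] *= 10.0  # one dominant direction, mimicking anisotropy

sn = standard_normalization(U)
nulled, top = null_top_directions(U, k=1)

# Rank before and after nulling: the projection is not invertible.
print(np.linalg.matrix_rank(U - U.mean(axis=0)), np.linalg.matrix_rank(nulled))
```

After nulling, nothing is left along the removed directions, so the original vectors cannot be recovered from the result. Whatever those directions encoded is what the authors call potentially "useful information"; an invertible mapping keeps it recoverable.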
