0. Paper
Temporal Attention for Language Models (Guy D. Rosin and Kira Radinsky, Findings of NAACL 2022)
My literature review (in Japanese) is here.
1. What is it?
They proposed Temporal Attention, a time-aware extension of self-attention for analyzing how word meanings change over time.
2. What is amazing compared to previous works?
Their attention mechanism achieves state-of-the-art performance on SemEval-2020 Task 1 (unsupervised lexical semantic change detection).
3. Where is the key to technologies and techniques?
Theoretically, each token in an input sequence could have its own time point. From this idea, they proposed Temporal Attention, which generates time-specific word vectors using time vectors $X_t$ and their weights $W_t$. The time matrix $T$ used in the attention computation is

$$T = X_t W_t$$
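Below is a minimal PyTorch sketch of how such a time matrix could be folded into scaled dot-product attention, assuming discrete time points mapped to learned embeddings (the $X_t$ above) and scores of the form $(QT^\top)(TK^\top)$. The class name, dimensions, `time_ids` argument, and scaling are my illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalAttention(nn.Module):
    """Time-aware self-attention sketch: alongside Q, K, V, a time
    matrix T = X_t W_t modulates the attention scores (illustrative)."""

    def __init__(self, d_model: int, num_times: int):
        super().__init__()
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        # Time vectors X_t: one learned embedding per discrete time point.
        self.time_emb = nn.Embedding(num_times, d_model)
        # Weights W_t that turn time embeddings into the time matrix T.
        self.w_t = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor, time_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); time_ids: (batch, seq_len) ints
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)
        t = self.w_t(self.time_emb(time_ids))  # T = X_t W_t
        d_k = q.size(-1)
        # Assumed form of the time-modulated scores: (Q T^T)(T K^T),
        # with one sqrt(d_k) scaling factor per dot product.
        scores = (q @ t.transpose(-2, -1)) @ (t @ k.transpose(-2, -1)) / d_k
        return F.softmax(scores, dim=-1) @ v


# Illustrative usage: 2 sequences of 5 tokens, 10 possible time points.
layer = TemporalAttention(d_model=64, num_times=10)
x = torch.randn(2, 5, 64)
time_ids = torch.randint(0, 10, (2, 5))
out = layer(x, time_ids)  # -> shape (2, 5, 64)
```

The point of the sketch is that each token carries its own time ID, so the same sentence written at different times yields different attention patterns and hence time-specific word vectors.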
4. How did they evaluate it?
In their results table, Temporal Attention outperforms strong baselines such as SGNS + alignment and fine-tuned BERT on SemEval-2020 Task 1.
5. Is there a discussion?
Based on these results, they hypothesized that understanding time does not require extremely large models.
6. Which paper should I read next?