
Reading: Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning #270

Open · a1da4 opened this issue 1 year ago

a1da4 commented 1 year ago

0. Paper

1. What is it?

They propose a metric to predict the optimal (well-specialized) layer for fine-tuning.

2. What is amazing compared to previous works?

Their metric does not require any training or hyperparameter tuning, and fine-tuning guided by the selected layer performs downstream tasks effectively at a much lower computation cost than full fine-tuning.

3. What are the key technologies and techniques?

3.1 Metric

They define the metric to evaluate the optimal (well-specialized) layer as follows:

[Screenshot: definition of the layer-wise hidden-state variability metric]
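The formula itself is in the screenshot above and is not reproduced here. As a rough sketch of the general idea, assuming variability is measured as the spread of mean-pooled hidden states around their centroid over a batch of task examples (our assumption, not the paper's exact definition), per-layer scores for a Hugging Face encoder could be computed like this:

```python
import torch
from transformers import AutoModel, AutoTokenizer

def layer_variability_scores(texts, model_name="roberta-base"):
    """Return one variability score per transformer layer, measured on
    a batch of task examples (a stand-in for the paper's metric)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        # hidden_states: tuple of (num_layers + 1) tensors, each [batch, seq, dim]
        hidden_states = model(**batch).hidden_states

    mask = batch["attention_mask"].unsqueeze(-1).float()
    scores = []
    for h in hidden_states[1:]:  # skip the embedding layer
        pooled = (h * mask).sum(dim=1) / mask.sum(dim=1)  # mean-pool over tokens
        centroid = pooled.mean(dim=0, keepdim=True)
        # Variability here = mean distance of example representations from
        # their centroid (an assumption, not the paper's exact formula).
        scores.append((pooled - centroid).norm(dim=-1).mean().item())
    return scores
```

The resulting per-layer scores can then be ranked to pick the layer to tune, following the paper's selection criterion.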

3.2 Tuning strategy

Based on the selected optimal layer, they try the four tuning strategies shown in Figure 2 (a sketch of the simplest one follows below).

[Figure 2: tuning strategies built around the selected layer]
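As a minimal sketch of the simplest of these strategies, updating only the selected layer, assuming a BERT/RoBERTa-style parameter naming scheme (`encoder.layer.{i}`); `tune_only_layer` is a hypothetical helper, not the authors' code:

```python
from transformers import AutoModel

def tune_only_layer(model, layer_idx):
    """Freeze every parameter except those in the chosen transformer layer
    (the 'tune only the selected layer' strategy)."""
    prefix = f"encoder.layer.{layer_idx}."  # assumes BERT/RoBERTa-style naming
    for name, param in model.named_parameters():
        param.requires_grad = prefix in name
    # In a real setup the task-specific head would also stay trainable.

model = AutoModel.from_pretrained("roberta-base")
tune_only_layer(model, 9)  # layer index 9 is an arbitrary example
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(len(trainable), "parameter tensors remain trainable")
```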

4. How did they evaluate it?

[Figure 3: metric scores vs. task performance]

Figure 3 shows that their metric is highly correlated with task performance.
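As an illustration of how such a correlation could be checked (the paper's exact statistic is not stated here, and the numbers below are made-up placeholders), one could rank-correlate per-layer metric scores with per-layer task performance:

```python
from scipy.stats import spearmanr

# Hypothetical numbers: metric score and downstream accuracy per layer.
variability = [0.8, 1.1, 1.5, 2.0, 2.4]
accuracy = [70.2, 72.5, 74.1, 76.0, 77.3]

rho, p = spearmanr(variability, accuracy)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```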

[Figures 4 and 5: tuning-strategy results]

Figures 4 and 5 show that their tuning strategies (tuning only the optimal layers) achieve performance comparable to tuning all layers.

5. Is there a discussion?

6. Which paper should we read next?