0. Paper
1. What is it?
They propose a metric to predict the optimal (well-specialized) layer for fine-tuning.
2. What is amazing compared to previous works?
Their metric does not require any training or hyperparameter tuning. Fine-tuning only from their optimal layer performs well on downstream tasks at a much lower computation cost than tuning all of the layers.
3. Where is the key to technologies and techniques?
3.1 Metric
They define the metric for evaluating the optimal (well-specialized) layer as follows:
Compute a sequence (sentence) embedding by averaging over all tokens (except the CLS token).
Group the embeddings by target label (blue, red, and yellow in Figure 1).
Compute the within-group variability and the between-group variability as:
![Screenshot 2023-06-19 13 17 06](https://github.com/a1da4/paper-survey/assets/45454055/a8d4a988-0c16-4497-8087-bfd3bf272fd5)
Based on these scores, define the task specialty of the layer:
![Screenshot 2023-06-19 13 18 42](https://github.com/a1da4/paper-survey/assets/45454055/0518ed0f-bbac-4f01-9ca1-62573c71b83e)
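
The steps above can be sketched in plain Python. This is only an illustrative simplification, not the paper's exact formula (which is given in the screenshots): it measures variability as a mean squared distance (the trace of a covariance matrix) and reports within-group variability divided by between-group variability, so a lower value means the label groups are tighter and further apart. The function names, the assumption that the CLS token sits at index 0, and the toy data are all hypothetical.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sentence_embedding(token_vectors):
    """Average one layer's token vectors, skipping the CLS token.

    Assumption: the CLS token is at index 0, as in BERT-style inputs.
    """
    return mean_vector(token_vectors[1:])


def variability_ratio(embeddings, labels):
    """Within-group variability divided by between-group variability.

    Simplification (not taken from the paper): each variability is a mean
    squared distance. Needs at least two distinct, separated label groups,
    otherwise the between-group term is zero.
    """
    groups = defaultdict(list)
    for embedding, label in zip(embeddings, labels):
        groups[label].append(embedding)

    class_means = {label: mean_vector(vecs) for label, vecs in groups.items()}
    global_mean = mean_vector(list(class_means.values()))

    # Average spread of each group around its own mean.
    within = sum(
        sum(squared_distance(v, class_means[label]) for v in vecs) / len(vecs)
        for label, vecs in groups.items()
    ) / len(groups)
    # Average spread of the group means around the global mean.
    between = sum(
        squared_distance(m, global_mean) for m in class_means.values()
    ) / len(groups)
    return within / between


# Toy example: two labels, two sentence embeddings each.
embeddings = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
labels = ["a", "a", "b", "b"]
print(variability_ratio(embeddings, labels))  # 0.25
```

Under this sketch, the score would be computed once per layer from that layer's sentence embeddings, and the layer with the lowest ratio would be the candidate "well-specialized" layer.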
3.2 Tuning strategy
Based on the optimal layer, they try the four tuning strategies shown in Figure 2.
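4. How did they evaluate it?
Figure 3 shows that their metric is highly correlated with task performance.
Figures 4 and 5 show that their tuning strategies (tuning only the optimal layers) achieve performance comparable to tuning all of the layers.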
5. Is there a discussion?
6. Which paper should I read next?