0. Paper
1. What is it?
In this paper, the authors further evaluate the method they proposed previously (https://aclanthology.org/2022.naacl-main.225) across multiple fine-tuning frameworks.
2. What is amazing compared to previous works?
Previous works analyzed fine-tuned representations through supervised classification tasks.
Recently, these authors proposed a method for analyzing the relationship between fine-tuned representations and human-defined information (part-of-speech, morphology, or chunking).
In that work, they proposed an alignment score, which evaluates how many words $w$ from concept $C_1$ (e.g., a cluster from the fine-tuned LM) are contained in concept $C_2$ (e.g., words tagged NN: noun, singular or mass).
In the experiment, they evaluate how many human-defined concepts the fine-tuned model can align with ($\theta = 0.9$).
Figure 2 from the paper shows the rate of aligned concepts:
$$\frac{\text{Number of concepts where Eq. 1} = 1}{\text{Number of concepts}}$$
![Screenshot 2023-05-01 13 35 31](https://user-images.githubusercontent.com/45454055/235407172-30139bc1-1cb6-442c-b4e3-7a26e25a9f4b.png)
This paper is an upgraded version of the above paper.
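As I understand the alignment score, it thresholds the word overlap between two concepts. A minimal sketch of that reading (my own reconstruction, not the paper's code; `alignment` and `aligned_rate` are hypothetical names, and the paper's exact definition of Eq. 1 may differ):

```python
def alignment(c1: set, c2: set, theta: float = 0.9) -> int:
    """Eq. 1 as I read it: 1 if at least a fraction theta of the words
    in concept c1 (e.g., an LM cluster) also appear in concept c2
    (e.g., NN-tagged words), else 0."""
    if not c1:
        return 0
    overlap = len(c1 & c2) / len(c1)
    return 1 if overlap >= theta else 0


def aligned_rate(clusters: list, human_concepts: list, theta: float = 0.9) -> float:
    """Rate of aligned concepts: fraction of LM clusters for which
    Eq. 1 equals 1 against at least one human-defined concept."""
    aligned = sum(
        any(alignment(c1, c2, theta) for c2 in human_concepts)
        for c1 in clusters
    )
    return aligned / len(clusters)
```

For example, `aligned_rate([{"dog", "cat"}, {"x", "y"}], [{"dog", "cat", "run"}])` gives 0.5 here: the first cluster is fully contained in the human-defined concept, the second has no overlap.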
3. Where is the key to technologies and techniques?
They adapt the concept matching score (https://aclanthology.org/2022.naacl-main.225) to compare the fine-tuned model's concepts, layer by layer, with the base model's concepts, human-defined concepts, and task-specific concepts.
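If I sketch how such a layer-wise comparison might work (a minimal reconstruction under my own assumptions: `matches` and `layerwise_match` are hypothetical names, and concepts are assumed to already be extracted as word sets per layer):

```python
def matches(base_concept: set, tuned_concept: set, theta: float = 0.9) -> bool:
    """True if at least a fraction theta of the base concept's words
    also appear in the fine-tuned concept (thresholded overlap)."""
    if not base_concept:
        return False
    return len(base_concept & tuned_concept) / len(base_concept) >= theta


def layerwise_match(base_layers: list, tuned_layers: list, theta: float = 0.9) -> list:
    """For each layer, the fraction of base-model concepts that align
    with at least one fine-tuned concept at the same layer."""
    return [
        sum(any(matches(b, t, theta) for t in tuned) for b in base) / len(base)
        for base, tuned in zip(base_layers, tuned_layers)
    ]
```

Under this reading, a high score in lower layers and a low score in upper layers would correspond to the pattern the paper reports: lower layers retain base-model concepts while upper layers specialize to the task.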
4. How did they evaluate it?
4.1 Embedding Space (clusters)
Figure 3 shows that:
- the lower layers of the model retain the general language concepts learned in the base model
- the upper layers learn task-specific concepts.
4.2 Human-defined (pos, morph, chunk)
Figure 4 shows that models forget the POS information in the upper layers (POS is less important for sentence classification tasks).
4.3 Task-specific (positive / negative)
Figure 5 shows that models learn positive / negative tags in their upper layers.
5. Is there a discussion?
From Figures 3, 4, and 5, only the ALBERT model behaves differently from the other models. They concluded that the cause is ALBERT's cross-layer parameter sharing.
6. Which paper should I read next?