0. Paper
paper: arxiv
1. What is it?
They analyze contextualized word representations from BERT.
2. What is amazing compared to previous works?
Previous work found that BERT embeddings capture grammatical information (the dependency tree) and that this structure relates to Euclidean distance. -> This work analyzes it in more detail.
Previous works analyzed BERT embeddings through pipeline probing tasks (POS tagging, coreference resolution, dependency labeling). -> This work analyzes BERT's internal representations directly.
3. What are the key technologies and techniques?
To analyze grammatical information, they use a model-wide attention vector.
![Screenshot 2023-02-07 10 53 17](https://user-images.githubusercontent.com/45454055/217127787-03ec9547-eba8-4ced-a915-15de8dae05a3.png)
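A minimal sketch (not the paper's exact implementation) of how such a model-wide attention vector could be built with Hugging Face transformers; the model choice, helper name, and token positions are my own assumptions:

```python
# Sketch: concatenate the attention weights between two token positions across
# every layer and head, in both directions, to get one "model-wide" vector per
# word pair. Model choice and helper name are illustrative assumptions.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

def model_wide_attention(sentence: str, i: int, j: int) -> torch.Tensor:
    """Attention weights between positions i and j from all layers and heads."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        attentions = model(**inputs).attentions  # tuple of (1, heads, seq, seq)
    per_layer = [torch.cat([att[0, :, i, j], att[0, :, j, i]]) for att in attentions]
    return torch.cat(per_layer)  # 2 * 12 layers * 12 heads = 288 values

# Positions count wordpieces, with [CLS] at index 0; this assumes each word
# below stays a single wordpiece.
vec = model_wide_attention("The keys to the cabinet are on the table .", 2, 6)
print(vec.shape)  # torch.Size([288])
```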
4. How did they evaluate it?
4.1 Grammatical information
Previous work found that the L2 distance between BERT's contextualized embeddings captures the dependency tree.
![Screenshot 2023-02-06 23 29 30](https://user-images.githubusercontent.com/45454055/216997962-4c66f4f2-0bdd-41f9-addc-8fd86688e2de.png)
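As a rough illustration of that prior finding (presumably the structural-probe line of work), the sketch below computes pairwise distances between one layer's token embeddings; the actual probe measures squared distance after a learned linear transformation, which is omitted here:

```python
# Rough illustration only: pairwise L2 distances between one layer's token
# embeddings. The structural-probe result actually measures (squared) distance
# after a learned linear transformation, which is omitted in this sketch.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

inputs = tokenizer("The chef who ran to the store was out of food .", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).hidden_states[7][0]  # (seq, 768); layer 7 is an arbitrary mid layer

dists = torch.cdist(hidden, hidden)  # (seq, seq) pairwise L2 distances
print(dists.shape)
# A probe would compare these distances (per word pair) against the number of
# edges between the same words in the gold dependency tree.
```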
Figure 3 shows that the average L2 distance of the model-wide attention vectors between two words with a given dependency label detects:
- distant dependencies such as parataxis (the relation between the main verb of a clause and other sentential elements)
- close dependencies such as auxpass (passive-voice information); a toy sketch of the per-label averaging follows this list
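The averaging itself is simple bookkeeping; here is a toy sketch with made-up distance values, just to make the Figure 3 computation concrete:

```python
# Toy sketch of the per-label bookkeeping behind a Figure-3-style analysis:
# average a distance measure over word pairs sharing a dependency label.
# The numbers are made up purely to show the computation, not from the paper.
from collections import defaultdict

pairs = [
    ("parataxis", 9.1), ("parataxis", 8.4),  # loose, long-range relation
    ("auxpass", 1.2), ("auxpass", 1.5),      # tight, local relation
]

totals, counts = defaultdict(float), defaultdict(int)
for label, dist in pairs:
    totals[label] += dist
    counts[label] += 1

for label in totals:
    print(label, totals[label] / counts[label])
# Expected pattern from the figure: parataxis-like relations average larger
# distances than local relations such as auxpass.
```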
4.2 Internal representations
Figure 4 shows that BERT embeddings capture semantic information.
![Screenshot 2023-02-06 23 29 52](https://user-images.githubusercontent.com/45454055/216998036-7a099e53-c2ae-4b7c-8b14-40df1999f905.png)
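A small sketch of one way to poke at this kind of semantic (word-sense) signal: embed the same polysemous word in several contexts and project its contextualized vectors to 2D; the sentences, layer choice, and PCA projection are my own illustrative choices, not necessarily the paper's setup:

```python
# Sketch: collect the contextualized vector of one polysemous word ("bank")
# in different sentences and project to 2D to look for sense clusters.
# Sentences, final-layer choice, and PCA projection are illustrative assumptions.
import torch
from sklearn.decomposition import PCA
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = [
    "She sat on the bank of the river .",
    "The fisherman slept on the grassy bank .",
    "He deposited the check at the bank .",
    "The bank approved the loan yesterday .",
]

bank_id = tokenizer.convert_tokens_to_ids("bank")
vectors = []
for sent in sentences:
    inputs = tokenizer(sent, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq, 768)
    pos = (inputs["input_ids"][0] == bank_id).nonzero(as_tuple=True)[0][0]
    vectors.append(hidden[pos])

coords = PCA(n_components=2).fit_transform(torch.stack(vectors).numpy())
print(coords)  # river-sense and money-sense contexts should separate roughly
```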
5. Is there a discussion?
6. Which paper should be read next?