Alibaba-NLP / ACE

[ACL-IJCNLP 2021] Automated Concatenation of Embeddings for Structured Prediction

Ask about BERT/XLM-R embeddings #16

Closed: fym0503 closed this issue 2 years ago

fym0503 commented 2 years ago

Hi, I have read your interesting paper and code. My question is: since BERT and XLM-R have many layers, which embeddings do you use? Just the word embeddings, or a mixture of intermediate-layer representations? Did you find a difference between these options? Thanks!

wangxinyu0922 commented 2 years ago

Hi,

There are two scenarios:

An alternative is to train a model for each embedding under different settings (e.g., the last layer vs. the last four layers, with or without fine-tuning) and compare model accuracy to decide on the final configuration for each embedding.
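
For reference, a minimal sketch of extracting these two layer settings with Hugging Face `transformers` (this is not the ACE code itself, and `bert-base-cased` is just a placeholder model name):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased", output_hidden_states=True)
model.eval()

encoded = tokenizer("Automated concatenation of embeddings", return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoded)

# hidden_states is a tuple: (embedding layer, layer 1, ..., last layer)
hidden_states = outputs.hidden_states

last_layer = hidden_states[-1]                                # [1, seq_len, hidden]
last_four_mean = torch.stack(hidden_states[-4:]).mean(dim=0)  # mean of the last four layers
```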

By the way, we use the first subtoken as the representation of each token.
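
A minimal sketch of that first-subtoken pooling, again with Hugging Face `transformers` rather than the ACE code (it assumes a fast tokenizer so that `word_ids()` is available; `xlm-roberta-base` is only an example model):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")
model.eval()

words = ["Automated", "concatenation", "of", "embeddings"]
encoded = tokenizer(words, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**encoded).last_hidden_state.squeeze(0)  # [seq_len, hidden]

# Keep only the hidden state of the first subtoken of each word.
word_ids = encoded.word_ids()  # e.g. [None, 0, 0, 1, 1, 2, 3, None]
first_subtoken_index = {}
for idx, wid in enumerate(word_ids):
    if wid is not None and wid not in first_subtoken_index:
        first_subtoken_index[wid] = idx

token_reprs = torch.stack([hidden[first_subtoken_index[i]] for i in range(len(words))])
print(token_reprs.shape)  # [4, hidden_size]
```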

fym0503 commented 2 years ago

Thanks, your comments are very clear. I have run some similar experiments, and my conclusion is nearly the same.