Closed W-rudder closed 6 months ago
This is very interesting work! One thing I am curious about: do you use the [CLS] token embedding from BERT as the feature, or do you use the last hidden states?
Hi, we use the mean-pooled last hidden states as features, not the [CLS] token embedding.
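For readers comparing the two options, here is a minimal sketch of mean pooling versus [CLS] pooling. It is not the repository's actual code: the function names `mean_pool` and `cls_pool` are hypothetical, plain Python lists stand in for the model's output tensors so the example runs without dependencies, and skipping padding positions via an attention mask is an assumption about how the averaging is done.

```python
def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors over non-padding positions.

    last_hidden_state: list of seq_len vectors, each of length hidden_size
    attention_mask:    list of seq_len ints, 1 = real token, 0 = padding
    (Masking out padding is an assumption, not confirmed by the thread.)
    """
    hidden_size = len(last_hidden_state[0])
    summed = [0.0] * hidden_size
    count = 0
    for vec, keep in zip(last_hidden_state, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding sequence
    return [s / count for s in summed]


def cls_pool(last_hidden_state):
    """Return the vector at position 0, where BERT places the [CLS] token."""
    return list(last_hidden_state[0])


# Toy "last hidden states": 4 tokens, hidden size 2, last token is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

mean_features = mean_pool(hidden, mask)  # averages the first three vectors
cls_features = cls_pool(hidden)          # takes only the first vector
```

With a real model the same averaging would typically be done with tensor operations on the encoder's last hidden states and the tokenizer's attention mask.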
Thanks for your response!