Closed tangzhy closed 4 years ago
Hi, have you compared the adapted Transformer with BERT, where pre-trained knowledge might make up for the drawbacks of the vanilla Transformer?
Yes. When pre-trained on a large and varied text corpus, the vanilla Transformer can also acquire a very good sense of direction and distance. This has been verified by many papers that investigate the effectiveness of BERT. Our paper discusses why the vanilla Transformer does not do well on the NER task when it is not pre-trained. Based on that discussion we made some improvements, and luckily they worked. But this is by no means the only way to solve the problem, and BERT can definitely achieve better performance than TENER.