facebookresearch / BLINK

Entity Linker solution
MIT License

Missing `add_special_tokens` in biencoder? #122

Open jojonki opened 2 years ago

jojonki commented 2 years ago

Hi! Thank you for your great work!

To my understanding, BLINK uses special tokens to mark the mention position and the entity title in both the bi-encoder and the cross-encoder.

In the cross-encoder, your code explicitly adds the special tokens to the tokenizer: https://github.com/facebookresearch/BLINK/blob/main/blink/crossencoder/crossencoder.py#L82-L89

But in the bi-encoder, `add_special_tokens` is not called, which means the special tokens are processed as `[UNK]`: https://github.com/facebookresearch/BLINK/blob/main/blink/biencoder/biencoder.py#L82-L87

Did you write this intentionally? If so, could you elaborate on that?
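To illustrate the concern: a minimal, self-contained sketch (not BLINK's actual code; the `ToyTokenizer` class and the `[START_ENT]`/`[END_ENT]` marker names are hypothetical stand-ins) of why a special token that was never registered with the tokenizer collapses to the `[UNK]` id, and why registering it gives each marker a distinct id:

```python
# Hypothetical stand-in for a BERT-style tokenizer, to show the
# [UNK] fallback described above. Not BLINK's implementation.

UNK = "[UNK]"

class ToyTokenizer:
    def __init__(self, vocab):
        # vocab maps token string -> integer id
        self.vocab = dict(vocab)

    def add_special_tokens(self, tokens):
        # Register new tokens so each gets its own id. (In a real
        # model, the embedding table must also be resized to match.)
        for tok in tokens:
            if tok not in self.vocab:
                self.vocab[tok] = len(self.vocab)

    def convert_tokens_to_ids(self, tokens):
        # Any token absent from the vocab falls back to the [UNK] id.
        unk_id = self.vocab[UNK]
        return [self.vocab.get(tok, unk_id) for tok in tokens]

tok = ToyTokenizer({UNK: 0, "blink": 1, "links": 2})

# Without registration, both mention markers collapse to [UNK] (id 0),
# so the model cannot tell them apart:
print(tok.convert_tokens_to_ids(["[START_ENT]", "blink", "[END_ENT]"]))  # [0, 1, 0]

# After registering them, each marker gets a distinct id:
tok.add_special_tokens(["[START_ENT]", "[END_ENT]"])
print(tok.convert_tokens_to_ids(["[START_ENT]", "blink", "[END_ENT]"]))  # [3, 1, 4]
```

If the bi-encoder's markers really are absent from the vocabulary, every mention boundary would be encoded as the same `[UNK]` embedding, losing the position information the markers are meant to carry.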

abhinavkulkarni commented 2 years ago

@jojonki: Yes, the special tokens are missing from the biencoder tokenizer.