Open jojonki opened 2 years ago
Hi! Thank you for your great work!
As I understand it, BLINK uses special tokens to mark the mention position and the entity title for both the bi-encoder and the cross-encoder.
In the cross-encoder, your code actually adds these special tokens to the tokenizer: https://github.com/facebookresearch/BLINK/blob/main/blink/crossencoder/crossencoder.py#L82-L89
But in the bi-encoder, `add_special_tokens` is never called, which means the special tokens are just processed as `[UNK]`: https://github.com/facebookresearch/BLINK/blob/main/blink/biencoder/biencoder.py#L82-L87
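For reference, here is a minimal check with HuggingFace's `BertTokenizer` that shows the difference (the tag strings below are placeholders for illustration, not necessarily BLINK's actual tags):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Placeholder mention/title tags, for illustration only.
tags = ["[ENT_START]", "[ENT_END]", "[ENT_TITLE]"]

# Without registering the tags, the tokenizer does not know them:
# they get split into sub-word pieces or mapped to [UNK].
print(tokenizer.tokenize("[ENT_START] Paris [ENT_END] is in France"))

# After add_special_tokens, each tag is kept intact as a single
# token with its own vocabulary id.
tokenizer.add_special_tokens({"additional_special_tokens": tags})
print(tokenizer.tokenize("[ENT_START] Paris [ENT_END] is in France"))
```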
Is this intentional? If so, could you elaborate on that?
@jojonki: Yes, the special tokens are missing from the biencoder tokenizer.
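In case it helps anyone hitting this, below is a minimal sketch of how the tags could be registered for the bi-encoder, assuming HuggingFace-style BERT models; the variable names and tag strings are illustrative, not BLINK's actual ones:

```python
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
ctxt_bert = BertModel.from_pretrained("bert-base-uncased")  # context encoder
cand_bert = BertModel.from_pretrained("bert-base-uncased")  # candidate encoder

# Illustrative tag strings; BLINK defines its own tag constants.
tags = {"additional_special_tokens": ["[ENT_START]", "[ENT_END]", "[ENT_TITLE]"]}
tokenizer.add_special_tokens(tags)

# Both encoders then need their embedding matrices resized so the
# newly added token ids have embedding rows to look up.
ctxt_bert.resize_token_embeddings(len(tokenizer))
cand_bert.resize_token_embeddings(len(tokenizer))
```

Note that the newly added embedding rows are randomly initialized, so they would only become meaningful after fine-tuning.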