FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Why is there no [SEP] token between query and passage for the encoder reranker? #1208

Atlantic8 commented 1 day ago

According to the code in FlagEmbedding/abc/finetune/reranker/AbsDataset.py (AbsRerankerTrainDataset.create_one_example()), there is no [SEP] token between the query and the passage for the reranker.

I think a [SEP] token is useful for separating the query from the passage, and would therefore cause less confusion for the model.

I wonder why the [SEP] token is missing. Did any experiment show that omitting it works better?

545999961 commented 1 day ago

For an encoder-based reranker, its tokenizer will automatically insert a [SEP] token between the two sentences.
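
For context, here is a minimal sketch of that behavior using the Hugging Face `transformers` tokenizer API. The checkpoint name is just an illustrative BERT-style model, not necessarily the one FlagEmbedding fine-tunes; the exact separator token also depends on the backbone (XLM-RoBERTa-based rerankers use `</s>` rather than `[SEP]`).

```python
from transformers import AutoTokenizer

# Illustrative BERT-style checkpoint; any encoder with a
# two-sentence special-token template behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

query = "what is a reranker"
passage = "A reranker scores how relevant a passage is to a query."

# Passing the two texts as a pair lets the tokenizer apply its
# special-token template: [CLS] query [SEP] passage [SEP]
encoded = tokenizer(query, passage)
print(tokenizer.decode(encoded["input_ids"]))
# -> [CLS] what is a reranker [SEP] a reranker scores how relevant
#    a passage is to a query. [SEP]
```

Assuming the training code encodes the query and passage as such a text pair, no separator needs to be inserted explicitly in create_one_example().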