Open jigsaw2212 opened 2 years ago
You can see how it's ingested here:
Hugging Face tokenizers accept a pair of sequences (e.g. for question answering, NLI, etc.). I believe the pair is usually joined with a separator token, i.e. `{text} [SEP] {text_pair}`. In this case, `text=title` and `text_pair=paragraph`, so the input should look like `{title} [SEP] {paragraph}`, but the exact layout ultimately depends on how the tokenizer implements pair encoding.
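To make the layout concrete, here is a rough pure-Python sketch of how a BERT-style tokenizer arranges a (title, paragraph) pair. It is not the Hugging Face implementation and not this codebase's code: `encode_pair` is a hypothetical helper, whitespace splitting stands in for real subword tokenization, the leading `[CLS]` and trailing `[SEP]` are assumed from the BERT convention, and the sample title and paragraph are made up.

```python
from typing import Dict, List


def encode_pair(text: str, text_pair: str,
                cls: str = "[CLS]", sep: str = "[SEP]") -> Dict[str, List]:
    """Illustrative only: mimic the BERT-style layout for a sequence pair."""
    # Whitespace splitting is a stand-in for real subword tokenization.
    first = text.lower().split()
    second = text_pair.lower().split()

    # Assumed BERT convention: [CLS] first [SEP] second [SEP]
    tokens = [cls] + first + [sep] + second + [sep]

    # Segment ids: 0 covers [CLS], the first sequence and its [SEP];
    # 1 covers the second sequence and the final [SEP].
    token_type_ids = [0] * (len(first) + 2) + [1] * (len(second) + 1)
    return {"tokens": tokens, "token_type_ids": token_type_ids}


# Hypothetical passage: the title goes in as text, the paragraph as text_pair.
encoded = encode_pair("Aaron", "Aaron is a prophet")
print(" ".join(encoded["tokens"]))
print(encoded["token_type_ids"])
```

With a real Hugging Face tokenizer, the equivalent call is `tokenizer(title, paragraph)`, where the second positional argument is `text_pair`. Decoding the resulting `input_ids` is the easiest way to check what separator layout a particular tokenizer actually uses.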
Hi, I want to better understand how the 'title' of the passage is used by the codebase when generating the passage embeddings.