castorini / pygaggle

a gaggle of deep neural architectures for text ranking and question answering, designed for Pyserini
http://pygaggle.ai/
Apache License 2.0

Fix FiD text_maxlength #310

Closed manveertamber closed 1 year ago

manveertamber commented 1 year ago

As per the original FiD paper, text_maxlength, i.e. the maximum number of tokens in a text segment (question + passage), should be limited to 250. In our codebase, it was being limited to 350. In the testing I did, this did not make a large difference in results, perhaps because most text segments (question + 100-word passage) were already within the 250-token limit anyway.
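To illustrate why the two limits can behave similarly, here is a rough sketch that counts how many segments a given text_maxlength would actually truncate. It is not the pygaggle or FiD code: the helper names (build_segment, count_tokens, truncate_segment, fraction_truncated) are hypothetical, the "question: ... title: ... context: ..." template is an assumption about the input format, and whitespace splitting stands in for the model's subword tokenizer, which would produce more tokens per word.

```python
from typing import List


def build_segment(question: str, title: str, passage: str) -> str:
    # Assumed FiD-style template; the real template may differ.
    return f"question: {question} title: {title} context: {passage}"


def count_tokens(segment: str) -> int:
    # Whitespace tokens as a stand-in for subword tokens (an assumption).
    return len(segment.split())


def truncate_segment(segment: str, text_maxlength: int) -> str:
    # Keep only the first text_maxlength tokens of the segment.
    return " ".join(segment.split()[:text_maxlength])


def fraction_truncated(segments: List[str], text_maxlength: int) -> float:
    # Share of segments that are longer than the limit and would be cut.
    if not segments:
        return 0.0
    over = sum(1 for s in segments if count_tokens(s) > text_maxlength)
    return over / len(segments)


# Synthetic data: a 10-word question, one 100-word and one 300-word passage.
question = " ".join(["q"] * 10)
short_passage = " ".join(["w"] * 100)
long_passage = " ".join(["w"] * 300)
segments = [
    build_segment(question, "some title", short_passage),
    build_segment(question, "some title", long_passage),
]

for limit in (250, 350):
    print(limit, fraction_truncated(segments, limit))
```

With passages near 100 words, a segment stays well under either limit in this toy count, so changing 350 to 250 only affects unusually long segments. Running the same count with the actual tokenizer over the real passages would show how many segments the change touches.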