dorianbrown / rank_bm25

A Collection of BM25 Algorithms in Python
Apache License 2.0

cross lingual IR #31

Open krrishdholakia opened 1 year ago

krrishdholakia commented 1 year ago

Hi,

How can I use this to do cross-lingual IR? For example, what if the user query is in Portuguese and the corpus is in English?

krrishdholakia commented 1 year ago

An idea I had was to use the GPT2Tokenizer, but I wasn't sure how to use it alongside BM25, which is keyword-based.
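To make the keyword-matching concern concrete: BM25 only rewards tokens that appear literally in both the query and a document, so a different tokenizer alone is unlikely to connect a Portuguese query to English text. One common workaround is to translate the query into the corpus language before scoring. The sketch below illustrates that idea under several assumptions: `PT_EN` is a made-up toy lexicon standing in for a real translation step, `translate_query` and `bm25_scores` are hypothetical helpers written for this example, and the scoring function is a small pure-Python stand-in (with a non-negative IDF variant) so the snippet runs without dependencies. With rank_bm25 installed, `BM25Okapi(tokenized_corpus).get_scores(tokens)` would play the same role.

```python
import math
from collections import Counter

# Hypothetical Portuguese -> English lexicon (illustration only). A real
# system would call a machine-translation model or service here instead.
PT_EN = {"gato": "cat", "preto": "black", "cachorro": "dog", "casa": "house"}


def tokenize(text):
    # Naive whitespace tokenizer; enough for this toy example.
    return text.lower().split()


def translate_query(tokens, lexicon):
    # Map each query token into the corpus language; unknown tokens pass through.
    return [lexicon.get(t, t) for t in tokens]


def bm25_scores(query_tokens, tokenized_corpus, k1=1.5, b=0.75):
    # Minimal Okapi-style BM25 with a non-negative IDF variant. This is a
    # stand-in for the library's scorer, not a copy of its implementation.
    n_docs = len(tokenized_corpus)
    avgdl = sum(len(d) for d in tokenized_corpus) / n_docs
    df = Counter(t for d in tokenized_corpus for t in set(d))
    scores = []
    for doc in tokenized_corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue  # no literal match, so no contribution
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    "the black cat sleeps on the sofa",
    "a dog barks outside the house",
    "stock markets fell sharply today",
]
tokenized_corpus = [tokenize(d) for d in corpus]

query = "gato preto"  # Portuguese for "black cat"

# Scoring the untranslated query: no token overlaps with the English corpus.
raw_scores = bm25_scores(tokenize(query), tokenized_corpus)

# Scoring after translating the query into the corpus language.
translated_scores = bm25_scores(
    translate_query(tokenize(query), PT_EN), tokenized_corpus
)
best_doc = corpus[max(range(len(corpus)), key=translated_scores.__getitem__)]
```

Here the untranslated query scores zero against every document, while the translated query ranks the first document highest. Translating the query rather than the corpus keeps the index unchanged, at the cost of depending on translation quality for short queries.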