castorini / rank_llm

RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking.
http://rankllm.ai
Apache License 2.0

P1-pyserini retriever should support custom index dir #38

Closed: sahel-sh closed this issue 8 months ago

sahel-sh commented 8 months ago

The Pyserini retriever currently supports dataset names only; we should add support for retrieval with custom prebuilt indexes.
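One way to support both modes is to treat the argument as a local index when it names an existing directory, and otherwise fall back to a name-based lookup, here assumed to be a Pyserini prebuilt index name. This is a minimal sketch, assuming Pyserini's `LuceneSearcher`, which can be constructed from a local index directory or via `LuceneSearcher.from_prebuilt_index(name)`. The helpers `resolve_index` and `make_searcher` are hypothetical and not part of RankLLM.

```python
import os
from typing import Tuple


def resolve_index(index: str) -> Tuple[str, str]:
    """Classify `index` as a custom on-disk index or a prebuilt index name.

    Hypothetical helper: an existing directory is treated as a custom index,
    anything else is assumed to be a Pyserini prebuilt index name.
    """
    if os.path.isdir(index):
        return ("custom", os.path.abspath(index))
    return ("prebuilt", index)


def make_searcher(index: str):
    """Build a Pyserini LuceneSearcher for either kind of index (sketch)."""
    # Imported lazily so resolve_index can be used without Pyserini installed.
    from pyserini.search.lucene import LuceneSearcher

    kind, value = resolve_index(index)
    if kind == "custom":
        # Local index directory supplied by the user.
        return LuceneSearcher(value)
    # Otherwise let Pyserini resolve (and download, if needed) a prebuilt index.
    return LuceneSearcher.from_prebuilt_index(value)
```

A path-vs-name heuristic keeps the interface to a single argument; an explicit, separate index-directory parameter would avoid ambiguity if a local directory happens to share a prebuilt index name.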

sahel-sh commented 8 months ago

@jasper-xian is working on this

jasper-xian commented 8 months ago

@ronakice if we want to support custom user indices, wouldn't we also need the user to provide their own topics files and qrels?

I can hack around this (kind of) by having the user give Pyserini prebuilt topics, but if they give a custom topics file, we'd need them to provide qrels too, no?

ronakice commented 8 months ago

Qrels are not necessary, but topics are; for custom topics, something like a simple TSV loader would work. It would be even nicer to have some integration with Pyserini, so that all the topics/qrels Pyserini supports could be included automatically!
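A minimal sketch of the kind of simple TSV loader suggested above, assuming one topic per line in the form `qid<TAB>query` (the layout used by MS MARCO-style query files). The function name `load_tsv_topics` is hypothetical and not an existing RankLLM or Pyserini API.

```python
import io
from typing import Dict, TextIO


def load_tsv_topics(f: TextIO) -> Dict[str, str]:
    """Read `qid<TAB>query` lines into a {qid: query} dict.

    Hypothetical loader (not a RankLLM or Pyserini API): blank lines are
    skipped, and a line without a tab raises ValueError.
    """
    topics: Dict[str, str] = {}
    for line_no, line in enumerate(f, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t", 1)  # split on the first tab only
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 'qid<TAB>query', got {line!r}")
        topics[parts[0].strip()] = parts[1].strip()
    return topics


if __name__ == "__main__":
    sample = io.StringIO("1\twhat is bm25\n2\tlistwise reranking with llms\n")
    print(load_tsv_topics(sample))
    # {'1': 'what is bm25', '2': 'listwise reranking with llms'}
```

For a file on disk, the same function works with `open(path, encoding="utf-8")`; taking a file object rather than a path keeps it easy to test.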

ronakice commented 8 months ago

@sahel-sh what do you think?

sahel-sh commented 8 months ago

I think there is value in supporting Pyserini with a custom index, but we should try to keep it simple. If the topics and index dirs are needed, let's ask the user to provide them, and make qrels optional (only needed for evaluation). Our code supports reranking hits/results stored in a file, so users always have the option of doing whatever they want with Pyserini and then calling rerank only.
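The proposal above (index dir and topics required, qrels optional and used only for evaluation) could be captured in a small settings object. In this sketch, `CustomRetrievalConfig` and `load_trec_qrels` are hypothetical names, and qrels are assumed to use the standard four-column TREC format `qid iteration docid relevance`.

```python
import os
from dataclasses import dataclass
from typing import Dict, Optional, TextIO


@dataclass
class CustomRetrievalConfig:
    """Hypothetical settings for retrieval over a user-supplied index."""
    index_dir: str                    # required: local index directory
    topics_path: str                  # required: user topics file
    qrels_path: Optional[str] = None  # optional: only needed for evaluation

    def validate(self) -> None:
        # Fail early if any supplied path is missing.
        if not os.path.isdir(self.index_dir):
            raise FileNotFoundError(f"index dir not found: {self.index_dir}")
        if not os.path.isfile(self.topics_path):
            raise FileNotFoundError(f"topics file not found: {self.topics_path}")
        if self.qrels_path is not None and not os.path.isfile(self.qrels_path):
            raise FileNotFoundError(f"qrels file not found: {self.qrels_path}")

    @property
    def can_evaluate(self) -> bool:
        # Evaluation is possible only when qrels were supplied.
        return self.qrels_path is not None


def load_trec_qrels(f: TextIO) -> Dict[str, Dict[str, int]]:
    """Parse TREC qrels lines ('qid iter docid rel') into {qid: {docid: rel}}."""
    qrels: Dict[str, Dict[str, int]] = {}
    for line_no, line in enumerate(f, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        qid, _iteration, docid, rel = fields
        qrels.setdefault(qid, {})[docid] = int(rel)
    return qrels
```

Keeping qrels out of the required fields means a user can retrieve and rerank over a custom index without any judgments, and only opt into evaluation when qrels exist.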

sahel-sh commented 8 months ago

Thank you Jasper for working on this issue!