Implementation of Rerank for RankVicuna

castorini / rank_llm

RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking.

http://rankllm.ai

Apache License 2.0


Implementation of Rerank for RankVicuna #23

Closed: yilinjz closed this pull request 10 months ago

yilinjz commented 10 months ago

Rerank can take either a query plus documents or a query plus hits; examples are in the newly added demo/ folder.
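A minimal sketch of how a single rerank entry point could accept either form. It assumes documents are plain strings and hits are dicts with hypothetical "docid", "content" and "score" keys. The names normalize_candidates and rerank, and the hit schema, are illustrative, not RankLLM's actual API. A toy term-overlap score stands in for the listwise LLM call.

```python
from typing import Any, Dict, List, Sequence, Union

# Hypothetical hit schema: {"docid": str, "content": str, "score": float}.
Hit = Dict[str, Any]
Candidates = Union[Sequence[str], Sequence[Hit]]


def normalize_candidates(candidates: Candidates) -> List[Hit]:
    """Convert plain document strings into hit dicts; copy hits through."""
    hits: List[Hit] = []
    for i, cand in enumerate(candidates):
        if isinstance(cand, str):
            # Plain documents get a synthetic docid and no retrieval score.
            hits.append({"docid": str(i), "content": cand, "score": 0.0})
        elif isinstance(cand, dict) and "content" in cand:
            hits.append(dict(cand))
        else:
            raise TypeError(f"unsupported candidate type: {type(cand).__name__}")
    return hits


def rerank(query: str, candidates: Candidates) -> List[Hit]:
    """Hypothetical rerank that works on docs or hits after normalization.

    A toy term-overlap score stands in for the listwise LLM call.
    """
    hits = normalize_candidates(candidates)
    query_terms = set(query.lower().split())

    def overlap(hit: Hit) -> int:
        return len(query_terms & set(hit["content"].lower().split()))

    # sorted() is stable (also with reverse=True), so ties keep input order.
    return sorted(hits, key=overlap, reverse=True)


if __name__ == "__main__":
    docs = ["cats purr", "listwise reranking with llms", "reranking basics"]
    print([h["content"] for h in rerank("listwise reranking", docs)])
```

Normalizing both input types to one internal representation keeps the reranking logic itself in a single code path.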

The following fields of RankVicuna are given default values:

context_size: int = 4096,
prompt_mode: PromptMode = PromptMode.RANK_GPT,
device: str = "cuda",
num_gpus: int = 1,

Removed two fields, top_k_candidates and dataset. RankVicuna does not currently use either flag, and since RankVicuna inherits from RankLLM (which does not use them either), I removed them from RankLLM as well.
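The resulting shape can be sketched with simplified stand-ins. The dataclasses below are hypothetical and are not the real RankLLM or RankVicuna classes; the base field "model" and the PromptMode stand-in enum are assumptions. The sketch only shows a base class without dataset or top_k_candidates, and a subclass supplying the defaults listed above.

```python
from dataclasses import dataclass
from enum import Enum, auto


class PromptMode(Enum):
    """Stand-in for the toolkit's PromptMode enum; only RANK_GPT is assumed."""
    RANK_GPT = auto()


@dataclass
class RankLLM:
    """Hypothetical simplified base class: no dataset or top_k_candidates field."""
    model: str
    context_size: int
    prompt_mode: PromptMode


@dataclass
class RankVicuna(RankLLM):
    """Hypothetical subclass supplying the defaults from the PR description."""
    # Redefined fields keep their position in the base class but gain defaults.
    context_size: int = 4096
    prompt_mode: PromptMode = PromptMode.RANK_GPT
    device: str = "cuda"
    num_gpus: int = 1


if __name__ == "__main__":
    print(RankVicuna("some-model"))
```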

sahel-sh commented 10 months ago

RankVicuna and RankLLM already have a rerank method that takes a dataset, retrieves its top-k candidates, and then reranks them with the provided model. Please look at run_rank_llm.py to see how the current functionality works. If you want to add reranking of provided documents or hits, it should be added alongside the current functionality rather than replace it.

ronakice commented 10 months ago

I think it is fine to disentangle these. Ideally, rerank should only do reranking and not be entangled with the retriever. We should reorganise and add, say, a separate retrieve_and_rerank method in a different module that mirrors what you currently have.
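One way to read this suggestion, sketched with hypothetical Retriever and Reranker protocols: the reranker only ever sees a query and a candidate list, and a separate retrieve_and_rerank function composes the two stages. The protocol names, method signatures, and both toy implementations are illustrative assumptions, not the toolkit's API.

```python
from dataclasses import dataclass
from typing import List, Protocol


class Retriever(Protocol):
    def retrieve(self, query: str, k: int) -> List[str]: ...


class Reranker(Protocol):
    def rerank(self, query: str, docs: List[str]) -> List[str]: ...


@dataclass
class InMemoryRetriever:
    """Toy first-stage retriever over a fixed corpus (stand-in for e.g. BM25)."""
    corpus: List[str]

    def retrieve(self, query: str, k: int) -> List[str]:
        terms = set(query.lower().split())
        # Rank by term overlap; the stable sort keeps corpus order on ties.
        ranked = sorted(
            self.corpus,
            key=lambda d: len(terms & set(d.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class LengthReranker:
    """Toy reranker preferring shorter documents (stand-in for an LLM)."""

    def rerank(self, query: str, docs: List[str]) -> List[str]:
        return sorted(docs, key=len)


def retrieve_and_rerank(
    query: str, retriever: Retriever, reranker: Reranker, k: int = 20
) -> List[str]:
    """Compose the two stages; neither component knows about the other."""
    candidates = retriever.retrieve(query, k)
    return reranker.rerank(query, candidates)
```

With this split, a reranker can be exercised directly on user-supplied documents, while the dataset-driven pipeline lives in the composing function.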

yilinjz commented 10 months ago

@sahel-sh @ronakice Updated the PR with changes we discussed last week.

A few questions:

The write_rerank_result() method has moved to the new Reranker class. Do we still want to include "dataset" and "top_k_candidates" in the output file name?
I wrapped the three retrieval cases into a single retrieve() method in the new Retriever class. Does this approach look okay?
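If the Reranker no longer knows about the dataset, one option for the first question is to make those parts of the file name optional. The helper build_output_name, its parameters, and the naming scheme below are hypothetical; they only illustrate keeping the name informative when the metadata is available and still valid when it is not.

```python
from datetime import datetime
from typing import Optional


def build_output_name(
    model_name: str,
    dataset: Optional[str] = None,
    top_k_candidates: Optional[int] = None,
    timestamp: Optional[datetime] = None,
    suffix: str = ".txt",
) -> str:
    """Join the available parts with underscores, skipping missing ones."""
    # Model ids like "org/model" would otherwise introduce a directory level.
    parts = [model_name.replace("/", "_")]
    if dataset is not None:
        parts.append(dataset)
    if top_k_candidates is not None:
        parts.append(f"top{top_k_candidates}")
    if timestamp is not None:
        parts.append(timestamp.strftime("%Y%m%dT%H%M%S"))
    return "_".join(parts) + suffix
```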

yilinjz commented 10 months ago

I accidentally closed this PR while resolving a conflict, so I opened a new one here: https://github.com/castorini/rank_llm/pull/24