castorini / rank_llm

RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking.
http://rankllm.ai
Apache License 2.0

Merge lit5 to rankllm #127

Closed: XKTZ closed this 3 months ago

XKTZ commented 4 months ago

Description

This PR is a continuation of PR https://github.com/castorini/rank_llm/pull/116, which integrates the LiT5 model into rank_llm.

Summary of changes

rerank.listwise.lit5 package

takes over the code from the https://github.com/castorini/LiT5 repository, which provides the FiD-related utils

rerank.listwise.rank_fid

defines the RankFiDDistill and RankFiDScore models, which inherit from ListwiseRankLLM and implement the LiT5 algorithm

rerank.listwise.lit5_reranker

defines LiT5DistillReranker and LiT5ScoreReranker, which are analogous to ZephyrReranker and VicunaReranker

rankllm.py

modifies run_llm_batched to accept **kwargs, which seems to be needed

reranker.py

modifies create_agent to support LiT5, and also fixes a typo in extract_kwargs
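
On the **kwargs change in rankllm.py, here is a minimal sketch of the general pattern, not the actual rank_llm code. The class names (BaseListwiseReranker, EchoReranker), the helper rerank_all, and the option name max_new_tokens are all hypothetical and used only for illustration. The idea is that a base-class method which accepts **kwargs lets each model read its own optional settings without the shared call site having to change.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class BaseListwiseReranker(ABC):
    """Hypothetical stand-in for a listwise reranker base class."""

    @abstractmethod
    def run_llm_batched(self, prompts: List[str], **kwargs: Any) -> List[str]:
        """Run the model on a batch of prompts; extra options arrive via kwargs."""


class EchoReranker(BaseListwiseReranker):
    """Toy subclass that reads one optional, model-specific keyword argument."""

    def run_llm_batched(self, prompts: List[str], **kwargs: Any) -> List[str]:
        # `max_new_tokens` is an illustrative option name only.
        limit = kwargs.get("max_new_tokens", 5)
        # Keep at most `limit` whitespace-separated tokens of each prompt.
        return [" ".join(p.split()[:limit]) for p in prompts]


def rerank_all(model: BaseListwiseReranker, prompts: List[str], **kwargs: Any) -> List[str]:
    # The call site forwards whatever options it was given, unchanged.
    return model.run_llm_batched(prompts, **kwargs)
```

With this shape, rerank_all(EchoReranker(), prompts, max_new_tokens=2) and rerank_all(EchoReranker(), prompts) both work, and a model that ignores the option is unaffected.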

Testing

Please use the following script to test the LiT5Distill model:

python src/rank_llm/scripts/run_rank_llm.py --model_path=${model} --top_k_candidates=100 --dataset=dl19 \
    --retrieval_method=bm25 --prompt_mode=LiT5 --context_size=150 --vllm_batched --batch_size=${batchsize} \
    --variable_passages

Please use the following script to test the LiT5Score model:

python src/rank_llm/scripts/run_rank_llm.py --model_path=${model} --top_k_candidates=100 --dataset=dl19 \
    --retrieval_method=bm25 --prompt_mode=LiT5 --context_size=150 --vllm_batched --batch_size=${batchsize} \
    --window_size=100 --variable_passages
xpbowler commented 3 months ago

The error in extract_kwargs() has been fixed and **kwargs has been added for run_llm_batched and run_llm in #128

create_agent() was modified to accommodate MonoT5 in #128, so there may be some merge conflicts after #128 is merged in.

Other than that, lgtm!

xpbowler commented 3 months ago

I think I'll refactor create_agent() in the near future to make it cleaner

XKTZ commented 3 months ago

I have created a new commit which uses the fix from #128 for rankllm.py's **kwargs and reranker.py's extract_kwargs to avoid conflicts. I guess the create_agent conflict is more or less unavoidable and would need an (easy) merge in the future anyway?

manveertamber commented 3 months ago

@XKTZ I cloned your branch, but am unable to run the command

model=castorini/LiT5-Distill-base
python src/rank_llm/scripts/run_rank_llm.py  --model_path=${model} --top_k_candidates=100 --dataset=dl19 \
    --retrieval_method=bm25 --prompt_mode=LiT5  --context_size=150 --vllm_batched --batch_size=1 \
     --variable_passages

getting run_rank_llm.py: error: argument --prompt_mode: invalid PromptMode value: 'LiT5'.

Do you know what the issue is?

XKTZ commented 3 months ago

Hi @manveertamber, could you check whether there is a LiT5 = "LiT5" line in the file src/rank_llm/rerank/rankllm.py?

egrep -n LiT5 src/rank_llm/rerank/rankllm.py

On my side, this outputs:

16:    LiT5 = "LiT5"
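
For context, here is a minimal sketch of why a missing enum member would produce exactly that message. It assumes the --prompt_mode flag is parsed by argparse with the enum class as its type, which this thread does not confirm. The helper names make_prompt_mode and parse_prompt_mode and the RANK_GPT member are illustrative. It needs Python 3.9+ for exit_on_error.

```python
import argparse
from enum import Enum


def make_prompt_mode(with_lit5: bool):
    """Build a toy PromptMode enum, with or without the LiT5 member."""
    members = {"RANK_GPT": "rank_GPT"}  # illustrative member only
    if with_lit5:
        members["LiT5"] = "LiT5"
    return Enum("PromptMode", members)


def parse_prompt_mode(enum_cls, value: str):
    """Parse --prompt_mode=<value>, converting the string via the enum class."""
    # exit_on_error=False makes argparse raise instead of calling sys.exit().
    parser = argparse.ArgumentParser(exit_on_error=False)
    parser.add_argument("--prompt_mode", type=enum_cls)
    return parser.parse_args([f"--prompt_mode={value}"]).prompt_mode


# Enum that has the member: the lookup by value succeeds.
mode = parse_prompt_mode(make_prompt_mode(with_lit5=True), "LiT5")

# Enum without the member: the value lookup raises ValueError, which
# argparse reports as an "invalid PromptMode value" argument error.
try:
    parse_prompt_mode(make_prompt_mode(with_lit5=False), "LiT5")
    message = ""
except argparse.ArgumentError as err:
    message = str(err)
```

Under these assumptions, the error only says that the PromptMode class being imported has no member with the value "LiT5", which is consistent with running code from a branch that does not contain the change.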
manveertamber commented 3 months ago

Oh I see, I was cloning the wrong branch, thanks!

manveertamber commented 3 months ago

How can I change the sliding window stride?

model=castorini/LiT5-Distill-large-v2

python src/rank_llm/scripts/run_rank_llm.py  --model_path=${model} --top_k_candidates=100 --dataset=dl19 --retrieval_method=bm25 --prompt_mode=LiT5  --window_size=20 --step_size=20 --context_size=150 --vllm_batched --batch_size=12 --variable_passages

and

python src/rank_llm/scripts/run_rank_llm.py  --model_path=${model} --top_k_candidates=100 --dataset=dl19 --retrieval_method=bm25 --prompt_mode=LiT5  --window_size=20 --step_size=10 --context_size=150 --vllm_batched --batch_size=12 --variable_passages

both return the same results:

ndcg_cut_1              all     0.8062
ndcg_cut_5              all     0.7723
ndcg_cut_10             all     0.7232
XKTZ commented 3 months ago

Hi @manveertamber, I have added the step_size parameter to the model. I believe the command should work now. Sorry about that.
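
To illustrate what window_size and step_size control, here is a generic sketch of sliding-window bounds over a ranked list, moving from the tail of the list to the head. This is not the rank_llm implementation; the function name window_bounds and the exact stopping rule are assumptions made for illustration.

```python
from typing import List, Tuple


def window_bounds(num_candidates: int, window_size: int, step_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of each window, from the tail to the head."""
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    bounds = []
    end = num_candidates
    while end > 0:
        # The last window is clipped at the head of the list.
        start = max(0, end - window_size)
        bounds.append((start, end))
        if start == 0:
            break
        # Slide the window towards the head by the stride.
        end -= step_size
    return bounds


# The two settings from the commands above, over 100 candidates.
no_overlap = window_bounds(100, window_size=20, step_size=20)
half_overlap = window_bounds(100, window_size=20, step_size=10)
```

In this sketch, a stride of 20 gives five disjoint windows, while a stride of 10 gives nine windows that overlap by half, so the two settings only produce identical rankings if step_size is not actually reaching the model.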

manveertamber commented 3 months ago

No problem, things look good from my end now!