castorini / rank_llm

Repository for prompt-decoding using LLMs (GPT3.5, GPT4, Vicuna, and Zephyr)
http://rankllm.ai
Apache License 2.0

VLLM Batched Support Init #104

Closed ronakice closed 1 month ago

ronakice commented 3 months ago

Pull Request Checklist

Reference Issue

Please provide the reference to issue this PR is addressing (# followed by the issue number). If there is no associated issue, write "N/A".

ref: #34


ronakice commented 3 months ago

Brings an 8-10x speedup on dl19/dl20. Small variations in results still need to be examined later.

ronakice commented 3 months ago

vLLM installation requires CUDA 12.1, but besides that it is a simple addition.

ronakice commented 3 months ago

Currently the vLLM path is a breaking change: scores will differ from the unbatched results. We can revert and fix these issues later.
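One way to examine where batched and unbatched outputs diverge is to diff the two rankings query by query. The following is only a minimal sketch, assuming both rankings are already loaded as dicts mapping a query id to an ordered list of docids. `diff_rankings` and the toy data are hypothetical and not part of rank_llm.

```python
from typing import Dict, List


def diff_rankings(
    unbatched: Dict[str, List[str]],
    batched: Dict[str, List[str]],
    k: int = 10,
) -> Dict[str, float]:
    """For each query whose top-k order differs, return the top-k overlap fraction.

    Hypothetical helper: queries with identical top-k lists are omitted.
    """
    changed: Dict[str, float] = {}
    for qid, base in unbatched.items():
        base_top = base[:k]
        other_top = batched.get(qid, [])[:k]
        if base_top != other_top:
            # Fraction of the unbatched top-k docids that also appear in the batched top-k.
            shared = len(set(base_top) & set(other_top))
            changed[qid] = shared / max(len(base_top), 1)
    return changed


# Toy data (made up for illustration): q1 is identical, q2 is reordered and swaps one docid.
unbatched = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5", "d6"]}
batched = {"q1": ["d1", "d2", "d3"], "q2": ["d5", "d4", "d7"]}

changed = diff_rankings(unbatched, batched, k=3)
for qid, overlap in changed.items():
    print(f"{qid}: top-3 overlap {overlap:.2f}")
```

A low overlap points to a real ranking change, while an overlap of 1.0 with a differing order means the same documents were returned in a different order.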

jasper-xian commented 3 months ago

Confirmed on basilisk with a fresh install, running:

CUDA_VISIBLE_DEVICES=0 python src/rank_llm/scripts/run_rank_llm.py --model_path=castorini/rank_zephyr_7b_v1_full --top_k_candidates=100 --dataset=dl20 --retrieval_method=SPLADE++_EnsembleDistil_ONNX --prompt_mode=rank_GPT --context_size=4096 --variable_passages --batched

yields:

Evaluating:
Downloading https://search.maven.org/remotecontent?filepath=uk/ac/gla/dcs/terrierteam/jtreceval/0.0.5/jtreceval-0.0.5-jar-with-dependencies.jar to /store2/scratch/j5xian/cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar...
/store2/scratch/j5xian/cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar already exists!
Skipping download.
Trunc /tmp/tmpqsrf4as4
Running command: ['java', '-jar', '/store2/scratch/j5xian/cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar', '-m', 'ndcg_cut.1', '/tmp/tmpqsrf4as4', 'rerank_results/SPLADE_P_P_ENSEMBLE_DISTIL/rank_zephyr_7b_v1_full_4096_100_rank_GPT_dl20_2024-03-20T18:16:19.282427_window_20.txt']
Results:
ndcg_cut_1              all 0.8519
Downloading https://search.maven.org/remotecontent?filepath=uk/ac/gla/dcs/terrierteam/jtreceval/0.0.5/jtreceval-0.0.5-jar-with-dependencies.jar to /store2/scratch/j5xian/cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar...
/store2/scratch/j5xian/cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar already exists!
Skipping download.
Trunc /tmp/tmprox1sahx
Running command: ['java', '-jar', '/store2/scratch/j5xian/cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar', '-m', 'ndcg_cut.5', '/tmp/tmprox1sahx', 'rerank_results/SPLADE_P_P_ENSEMBLE_DISTIL/rank_zephyr_7b_v1_full_4096_100_rank_GPT_dl20_2024-03-20T18:16:19.282427_window_20.txt']
Results:
ndcg_cut_5              all 0.8334
Downloading https://search.maven.org/remotecontent?filepath=uk/ac/gla/dcs/terrierteam/jtreceval/0.0.5/jtreceval-0.0.5-jar-with-dependencies.jar to /store2/scratch/j5xian/cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar...
/store2/scratch/j5xian/cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar already exists!
Skipping download.
Trunc /tmp/tmp1sti62f3
Running command: ['java', '-jar', '/store2/scratch/j5xian/cache/pyserini/eval/jtreceval-0.0.5-jar-with-dependencies.jar', '-m', 'ndcg_cut.10', '/tmp/tmp1sti62f3', 'rerank_results/SPLADE_P_P_ENSEMBLE_DISTIL/rank_zephyr_7b_v1_full_4096_100_rank_GPT_dl20_2024-03-20T18:16:19.282427_window_20.txt']
Results:
ndcg_cut_10             all 0.8126
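
To compare these numbers against an unbatched run, the metric lines can be pulled out of the evaluation output programmatically. This is a rough sketch, assuming the output keeps the `ndcg_cut_N all VALUE` line shape shown above. `parse_metrics` is a hypothetical helper, not something rank_llm or Pyserini provides.

```python
import re
from typing import Dict

# Matches lines such as "ndcg_cut_10             all 0.8126".
METRIC_LINE = re.compile(r"^(ndcg_cut_\d+)\s+all\s+([0-9.]+)$")


def parse_metrics(log_text: str) -> Dict[str, float]:
    """Collect metric name -> value from evaluation output, ignoring all other lines."""
    metrics: Dict[str, float] = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.match(line.strip())
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


# The metric lines reported above for the batched dl20 run.
batched_log = """Results:
ndcg_cut_1              all 0.8519
Results:
ndcg_cut_5              all 0.8334
Results:
ndcg_cut_10             all 0.8126
"""

batched_metrics = parse_metrics(batched_log)
print(batched_metrics)
```

Running the same parser over the output of an unbatched run would give a second dict, and the per-metric differences can then be checked against whatever tolerance is considered acceptable.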