Closed andy-yang-1 closed 1 year ago
I don't think vllm can load the longchat model directly, because of the flash-attn dependency?
https://github.com/DachengLi1/LongChat/issues/27#issuecomment-1647473244
@Zhuqln vllm currently does not support longchat, so for now you can only load the longchat model through the Hugging Face Transformers library. Support for longchat is on the vllm roadmap, though. Both vllm and lightllm provide excellent support for long-context models based on llama, such as lmsys/vicuna-7b-v1.5-16k, and they let us use more common GPUs (such as the T4) to run the longeval tests on our own models.
This is great @andy-yang-1 Thanks a lot for the support!
This PR introduces support for vllm and lightllm frameworks in the longeval module.
Launch lightllm server:
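(The actual command appears to have been omitted from this comment; below is a hedged sketch. The model path, port, and token budget are placeholders, and the flags follow lightllm's `api_server` CLI as of mid-2023 — check the lightllm README for your installed version.)

```bash
# Sketch: serve a 16k-context llama-based model with lightllm.
# /path/to/vicuna-7b-v1.5-16k, --tp, --max_total_token_num, and --port are illustrative.
python -m lightllm.server.api_server \
    --model_dir /path/to/vicuna-7b-v1.5-16k \
    --tp 1 \
    --max_total_token_num 18000 \
    --port 8000
```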
Launch vllm server:
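(Again the command itself is missing; here is a hedged sketch using vllm's built-in API server entrypoint. The model name and port are placeholders — consult the vllm docs for the entrypoint and flags matching your version.)

```bash
# Sketch: serve the same model with vllm's HTTP API server.
# --port is illustrative; the entrypoint name may differ across vllm versions.
python -m vllm.entrypoints.api_server \
    --model lmsys/vicuna-7b-v1.5-16k \
    --port 8001
```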
Run longeval tests:
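(A hedged sketch of the longeval invocation, based on the `longeval/eval.py` script in the LongChat repo. The `--task` and `--model-name-or-path` flags are from the existing script; any additional flag this PR adds for selecting the vllm/lightllm backend is not shown here — consult the PR diff for the exact option.)

```bash
# Sketch: run the longeval "lines" task against the model under test.
# Run from the LongChat repo root; the "topics" task works the same way.
python longeval/eval.py \
    --model-name-or-path lmsys/vicuna-7b-v1.5-16k \
    --task lines
```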
These frameworks can help you evaluate longer prompts in less time.