Closed andy-yang-1 closed 1 year ago
I don't think vllm can load the longchat model directly, because of the flash-attn dependency?
https://github.com/DachengLi1/LongChat/issues/27#issuecomment-1647473244
@Zhuqln vllm currently does not support longchat, so for now you can only load the longchat model through the Hugging Face Transformers library. Support for longchat is on the vllm roadmap, though. Both vllm and lightllm provide excellent support for long-context models based on llama, such as lmsys/vicuna-7b-v1.5-16k, and they let us use more common GPUs (such as the T4) to run the longeval tests on our own models.
This is great @andy-yang-1 Thanks a lot for the support!
This PR introduces support for vllm and lightllm frameworks in the longeval module.
Launch lightllm server:
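(The actual command appears to have been omitted from this comment; below is a hedged sketch. The model path, port, and token budget are placeholders, and the flags follow lightllm's `api_server` CLI as of mid-2023 — check the lightllm README for your installed version.)

```bash
# Sketch: serve a 16k-context llama-based model with lightllm.
# /path/to/vicuna-7b-v1.5-16k, --tp, --max_total_token_num, and --port are illustrative.
python -m lightllm.server.api_server \
    --model_dir /path/to/vicuna-7b-v1.5-16k \
    --tp 1 \
    --max_total_token_num 18000 \
    --port 8000
```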
Launch vllm server:
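(Again the command itself is missing; here is a hedged sketch using vllm's built-in API server entrypoint. The model name and port are placeholders — consult the vllm docs for the entrypoint and flags matching your version.)

```bash
# Sketch: serve the same model with vllm's HTTP API server.
# --port is illustrative; the entrypoint name may differ across vllm versions.
python -m vllm.entrypoints.api_server \
    --model lmsys/vicuna-7b-v1.5-16k \
    --port 8001
```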
Run longeval tests:
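(A hedged sketch of the longeval invocation, based on the `longeval/eval.py` script in the LongChat repo. The `--task` and `--model-name-or-path` flags are from the existing script; any additional flag this PR adds for selecting the vllm/lightllm backend is not shown here — consult the PR diff for the exact option.)

```bash
# Sketch: run the longeval "lines" task against the model under test.
# Run from the LongChat repo root; the "topics" task works the same way.
python longeval/eval.py \
    --model-name-or-path lmsys/vicuna-7b-v1.5-16k \
    --task lines
```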
These frameworks can help you evaluate longer prompts in less time.