Closed andakai closed 3 months ago
Yes, I am trying to do dataparallel using vLLM based on your project. I have read the guide, but still feel confused about how to do this. I wonder if there is any specific doc or example?
You can see this test case (test_parallel_llm) for some example code:
That test case runs two copies of an LLM (one per GPU on a 2-GPU machine) using Hugging Face Transformers.
If you want to use VLLM instead of HFTransformers, it's as simple as:
from datadreamer.llms import VLLM, ParallelLLM

# One vLLM instance per GPU
llm_1 = VLLM("gpt2", device=0)
llm_2 = VLLM("gpt2", device=1)

# ParallelLLM balances work across the two instances
parallel_llm = ParallelLLM(llm_1, llm_2)
Wow, this is so easy to use. It helps me a lot. Thanks for your fantastic work.
No problem, let me know if you need any other help!
I assume you're talking about this: https://github.com/vllm-project/vllm/issues/1237#issuecomment-2017239455
But yes, it does balance the work between multiple instances of the vLLM model.
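To make the balancing idea concrete, here is a minimal, self-contained sketch of splitting one batch of prompts across multiple model instances and re-interleaving the results in the original order. This is only an illustration of the general pattern, not DataDreamer's actual implementation; `fake_generate` and `parallel_run` are hypothetical names standing in for per-GPU generation.

```python
# Illustrative sketch of balancing a batch of prompts across multiple
# model instances (NOT DataDreamer's actual implementation).
from concurrent.futures import ThreadPoolExecutor

def fake_generate(worker_id, prompts):
    # Stand-in for a per-GPU model's generate() call.
    return [f"worker{worker_id}:{p}" for p in prompts]

def parallel_run(prompts, n_workers=2):
    # Deal the prompts round-robin into one chunk per worker.
    chunks = [prompts[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(lambda wc: fake_generate(*wc), enumerate(chunks))
    # Re-interleave so outputs line up with the original prompt order.
    outputs = [None] * len(prompts)
    for w, chunk_out in enumerate(results):
        for j, out in enumerate(chunk_out):
            outputs[w + j * n_workers] = out
    return outputs

print(parallel_run(["a", "b", "c", "d", "e"]))
# → ['worker0:a', 'worker1:b', 'worker0:c', 'worker1:d', 'worker0:e']
```

Each worker gets roughly half the prompts and runs concurrently, which is the same effect you get from `ParallelLLM` with two `VLLM` instances on separate GPUs.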