Closed dongmingli-Ben closed 6 months ago
do you support running both sglang and vllm?
Right now it does not support running both, because some of the args for sglang do not work with vllm and vice versa. One way to support both systems is to let the sglang and vllm runtimes ignore args that are not specific to them.
@vikranth22446 Now both the sglang runtime and the vllm runtime ignore arguments irrelevant to them. With this, I can run sglang on GPU 0 and vllm on GPU 1. An example of this is in multi_node/benchmarks/bench_data_parallel_routing.py.
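The "ignore irrelevant arguments" behavior described above could be implemented roughly like the sketch below. This is not the PR's actual code; `filter_args_for` and `launch_vllm` are hypothetical names, and the idea is simply to drop any shared arguments a runtime's entry point does not accept before calling it.

```python
import inspect


def filter_args_for(fn, args: dict) -> dict:
    # Keep only the keyword arguments that `fn` actually accepts,
    # so sglang-only flags are silently dropped for vllm and vice versa.
    accepted = set(inspect.signature(fn).parameters)
    return {k: v for k, v in args.items() if k in accepted}


def launch_vllm(port, tensor_parallel_size=1):
    # Hypothetical vllm launcher: accepts only vllm-relevant arguments.
    return ("vllm", port, tensor_parallel_size)


# A shared config mixing vllm args with an sglang-only flag.
shared = {"port": 30000, "tensor_parallel_size": 1, "enable_flashinfer": True}
print(launch_vllm(**filter_args_for(launch_vllm, shared)))
# → ('vllm', 30000, 1)
```

With this pattern, one config dict can be passed to both runtimes and each picks out what it understands.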
LGTM for now. A next step for cleaning this up might be to put the config directly inside the GPU config wrapper (extended for each runtime), but I'll merge.
This PR adds vllm support via SSH connection. A `VLLMRuntime` is added and can be loaded like other runtimes via `MultiNodeLoader`. Several things are added for vllm (none of which breaks existing code):

- A `vllm_config` field is added to `GPUConfig` to indicate the port for the vllm server; it defaults to `None`, meaning vllm is not used.
- `enable_prefix_caching`. This argument is not allowed when using sglang.

An example of using the vllm runtime is in `multi_node/benchmarks/bench_data_parallel_routing.py`. The newly added test file `multi_node/test_runtime.py` also has examples of using the vllm runtime.

About performance: for mistralai/Mistral-7B-v0.1, vllm is sometimes faster than sglang in terms of total time for all requests.