LLMServe / DistServe

Disaggregated serving system for Large Language Models (LLMs).
Apache License 2.0

Great work! #20

Open irasin opened 3 months ago

irasin commented 3 months ago

Congratulations, great work! I'm wondering if you plan to keep developing the framework to reach vLLM's scale. If so, please share some docs/roadmaps about the system architecture design with the community, so everyone can help contribute.

irasin commented 3 months ago

BTW, given the same model and sampling_params, I found that the generated results of DistServe differ from vLLM's.

I tested the model meta-llama/Llama-2-7b-hf on NVIDIA A10 GPUs with the sampling params below:

sampling_params = SamplingParams(temperature=0, ignore_eos=True, max_tokens=64)

and the prompt is:

"Simply put, the theory of relativity states that ",

vllm result:

1) the speed of light is constant in all inertial reference frames, and 2) the laws of physics are the same for all inertial reference frames.
The theory of relativity is a theory of physics that describes the relationship between space and time. It is based on the principle that the speed

DistServe result:

1) the speed of light is the same for all observers in an inertial frame of reference is not a constant.
The speed of light is the same for all observers.
The speed of light is the same for all observers.
The speed of light is the same for all observers

Is there something wrong here?
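Since temperature=0 means greedy decoding (always taking the argmax token), both engines should be deterministic, and any divergence points at a numerical or kernel difference rather than sampling randomness. One way to narrow down a mismatch like this is to find the first position where the two outputs diverge. A minimal sketch, not code from either engine (the helper name and the truncated strings are illustrative):

```python
def first_divergence(a: str, b: str) -> int:
    """Return the index of the first differing character, or -1 if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One string is a prefix of the other (or they are equal).
    return -1 if len(a) == len(b) else min(len(a), len(b))

# Truncated versions of the two outputs above, for illustration.
vllm_out = "1) the speed of light is constant in all inertial reference frames"
distserve_out = "1) the speed of light is the same for all observers"

i = first_divergence(vllm_out, distserve_out)
print(i, repr(vllm_out[i:i + 12]), repr(distserve_out[i:i + 12]))
```

Comparing token IDs (rather than decoded text) at the first divergent step, together with each engine's logits there, would show whether the mismatch comes from a tie-break in argmax or from drifting logits earlier in the sequence.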

irasin commented 2 months ago

@interestingLSY @GindaChen