deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Why is DeepSeek-V2 inference with vLLM slow? #24

Open ZzzybEric opened 4 months ago

ZzzybEric commented 4 months ago

I use vLLM to run inference on DeepSeek-V2 and deploy the model with Flask. Whenever a prompt is sent to the model, it gets stuck for a long time at the processed-prompts step. The code I use is your example code.
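For context, the "example code" presumably refers to the vLLM snippet from this repo's README. A minimal sketch of that setup (the model name, tensor-parallel size, and prompt here are illustrative assumptions, not the reporter's exact configuration):

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Illustrative settings; adjust tensor_parallel_size to your GPU count.
model_name = "deepseek-ai/DeepSeek-V2-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(
    model=model_name,
    tensor_parallel_size=8,   # DeepSeek-V2 is large; multi-GPU assumed
    max_model_len=8192,
    trust_remote_code=True,
    enforce_eager=True,       # disables CUDA graphs, which can slow decoding
)
sampling_params = SamplingParams(
    temperature=0.3,
    max_tokens=256,
    stop_token_ids=[tokenizer.eos_token_id],
)

# Build chat-formatted token IDs and generate.
messages = [{"role": "user", "content": "Write a quicksort in C++."}]
prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True)]
outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```

One plausible contributor to the slowness described above is `enforce_eager=True`, which turns off vLLM's CUDA-graph optimization; whether that is the cause here would depend on the reporter's hardware and vLLM version.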

luofuli commented 4 months ago

https://huggingface.co/deepseek-ai/DeepSeek-V2/discussions/1 @ZzzybEric

ran130683 commented 3 months ago

What's your GPU type?