bentoml / OpenLLM

Run any open-source LLMs, such as Llama 3.1 and Gemma, as an OpenAI-compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

Inference Speed comparison #795

Closed · that-rahul-guy closed this 4 days ago

that-rahul-guy commented 10 months ago

Hello everyone,

I want to discuss why the difference is so big. Am I doing something wrong when serving with OpenLLM? Let me know your thoughts.
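
For reference, a minimal sketch of how such a comparison can be timed against OpenLLM's OpenAI-compatible endpoint (the port, base URL, and model id below are assumptions for a local server):

```python
import time

from openai import OpenAI

# Assumptions: OpenLLM serving locally on its default port 3000 with an
# OpenAI-compatible API; the model id is a placeholder for whatever is loaded.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

start = time.perf_counter()
resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize GPTQ in one sentence."}],
    max_tokens=128,
)
elapsed = time.perf_counter() - start

tokens = resp.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.1f} tok/s")
```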

Thanks

aarnphm commented 10 months ago

GPTQ is now supported with vLLM in the latest OpenLLM version. You can test it with vLLM, as I haven't updated the PyTorch code path in a while.

You should see a significant improvement with vLLM.
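
A hedged sketch of checking the vLLM code path: start the server with the vLLM backend and a GPTQ checkpoint, then watch time-to-first-token through the same OpenAI-compatible API (the launch command in the comment and the model id are assumptions and may differ across OpenLLM versions):

```python
import time

from openai import OpenAI

# Assumed launch command (flag names vary across OpenLLM versions):
#   openllm start TheBloke/Llama-2-7B-Chat-GPTQ --backend vllm --quantize gptq
client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

start = time.perf_counter()
first_token_at = None
stream = client.chat.completions.create(
    model="TheBloke/Llama-2-7B-Chat-GPTQ",  # placeholder GPTQ model id
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
    stream=True,  # streaming makes time-to-first-token easy to measure
)
for chunk in stream:
    if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
        first_token_at = time.perf_counter() - start

if first_token_at is not None:
    print(f"time to first token: {first_token_at:.2f}s")
```

Comparing this number between the PyTorch and vLLM backends should make the gap (or its absence) obvious.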