awslabs / data-on-eks

DoEKS is a tool to build, deploy and scale Data & ML Platforms on Amazon EKS
https://awslabs.github.io/data-on-eks/
Apache License 2.0
620 stars 209 forks source link

[Inference]: RayLLM pattern for LLMs #337

Open askulkarni2 opened 1 year ago

askulkarni2 commented 1 year ago

Community Note

What is the outcome that you are trying to reach?

RayLLM is an LLM serving solution that makes it easy to deploy and manage a variety of open source LLMs, built on Ray Serve. It will allow us to provide an OOTB RESTful API for LLMs sourced from HuggingFace (including custom models).

Describe the solution you would like

Update JARK stack and other RayServe examples to use RayLLM.

askulkarni2 commented 7 months ago

Look into vLLM under the hood for autoscaling, continuous batching basically efficiently scaling LLM inference. Use https://github.com/ray-project/llmperf for benchmarking.