Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
What is the outcome that you are trying to reach?
RayLLM is an LLM serving solution built on Ray Serve that makes it easy to deploy and manage a variety of open-source LLMs. It would allow us to provide an out-of-the-box (OOTB) RESTful API for LLMs sourced from Hugging Face, including custom models.
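As a rough sketch of what that API looks like to users, the snippet below queries a RayLLM deployment through its OpenAI-compatible endpoint. The base URL and model ID are assumptions and would depend on how the service is actually deployed.

```python
import requests

# Assumed local RayLLM deployment; the URL and model ID below are
# illustrative and depend on the actual deployment.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"

response = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": "What is the JARK stack?"}],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI schema, existing OpenAI client libraries could also be pointed at it by overriding their base URL.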
Describe the solution you would like
Update the JARK stack and other Ray Serve examples to use RayLLM.
Look into using vLLM under the hood for autoscaling and continuous batching, i.e., efficiently scaling LLM inference. Use https://github.com/ray-project/llmperf for benchmarking; a sketch of a benchmark run follows below.
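For the benchmarking piece, here is a minimal sketch of driving llmperf against a RayLLM endpoint. The script name comes from the llmperf repo, but the endpoint, model ID, and flag values are illustrative assumptions; check the llmperf README for the actual options.

```python
import os
import subprocess

# Sketch of invoking llmperf's token_benchmark_ray.py against an
# OpenAI-compatible RayLLM endpoint. Endpoint, model ID, and flag
# values are assumptions, not a verified configuration.
env = dict(
    os.environ,
    OPENAI_API_BASE="http://localhost:8000/v1",  # assumed RayLLM endpoint
    OPENAI_API_KEY="not-needed-for-local",       # placeholder credential
)
subprocess.run(
    [
        "python", "token_benchmark_ray.py",
        "--model", "meta-llama/Llama-2-7b-chat-hf",  # assumed model ID
        "--llm-api", "openai",
        "--num-concurrent-requests", "8",
        "--results-dir", "results",
    ],
    env=env,
    check=True,
)
```

The resulting latency/throughput numbers would give a baseline for comparing the updated RayLLM-based examples against the current ones.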