OpenCSGs / llm-inference

llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing resource management, monitoring, and more.
Apache License 2.0

API server blocked while one request is in progress #137

Open · SeanHH86 opened 5 months ago

SeanHH86 commented 5 months ago

This issue needs more testing.

SeanHH86 commented 5 months ago

See https://github.com/ray-project/ray/issues/20169. It looks like the API server needs to be started with multiple replicas; a sketch of that follows below.
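
Since the linked issue is against Ray, a minimal sketch of the multi-replica idea using Ray Serve might look like the following. This is an assumption about how the fix could be applied, not code from llm-inference itself; the `ApiServer` deployment name and handler body are hypothetical.

```python
from ray import serve
from starlette.requests import Request

# Hypothetical sketch: run the ingress deployment with more than one replica
# so a single long-running request cannot block the whole API server.
@serve.deployment(num_replicas=2)
class ApiServer:
    async def __call__(self, request: Request) -> dict:
        # An async handler yields control while awaiting I/O, so other
        # requests on the same replica can still make progress.
        body = await request.json()
        return {"received": body}

app = ApiServer.bind()
# serve.run(app)  # starts Serve and load-balances HTTP traffic across replicas
```

With `num_replicas=2`, Serve routes incoming requests across both replicas, so a request that is still in process on one replica does not prevent the other from serving new traffic; keeping the handler `async` additionally avoids blocking within a single replica.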