OpenCSGs / llm-inference

llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing-resource management, monitoring, and more.

Refactor streaming #82

Closed: depenglee1707 closed this 6 months ago

depenglee1707 commented 6 months ago

This is a large commit; it includes the following changes:

  1. Fix the issue where a Ray deployment was exclusive to either the streaming generator or the normal REST endpoint (see the first sketch after this list).
  2. Built-in pipelines now support streaming: the default pipeline (Transformers auto classes) and llamacpp (see the second sketch).
  3. Streaming and non-streaming predict now share a single code path, avoiding duplicated code.
  4. Streaming also supports Ray batching (see the third sketch).
  5. The API remains consistent across streaming and non-streaming calls.
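
First sketch: how one Ray Serve deployment can serve both streaming and non-streaming requests from a single generation code path (items 1 and 3). This is a minimal illustration assuming FastAPI ingress; the `LLMDeployment` class, its `/generate` route, and the token source are hypothetical stand-ins, not the code in this PR:

```python
from fastapi import FastAPI
from ray import serve
from starlette.responses import StreamingResponse

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class LLMDeployment:
    """One deployment serving both streaming and non-streaming requests."""

    async def _generate(self, prompt: str):
        # Single generation path shared by both modes
        # (hypothetical token source standing in for a real pipeline).
        for token in ["Hello", ", ", "world", "!"]:
            yield token

    @app.post("/generate")
    async def generate(self, prompt: str, stream: bool = False):
        # prompt and stream arrive as query parameters in this sketch.
        tokens = self._generate(prompt)
        if stream:
            # Stream tokens back as they are produced.
            return StreamingResponse(tokens, media_type="text/event-stream")
        # Non-streaming: drain the same generator and return one response.
        return {"text": "".join([t async for t in tokens])}

llm_app = LLMDeployment.bind()
# serve.run(llm_app)  # then POST /generate?prompt=...&stream=true
```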
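
Second sketch: streaming on top of the Transformers auto classes (the default pipeline in item 2) can be built with `transformers.TextIteratorStreamer`, which yields decoded text while `generate()` runs in a background thread. The model name and the `stream_generate` helper below are illustrative assumptions:

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "gpt2"  # example model; llm-inference would load the configured one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def stream_generate(prompt: str, max_new_tokens: int = 64):
    """Yield decoded text chunks as the model produces them."""
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    # generate() blocks, so run it in a thread and consume the streamer here.
    thread = Thread(
        target=model.generate,
        kwargs=dict(**inputs, streamer=streamer, max_new_tokens=max_new_tokens),
    )
    thread.start()
    for text_chunk in streamer:
        yield text_chunk
    thread.join()

for chunk in stream_generate("Ray Serve makes it easy to"):
    print(chunk, end="", flush=True)
```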
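
Third sketch: Ray Serve's `@serve.batch` decorator also works for streaming (item 4). It can wrap an async generator that yields one list of chunks per step, and Serve routes each entry back to the matching caller. This follows the pattern documented for Ray Serve dynamic request batching rather than this PR's actual implementation:

```python
from typing import AsyncGenerator, List

from ray import serve
from starlette.requests import Request
from starlette.responses import StreamingResponse

@serve.deployment
class BatchedStreamer:
    @serve.batch(max_batch_size=4, batch_wait_timeout_s=0.1)
    async def generate(self, prompts: List[str]) -> AsyncGenerator[List[str], None]:
        # Each yield is one batched step: one chunk per in-flight request.
        for step in range(3):
            yield [f"{prompt}-chunk{step} " for prompt in prompts]

    async def __call__(self, request: Request) -> StreamingResponse:
        prompt = request.query_params["prompt"]
        # Each caller receives only its own slice of every batched step.
        return StreamingResponse(self.generate(prompt), media_type="text/plain")

app = BatchedStreamer.bind()
```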