OpenCSGs / llm-inference

llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing-resource management, monitoring, and more.

Refactor streaming #82

Closed: depenglee1707 closed this 6 months ago

depenglee1707 commented 6 months ago

This is a large commit; it includes the following changes:

  1. Fix the issue where a Ray deployment was exclusive to either the streaming generator or the normal REST endpoint (see the first sketch after this list).
  2. Built-in pipelines now support streaming: the default pipeline (Transformers auto classes) and llamacpp (see the second sketch).
  3. Streaming and non-streaming predict now share a single code path, avoiding duplicated code.
  4. Streaming also supports Ray batching (see the third sketch).
  5. The API remains consistent across streaming and non-streaming calls.
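
First sketch: how one Ray Serve deployment can serve both streaming and non-streaming requests from a single generation code path (items 1 and 3). This is a minimal illustration assuming FastAPI ingress; the `LLMDeployment` class, its `/generate` route, and the token source are hypothetical stand-ins, not the code in this PR:

```python
from fastapi import FastAPI
from ray import serve
from starlette.responses import StreamingResponse

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class LLMDeployment:
    """One deployment serving both streaming and non-streaming requests."""

    async def _generate(self, prompt: str):
        # Single generation path shared by both modes
        # (hypothetical token source standing in for a real pipeline).
        for token in ["Hello", ", ", "world", "!"]:
            yield token

    @app.post("/generate")
    async def generate(self, prompt: str, stream: bool = False):
        # prompt and stream arrive as query parameters in this sketch.
        tokens = self._generate(prompt)
        if stream:
            # Stream tokens back as they are produced.
            return StreamingResponse(tokens, media_type="text/event-stream")
        # Non-streaming: drain the same generator and return one response.
        return {"text": "".join([t async for t in tokens])}

llm_app = LLMDeployment.bind()
# serve.run(llm_app)  # then POST /generate?prompt=...&stream=true
```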
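
Second sketch: streaming on top of the Transformers auto classes (the default pipeline in item 2) can be built with `transformers.TextIteratorStreamer`, which yields decoded text while `generate()` runs in a background thread. The model name and the `stream_generate` helper below are illustrative assumptions:

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "gpt2"  # example model; llm-inference would load the configured one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def stream_generate(prompt: str, max_new_tokens: int = 64):
    """Yield decoded text chunks as the model produces them."""
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    # generate() blocks, so run it in a thread and consume the streamer here.
    thread = Thread(
        target=model.generate,
        kwargs=dict(**inputs, streamer=streamer, max_new_tokens=max_new_tokens),
    )
    thread.start()
    for text_chunk in streamer:
        yield text_chunk
    thread.join()

for chunk in stream_generate("Ray Serve makes it easy to"):
    print(chunk, end="", flush=True)
```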
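
Third sketch: Ray Serve's `@serve.batch` decorator also works for streaming (item 4). It can wrap an async generator that yields one list of chunks per step, and Serve routes each entry back to the matching caller. This follows the pattern documented for Ray Serve dynamic request batching rather than this PR's actual implementation:

```python
from typing import AsyncGenerator, List

from ray import serve
from starlette.requests import Request
from starlette.responses import StreamingResponse

@serve.deployment
class BatchedStreamer:
    @serve.batch(max_batch_size=4, batch_wait_timeout_s=0.1)
    async def generate(self, prompts: List[str]) -> AsyncGenerator[List[str], None]:
        # Each yield is one batched step: one chunk per in-flight request.
        for step in range(3):
            yield [f"{prompt}-chunk{step} " for prompt in prompts]

    async def __call__(self, request: Request) -> StreamingResponse:
        prompt = request.query_params["prompt"]
        # Each caller receives only its own slice of every batched step.
        return StreamingResponse(self.generate(prompt), media_type="text/plain")

app = BatchedStreamer.bind()
```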