James-QiuHaoran / LLM-serving-with-proxy-models

Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
Apache License 2.0
14 stars 5 forks source link