James-QiuHaoran / LLM-serving-with-proxy-models

Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction

Add LLM inference trace-driven simulation #8

Closed: James-QiuHaoran closed this issue 1 month ago

James-QiuHaoran commented 1 month ago

Azure released LLM inference traces (2023) alongside the paper "Splitwise: Efficient generative LLM inference using phase splitting" (ISCA'24). Replaying these traces could drive a simulation that is more representative of real production serving workloads.
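As a rough sketch of what trace-driven replay could look like, the snippet below loads one of the published Azure trace files and converts its absolute timestamps into inter-arrival delays that a request generator or simulator could consume. The file name and column names (`TIMESTAMP`, `ContextTokens`, `GeneratedTokens`) are assumptions based on the Azure/AzurePublicDataset release, not this repo's actual integration; see the commit linked below for that.

```python
import pandas as pd

# Assumed file/column names from the 2023 Azure LLM inference trace
# (Azure/AzurePublicDataset); adjust if the published schema differs.
TRACE_CSV = "AzureLLMInferenceTrace_conv.csv"

def load_trace(path: str) -> pd.DataFrame:
    """Load the trace and convert timestamps to seconds since the first request."""
    df = pd.read_csv(path, parse_dates=["TIMESTAMP"])
    df = df.sort_values("TIMESTAMP").reset_index(drop=True)
    df["arrival_s"] = (df["TIMESTAMP"] - df["TIMESTAMP"].iloc[0]).dt.total_seconds()
    return df

def replay(df: pd.DataFrame, speedup: float = 1.0):
    """Yield (inter_arrival_delay_s, context_tokens, generated_tokens) per request."""
    prev = 0.0
    for row in df.itertuples():
        t = row.arrival_s / speedup
        yield t - prev, row.ContextTokens, row.GeneratedTokens
        prev = t

if __name__ == "__main__":
    trace = load_trace(TRACE_CSV)
    for delay, ctx_tokens, gen_tokens in replay(trace, speedup=100.0):
        # A real driver would sleep(delay) and dispatch the request to the
        # serving system or simulator; here we just print the schedule.
        print(f"+{delay:.3f}s  prompt={ctx_tokens}  output={gen_tokens} tokens")
```

The `speedup` knob compresses the trace's wall-clock timeline, which is handy for running a multi-hour trace through a simulator in minutes while preserving relative burstiness.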

James-QiuHaoran commented 1 month ago

Done: https://github.com/James-QiuHaoran/LLM-serving-with-proxy-models/commit/868b9de9808bc19bbc53f4ca27e11048fe89ab95