Closed James-QiuHaoran closed 1 month ago
Azure released LLM inference traces (2023) from the paper "Splitwise: Efficient generative LLM inference using phase splitting" (ISCA'24), which could serve as a more representative workload for simulation.
Done: https://github.com/James-QiuHaoran/LLM-serving-with-proxy-models/commit/868b9de9808bc19bbc53f4ca27e11048fe89ab95
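Replaying such a trace in a simulator boils down to reading the arrival timestamps and token counts, then issuing requests at the recorded inter-arrival gaps. A minimal sketch, assuming the column names published with the Azure traces (`TIMESTAMP`, `ContextTokens`, `GeneratedTokens`) and using a tiny inline stand-in for the real CSV:

```python
import io
import pandas as pd

# Tiny synthetic stand-in for the Azure LLM inference trace (2023).
# The column names (TIMESTAMP, ContextTokens, GeneratedTokens) are an
# assumption based on the published trace schema; adjust to the real file.
csv = io.StringIO(
    "TIMESTAMP,ContextTokens,GeneratedTokens\n"
    "2023-11-16 18:15:46.000,120,80\n"
    "2023-11-16 18:15:46.500,300,15\n"
    "2023-11-16 18:15:47.250,55,200\n"
)
trace = pd.read_csv(csv, parse_dates=["TIMESTAMP"])

# Inter-arrival times (seconds) drive the request generator in a simulator;
# the first request has no predecessor, so its gap is 0.
inter_arrivals = trace["TIMESTAMP"].diff().dt.total_seconds().fillna(0.0)

for gap, row in zip(inter_arrivals, trace.itertuples()):
    print(f"wait {gap:.3f}s -> prompt={row.ContextTokens} tok, "
          f"output={row.GeneratedTokens} tok")
```

For the real trace, pointing `pd.read_csv` at the downloaded file instead of the `StringIO` buffer is the only change needed; the per-request prompt and output token counts then parameterize the prefill and decode phases that Splitwise separates.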