jvlunteren closed this 4 months ago.
Supports issue #27.
This PR enables fmperf to use the usage statistics that vLLM can now include in every streaming response. With these statistics, fmperf counts generated tokens correctly even when chunked prefill or speculative decoding is enabled, where a single streamed chunk may carry more than one token.
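A minimal sketch of the idea, assuming each streamed `data:` payload carries a `usage` object whose `completion_tokens` field is cumulative. This is what vLLM's OpenAI-compatible server returns when the request sets `stream_options` with `include_usage` and `continuous_usage_stats` to true. The helper name `token_deltas` and the sample stream are hypothetical and are not fmperf's actual code. The point is that the tokens in each chunk come from the difference in cumulative counts, not from counting chunks:

```python
import json


def token_deltas(sse_lines):
    """Return the number of new tokens carried by each streamed chunk.

    Assumes every 'data:' payload (other than '[DONE]') contains a
    'usage' object with a cumulative 'completion_tokens' count.
    """
    deltas = []
    previous = 0
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # ignore blank lines or other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        current = chunk["usage"]["completion_tokens"]
        # Tokens in this chunk = growth of the cumulative count.
        deltas.append(current - previous)
        previous = current
    return deltas


# Hypothetical stream in which chunks carry several tokens each,
# as can happen with chunked prefill or speculative decoding.
stream = [
    'data: {"choices":[{"text":"Hello"}],"usage":{"prompt_tokens":5,"completion_tokens":1,"total_tokens":6}}',
    'data: {"choices":[{"text":" world, how"}],"usage":{"prompt_tokens":5,"completion_tokens":4,"total_tokens":9}}',
    'data: {"choices":[{"text":" are you?"}],"usage":{"prompt_tokens":5,"completion_tokens":7,"total_tokens":12}}',
    "data: [DONE]",
]

print(token_deltas(stream))       # [1, 3, 3]
print(sum(token_deltas(stream)))  # 7 tokens, although only 3 chunks arrived
```

Counting chunks on this stream would report 3 tokens. The per-chunk usage statistics give the correct total of 7.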