fmperf-project / fmperf

Cloud Native Benchmarking of Foundation Models
Apache License 2.0

Support benchmarking of vLLM advanced features #28

Closed: jvlunteren closed 4 months ago

jvlunteren commented 4 months ago

Supports issue #27.

This PR enables fmperf to use the usage statistics that vLLM can now include in every streaming response, so that the token count is determined correctly when chunked prefill or speculative decoding is enabled. With these features a single streamed chunk can carry zero or several tokens, so counting one token per chunk gives the wrong result.
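The counting idea can be illustrated with a minimal sketch. It is not taken from this PR's diff: the function name `tokens_per_chunk` is hypothetical, and the chunk layout (server-sent-event lines carrying an OpenAI-style `usage` object with a cumulative `completion_tokens` field) is an assumption about the streaming response format.

```python
import json
from typing import Iterable, List


def tokens_per_chunk(sse_lines: Iterable[str]) -> List[int]:
    """Return the number of tokens generated in each streamed chunk.

    Assumption: each "data:" line holds a JSON chunk whose optional
    "usage" object reports the cumulative "completion_tokens" so far.
    """
    counts: List[int] = []
    previous_total = 0
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        usage = chunk.get("usage")
        if usage is None:
            # No usage statistics: fall back to one token per chunk.
            counts.append(1)
            previous_total += 1
            continue
        total = usage["completion_tokens"]
        # The difference between cumulative totals is this chunk's tokens.
        counts.append(total - previous_total)
        previous_total = total
    return counts


# Hypothetical stream: the second chunk carries three tokens at once,
# as can happen with speculative decoding.
sample = [
    'data: {"choices": [{"text": "A"}], "usage": {"completion_tokens": 1}}',
    'data: {"choices": [{"text": " b c d"}], "usage": {"completion_tokens": 4}}',
    'data: {"choices": [{"text": " e"}], "usage": {"completion_tokens": 5}}',
    "data: [DONE]",
]
print(tokens_per_chunk(sample))
```

Taking differences of the cumulative count, rather than counting chunks, keeps the per-chunk figures correct even when a chunk carries several tokens or none.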