fmperf crashes when vLLM server is started using --enable-chunked-prefill option

fmperf-project / fmperf

Cloud Native Benchmarking of Foundation Models

Apache License 2.0

fmperf crashes when vLLM server is started using --enable-chunked-prefill option #13

Closed: jvlunteren closed this issue 5 months ago

jvlunteren commented 5 months ago

The crash is caused by "empty output tokens", which are generated when a prompt is split into smaller chunks for prefill processing. I will submit a PR that resolves this issue.
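To illustrate the failure mode described above, here is a minimal sketch of one way a benchmarking client could tolerate empty output tokens in a streamed response. It is not the fix from the PR and does not use fmperf's or vLLM's actual API: the function `skip_empty_tokens`, the `(timestamp, text)` tuple layout, and the sample stream are all hypothetical, chosen only to show the idea of filtering empty chunks before measuring token latencies.

```python
from typing import Iterable, Iterator, Optional, Tuple


def skip_empty_tokens(
    chunks: Iterable[Tuple[float, Optional[str]]],
) -> Iterator[Tuple[float, str]]:
    """Yield only the (timestamp, text) pairs whose text is non-empty.

    Hypothetical helper: assumes each streamed chunk has already been
    reduced to a timestamp and the text it carried (possibly "" or None).
    """
    for timestamp, text in chunks:
        if not text:  # covers both None and the empty string
            continue
        yield timestamp, text


# Hypothetical stream: two empty chunks arrive while the prompt is still
# being prefilled in pieces, followed by the first real tokens.
stream = [(0.10, ""), (0.20, None), (0.30, "Hello"), (0.35, " world")]

tokens = list(skip_empty_tokens(stream))

# Measure time-to-first-token from the first non-empty chunk, and guard
# against a stream that produced no text at all.
first_token_time = tokens[0][0] if tokens else None
```

With this filtering, the time to first token is taken from the first chunk that actually carries text rather than from an empty one, and a stream with no text yields `None` instead of raising an error.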