We had to add a dependency on transformers to perform client-side tokenization when using fmperf with vLLM. Now that vLLM supports server-side prompt truncation, this is no longer necessary. We should be able to adapt the load generator to use this new feature and remove the dependency completely.
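As a rough sketch of what the adapted request could look like: the snippet below assumes the load generator talks to vLLM's OpenAI-compatible completions endpoint and relies on the `truncate_prompt_tokens` request field, which asks the server to keep only the last k prompt tokens. The helper name `build_completion_payload` and its arguments are hypothetical and are not part of fmperf; the endpoint fmperf actually uses may differ.

```python
from typing import Any, Dict


def build_completion_payload(
    model: str,
    prompt: str,
    max_new_tokens: int,
    max_input_tokens: int,
) -> Dict[str, Any]:
    """Build a completions request body that delegates truncation to the server.

    Hypothetical helper: the raw prompt text is sent as-is, so no tokenizer
    (and no transformers import) is needed on the client.
    """
    if max_input_tokens < 1:
        raise ValueError("max_input_tokens must be at least 1")
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_new_tokens,
        # Assumption: the server accepts this vLLM-specific extra field and
        # truncates the prompt to at most this many tokens.
        "truncate_prompt_tokens": max_input_tokens,
    }


payload = build_completion_payload(
    model="my-model",  # placeholder model name
    prompt="some very long prompt text",
    max_new_tokens=64,
    max_input_tokens=512,
)
```

The resulting dictionary would be sent as the JSON body of the request. One consequence of this design is that the client no longer knows the exact input token count up front, so any token accounting would have to come from the usage information in the server's response.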