WukLab / preble

Stateful LLM Serving
Apache License 2.0

Add support to sequential requests #58

Closed dongmingli-Ben closed 7 months ago

dongmingli-Ben commented 7 months ago

This PR introduces RequestGroup to replace the old list of requests. A request group is a list of requests that share a send pattern, such as a request rate or sequential sending (a request can be sent only after the previous one completes). The RequestRateManager supports sending requests for multiple request groups concurrently.

For now, there are only two request_type choices: ExperimentType.default (same behavior as before) and ExperimentType.sequential (send after the previous request completes). Sequential requests and multiple request groups are currently supported only with the SGLang server (not the simulator), because they require the completion time of previous requests. However, previous experiments on the simulator should not break.
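To illustrate the difference between the two send patterns, here is a minimal asyncio sketch. It is not the code in this PR: `SendPattern`, `GroupSketch`, `run_group`, `run_groups`, and the `send` callback are hypothetical names that stand in for the enum, RequestGroup, and RequestRateManager described above, and the real classes may be structured differently.

```python
import asyncio
import random
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any, Awaitable, Callable, List


class SendPattern(Enum):
    """Hypothetical stand-in for the PR's request_type enum."""
    DEFAULT = auto()      # send at Poisson arrival times, no waiting
    SEQUENTIAL = auto()   # send only after the previous request completes


@dataclass
class GroupSketch:
    """Hypothetical stand-in for RequestGroup."""
    requests: List[str]
    pattern: SendPattern
    rate: float = 1.0  # requests per second; used only by DEFAULT


SendFn = Callable[[str], Awaitable[Any]]


async def run_group(group: GroupSketch, send: SendFn) -> List[Any]:
    if group.pattern is SendPattern.SEQUENTIAL:
        # Await each response before issuing the next request.
        results = []
        for req in group.requests:
            results.append(await send(req))
        return results
    # DEFAULT: wait an exponential inter-arrival gap, then fire the
    # request as a background task so later requests are not blocked.
    tasks = []
    for req in group.requests:
        await asyncio.sleep(random.expovariate(group.rate))
        tasks.append(asyncio.create_task(send(req)))
    return list(await asyncio.gather(*tasks))


async def run_groups(groups: List[GroupSketch], send: SendFn) -> List[List[Any]]:
    # All groups make progress concurrently on the same event loop.
    return list(await asyncio.gather(*(run_group(g, send) for g in groups)))
```

The sequential branch needs the response of the previous request before it can continue, which is consistent with the description's note that this mode depends on completion times from a real server.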

An example use of sequential requests is in multi_node/benchmarks/multi_exp_configs/e2e_virtualenv_config.py, where the overall request rate is divided into per-group request rates. This works because independent Poisson processes are additive: merging them yields a Poisson process whose rate is the sum of the individual rates, so the overall request rate is maintained.
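The rate-splitting argument can be checked numerically with a small simulation. This sketch is not taken from the config file: it assumes an equal split across groups and independent arrivals per group, and the helper names (`split_rate`, `poisson_arrivals`, `merged_arrivals`) are hypothetical.

```python
import random
from typing import List


def split_rate(total_rate: float, num_groups: int) -> List[float]:
    # Assumption: every group gets an equal share of the overall rate.
    return [total_rate / num_groups] * num_groups


def poisson_arrivals(rate: float, horizon: float, rng: random.Random) -> List[float]:
    # Arrival times of a Poisson process on [0, horizon], built from
    # exponentially distributed inter-arrival gaps.
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)


def merged_arrivals(total_rate: float, num_groups: int,
                    horizon: float, seed: int = 0) -> List[float]:
    # Simulate each group independently, then merge the timelines.
    rng = random.Random(seed)
    merged: List[float] = []
    for rate in split_rate(total_rate, num_groups):
        merged.extend(poisson_arrivals(rate, horizon, rng))
    return sorted(merged)


if __name__ == "__main__":
    horizon = 1000.0
    arrivals = merged_arrivals(total_rate=8.0, num_groups=4, horizon=horizon)
    # The empirical merged rate should be close to the overall rate of 8.0.
    print(len(arrivals) / horizon)
```

Over a long enough horizon, the merged arrival rate stays close to the overall rate regardless of how many groups it was divided among.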