This PR introduces `RequestGroup` to replace the old list of requests. A request group is a list of requests that share a sending pattern, such as a request rate or sequential sending (a request can only be sent after the previous one completes). The `RequestRateManager` can send requests from multiple request groups concurrently.
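To illustrate the idea, here is a minimal sketch of rate-based request groups driven concurrently with `asyncio`. It assumes a hypothetical `RequestGroup` dataclass (a list of prompts plus a per-group rate) and a hypothetical `send_request` coroutine standing in for the real server call; the actual classes and fields in this PR may differ.

```python
import asyncio
import random
from dataclasses import dataclass
from typing import List


@dataclass
class RequestGroup:
    # Hypothetical shape: the group's requests and its mean send rate.
    requests: List[str]
    request_rate: float  # mean requests per second for this group


async def send_request(req: str, log: List[str]) -> None:
    # Hypothetical stand-in for the real call to the server.
    await asyncio.sleep(0)
    log.append(req)


async def run_group(group: RequestGroup, log: List[str], rng: random.Random) -> None:
    tasks = []
    for req in group.requests:
        # Poisson arrivals: exponential gaps with mean 1 / request_rate.
        await asyncio.sleep(rng.expovariate(group.request_rate))
        # Rate-based pattern: fire the request without waiting for it to finish.
        tasks.append(asyncio.create_task(send_request(req, log)))
    await asyncio.gather(*tasks)


async def run_groups(groups: List[RequestGroup], seed: int = 0) -> List[str]:
    log: List[str] = []
    rng = random.Random(seed)
    # Every group runs concurrently on its own schedule.
    await asyncio.gather(*(run_group(g, log, rng) for g in groups))
    return log
```

For example, `asyncio.run(run_groups([RequestGroup(["a1", "a2"], 1000.0), RequestGroup(["b1", "b2"], 1000.0)]))` sends all four requests, with the two groups interleaved in time.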
For now, there are only two `request_type` choices: `ExperimentType.default` (same behavior as before) and `Experiment.sequential` (send a request only after the previous one completes). Sequential requests and multiple request groups are currently supported only with the SGLang server, not the simulator, because they rely on the completion times of previous requests. Existing simulator experiments should still work unchanged.
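The difference between the two send patterns can be sketched as follows. The `ExperimentType` enum here is a hypothetical mirror of the two choices, and `fake_request` is an assumed stand-in that simulates server latency; the sketch measures how many requests are in flight at once under each pattern.

```python
import asyncio
from enum import Enum


class ExperimentType(Enum):
    # Hypothetical mirror of the two request_type choices.
    default = "default"
    sequential = "sequential"


async def fake_request(state: dict) -> None:
    # Track concurrency; pretend the server takes 50 ms per request.
    state["in_flight"] += 1
    state["max_in_flight"] = max(state["max_in_flight"], state["in_flight"])
    await asyncio.sleep(0.05)
    state["in_flight"] -= 1


async def send_group(n: int, request_type: ExperimentType, gap: float = 0.001) -> int:
    state = {"in_flight": 0, "max_in_flight": 0}
    pending = []
    for _ in range(n):
        if request_type is ExperimentType.sequential:
            # Sequential: the next request goes out only after this one completes,
            # which is why the real server's completion times are needed.
            await fake_request(state)
        else:
            # Default: send on a time schedule without waiting for completion.
            pending.append(asyncio.create_task(fake_request(state)))
            await asyncio.sleep(gap)
    await asyncio.gather(*pending)
    return state["max_in_flight"]
```

With sequential sending at most one request is ever in flight, while the default pattern overlaps requests whenever the send gap is shorter than the server latency.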
An example of sequential requests is in `multi_node/benchmarks/multi_exp_configs/e2e_virtualenv_config.py`, where the overall request rate is split into per-group request rates. This preserves the overall request rate because Poisson processes are additive: merging independent Poisson streams yields a Poisson stream whose rate is the sum of the individual rates.
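The additivity argument can be checked numerically. This sketch assumes an even split across groups (the hypothetical `split_rate` helper; the actual config may divide the rate differently) and treats each group's arrivals as an independent Poisson process, then measures the rate of the merged stream.

```python
import random
from typing import List


def split_rate(total_rate: float, num_groups: int) -> List[float]:
    # Hypothetical even split of the overall rate across request groups.
    return [total_rate / num_groups] * num_groups


def poisson_arrivals(rate: float, horizon: float, rng: random.Random) -> List[float]:
    # Arrival times of a Poisson process on [0, horizon): exponential gaps.
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return times
        times.append(t)


def merged_rate(total_rate: float, num_groups: int, horizon: float, seed: int = 0) -> float:
    rng = random.Random(seed)
    merged: List[float] = []
    for r in split_rate(total_rate, num_groups):
        merged.extend(poisson_arrivals(r, horizon, rng))
    # Superposition of independent Poisson streams: observed rate ~ total_rate.
    return len(merged) / horizon
```

For instance, `merged_rate(10.0, 4, 1000.0)` comes out close to 10 requests per second, even though each of the four groups only sends at 2.5 requests per second.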