Closed: jerin-scalers-ai closed this issue 11 months ago.
Thanks for the question @jerin-scalers-ai! We used Vegeta to send requests, and it automatically calculated latency and throughput from the request timings. For the benchmark results, we sent different numbers of requests within a 1-second window. We are now running longer load tests and observing how RPS decreases as the test duration increases.
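For anyone landing here later, here is a minimal sketch of what latency and throughput mean in this kind of benchmark, written in plain Python rather than Vegeta. The endpoint URL, payload, request count, and concurrency below are placeholder assumptions, not the actual benchmark configuration:

```python
import time
import concurrent.futures

import requests

# Hypothetical TGI endpoint and prompt; adjust to your own deployment.
TGI_URL = "http://localhost:8080/generate"
PAYLOAD = {"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 64}}


def send_request(_):
    """Send one generate request and return its end-to-end latency in seconds."""
    start = time.perf_counter()
    resp = requests.post(TGI_URL, json=PAYLOAD, timeout=120)
    resp.raise_for_status()
    return time.perf_counter() - start


def run_load(num_requests: int = 50, concurrency: int = 10):
    """Fire num_requests concurrently, then report latency percentiles and throughput."""
    wall_start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(send_request, range(num_requests)))
    wall_time = time.perf_counter() - wall_start

    latencies.sort()
    p50 = latencies[len(latencies) // 2]
    p99 = latencies[max(int(len(latencies) * 0.99) - 1, 0)]
    print(f"requests: {num_requests}, wall time: {wall_time:.2f}s")
    print(f"latency p50: {p50:.3f}s, p99: {p99:.3f}s")
    print(f"throughput: {num_requests / wall_time:.2f} req/s")


if __name__ == "__main__":
    run_load()
```

Vegeta does the equivalent of this (latency per request, requests completed per second over the attack duration), just driven from the command line at a fixed request rate.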
@mariia-georgian Thanks for letting me know.
How are latency and throughput measured for Llama 2 7B model inference benchmarking using TGI? Reference