georgian-io / LLM-Finetuning-Toolkit

Toolkit for fine-tuning, ablating and unit-testing open-source LLMs.
Apache License 2.0
776 stars 91 forks

Latency and Throughput Calculation #58

Closed jerin-scalers-ai closed 11 months ago

jerin-scalers-ai commented 11 months ago

How are latency and throughput measured for Llama 2 7B model inference benchmarking using TGI? Reference

mariia-georgian commented 11 months ago

Thanks for the question @jerin-scalers-ai! We used vegeta to send requests, and it automatically calculated latency and throughput from the observed performance. For the benchmark results, we sent a different number of requests over a 1-second window. We are now running longer load tests and observing how RPS decreases as the test duration increases.
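For intuition, a load generator like vegeta derives its metrics from per-request timings: throughput is completed requests divided by the attack's wall-clock duration, and latency percentiles come from the sorted request durations. A minimal Python sketch of that calculation (illustrative only, not vegeta's actual code; all numbers below are hypothetical):

```python
import statistics

def summarize(durations_s, total_time_s):
    """Compute vegeta-style metrics from per-request durations (seconds).

    durations_s: latency of each completed request
    total_time_s: wall-clock duration of the whole attack window
    """
    n = len(durations_s)
    ordered = sorted(durations_s)
    return {
        "requests": n,
        # completed requests per second over the attack window
        "throughput_rps": n / total_time_s,
        "latency_mean_s": statistics.mean(ordered),
        # nearest-rank style percentiles over the sorted durations
        "latency_p50_s": ordered[int(0.50 * (n - 1))],
        "latency_p99_s": ordered[int(0.99 * (n - 1))],
    }

# Hypothetical run: 5 requests completed during a 1-second attack
metrics = summarize([0.10, 0.12, 0.11, 0.30, 0.09], total_time_s=1.0)
print(metrics["throughput_rps"])  # → 5.0
print(metrics["latency_p50_s"])   # → 0.11
```

Note how the two metrics interact: as test duration grows and the server saturates, individual latencies rise, fewer requests complete per unit time, and the reported RPS drops, which matches the behavior described above.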

jerin-scalers-ai commented 11 months ago

@mariia-georgian Thanks for letting me know.