Open · rajm3180 opened 3 months ago
We are trying to run our benchmarking exercise against gpt-4o using this benchmarking tool, but we are seeing different e2e_avg latencies from the tool and the Azure portal: the e2e_avg reported by the benchmarking tool is at least twice the value reported on the Azure portal.

Command used:

```
python -m benchmark.bench load --temperature 0.0 --shape-profile custom --deployment 'deployment name' --max-tokens 200 --context-tokens 20000 --api-version 2024-02-01 --rate 10 --duration 600 https://genai-stg-westus3-1.openai.azure.com/
```

---

Hey @rajm3180, this could be due to how the latency is recorded by the two methods. This benchmarking tool (and the improved version) measures client-side latency, while the metrics in the Azure dashboard measure server-side latency, which may not include the time taken for data transfer, content safety checks, and some elements of request authentication and the handshake. If you are interested in the latency experienced by users of your application, you should generally rely on the client-side latency measured by this repo.
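For clarity, here is a minimal sketch of what the client-side measurement captures, assuming the v1+ `openai` Python SDK; the endpoint and API version are taken from the command above, while the deployment name and API key are placeholders. The timer spans everything the client waits for (network transfer, auth/handshake, content safety, queueing), which is why it can exceed the server-side number shown in the portal:

```python
import time
from openai import AzureOpenAI  # assumes the v1+ `openai` SDK is installed

# Placeholder key and deployment; endpoint and api_version match the command above.
client = AzureOpenAI(
    azure_endpoint="https://genai-stg-westus3-1.openai.azure.com/",
    api_version="2024-02-01",
    api_key="<your-api-key>",
)

start = time.perf_counter()  # clock starts before the request leaves the client
response = client.chat.completions.create(
    model="<deployment name>",  # Azure deployment name, not the base model name
    temperature=0.0,
    max_tokens=200,
    messages=[{"role": "user", "content": "Hello"}],
)
elapsed = time.perf_counter() - start  # client-side e2e: includes transfer, auth, content safety, etc.
print(f"client-side e2e latency: {elapsed:.3f}s")
```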