Hi there,
The code uses
from azure.ai.generative.evaluate import evaluate
to compute RAG metrics (gpt_coherence, gpt_relevance, and gpt_groundedness).

One important RAG metric is latency: depending on the model, the retrieval step, and the ranking technique used, latency can vary quite a bit. I've checked the evaluate function and can't see latency among the available metrics.
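For context, the call looks roughly like this (a minimal sketch; test_data, model_config, and the data_mapping field names are placeholders, and parameter names may differ across preview versions of the SDK):

```python
from azure.ai.generative.evaluate import evaluate

# Sketch of the evaluation call (preview SDK; exact signature may vary by version).
result = evaluate(
    evaluation_name="rag-experiment",  # placeholder experiment name
    data=test_data,                    # rows with question / answer / context
    task_type="qa",
    metrics_list=["gpt_coherence", "gpt_relevance", "gpt_groundedness"],
    model_config=model_config,         # Azure OpenAI connection details (placeholder)
    data_mapping={                     # assumed field names; adjust to your dataset
        "question": "question",
        "context": "context",
        "answer": "answer",
    },
)
```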
I tried to just measure the time needed to run the evaluation and divide it by the number of questions, but the evaluation also computes the GPT metrics, which takes a lot of time, so the latency results are far from accurate. The idea is to have a mean latency for each experiment as a metric.
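This is essentially what I tried (a minimal sketch; test_data is the same list of evaluation rows as above):

```python
import time

# Time the whole evaluate() run and average over the questions.
# Problem: the elapsed time also includes computing the gpt_* metrics,
# which dominates, so the resulting mean latency is far from accurate.
start = time.perf_counter()
result = evaluate(...)  # same call as above, gpt_* metrics included
elapsed = time.perf_counter() - start

mean_latency = elapsed / len(test_data)
print(f"Approximate (inflated) mean latency: {mean_latency:.2f}s per question")
```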
Any tips on how to compute it?