Azure-Samples / ai-rag-chat-evaluator

Tools for evaluation of RAG Chat Apps using Azure AI Evaluate SDK and OpenAI
MIT License

How to compute mean RAG latency? #44

Closed cpatrickalves closed 7 months ago

cpatrickalves commented 8 months ago

Hi there,

The code uses evaluate (from azure.ai.generative.evaluate import evaluate) to compute RAG metrics (gpt_coherence, gpt_relevance, and gpt_groundedness).

One important metric for RAG is latency: the model, retrieval, and ranking technique used can all affect it. I've checked the evaluate function and can't see latency as a metric.

I tried computing the time needed to run the evaluation and dividing it by the number of questions, but the evaluation also computes the GPT metrics, which take a lot of time, so the latency results are far from accurate. The idea is to have a mean latency for each experiment as a metric.

Any tip on how to compute it?
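
For illustration, here is a minimal sketch of the idea described above: time only the call that produces each answer, before any GPT metrics are computed, then average the timings per experiment. The names measure_latencies, mean_latency, and ask_rag_app are hypothetical and not part of this repository or the SDK; ask_rag_app is a stand-in for whatever function sends a question to the chat app.

```python
import time
from statistics import mean
from typing import Callable, Iterable


def measure_latencies(ask: Callable[[str], str], questions: Iterable[str]) -> list[dict]:
    """Call `ask` once per question and record how long each call takes."""
    results = []
    for question in questions:
        start = time.perf_counter()  # monotonic, high-resolution clock
        answer = ask(question)
        latency = time.perf_counter() - start
        results.append({"question": question, "answer": answer, "latency": latency})
    return results


def mean_latency(results: list[dict]) -> float:
    """Average the per-question latencies (in seconds) for one experiment."""
    return mean(r["latency"] for r in results)


def ask_rag_app(question: str) -> str:
    # Hypothetical stand-in for the real chat app call (assumption):
    # replace with the request that actually hits the RAG endpoint.
    time.sleep(0.01)
    return f"answer to: {question}"


results = measure_latencies(ask_rag_app, ["What is RAG?", "How is latency measured?"])
print(f"mean latency: {mean_latency(results):.3f}s")
```

Because the timer wraps only the answer-generating call, the time spent computing gpt_coherence, gpt_relevance, and gpt_groundedness afterwards does not inflate the result.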

pamelafox commented 8 months ago

It's possible in the latest SDK. I'll send a PR shortly that includes latency!

cpatrickalves commented 8 months ago

Awesome! Thanks!!!

pamelafox commented 8 months ago

See https://github.com/Azure-Samples/ai-rag-chat-evaluator/pull/45

pamelafox commented 7 months ago

This is now in main, so I'm closing the issue.