Tonic-AI / DataTonic

🌟 DataTonic: A Data-Capable AGI-style Agent Builder of Agents that creates swarms, runs commands, and securely processes and creates datasets, databases, visualisations, and analyses.
https://www.tonic-ai.com
MIT License

Evaluation metrics through feedback function criteria #42

Closed: Zochory closed this issue 6 months ago

Zochory commented 6 months ago

Use feedback functions to evaluate and log the quality of LLM app results

The third step is to run feedback functions on the prompt and responses from the app and to log the evaluation results. Note that as a developer you only need to add a few lines of code to start using feedback functions in your apps (see Figure 4(a)). You can also easily add functions tailored to the needs of your application.
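As a sketch of what those few lines look like: the workflow described here matches the TruLens (`trulens_eval`) quickstart, so the example below assumes that library. `rag_chain` is a hypothetical, already-built LangChain app, and `feedbacks` stands for the three feedback functions defined in the following paragraphs.

```python
# Minimal sketch, assuming trulens_eval; exact imports vary by version.
from trulens_eval import Tru, TruChain

tru = Tru()  # local database that stores prompts, responses, and scores

# Wrap an existing app; every call through the wrapper is recorded and
# evaluated with the supplied feedback functions.
tru_recorder = TruChain(
    rag_chain,               # hypothetical LangChain app
    app_id="datatonic-rag",  # hypothetical app identifier
    feedbacks=feedbacks,     # the three functions defined below
)

with tru_recorder as recording:
    rag_chain.invoke("What datasets does DataTonic support?")

tru.run_dashboard()  # browse the logged prompts, responses, and scores
```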

Our goal with feedback functions is to programmatically check the app for quality metrics.

The first feedback function checks for a language match between the prompt and the response. This is a useful check, since users naturally expect the response to be in the same language as the prompt. It is implemented with a call to a HuggingFace API that programmatically checks for a language match.
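A sketch of that first function, again assuming `trulens_eval`: its `Huggingface` provider exposes a `language_match` feedback that calls a hosted language-detection model on both texts.

```python
from trulens_eval import Feedback
from trulens_eval.feedback.provider.hugs import Huggingface

hugs = Huggingface()  # calls the HuggingFace inference API for detection

# Score how closely the detected language of the prompt matches that of
# the response (1.0 when they are the same language).
f_lang_match = Feedback(hugs.language_match).on_input_output()
```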

The next feedback function checks how relevant the answer is to the question by using an OpenAI LLM that is prompted to produce a relevance score.
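A corresponding sketch, assuming `trulens_eval`'s OpenAI provider, whose `relevance` feedback prompts an OpenAI model to grade prompt/response relevance on a 0 to 1 scale.

```python
from trulens_eval import Feedback
from trulens_eval.feedback.provider.openai import OpenAI

openai_provider = OpenAI()  # requires OPENAI_API_KEY in the environment

# Prompt an OpenAI model to score how relevant the response is to the input.
f_answer_relevance = Feedback(openai_provider.relevance).on_input_output()
```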

Finally, the third feedback function checks how relevant individual chunks retrieved from the vector database are to the question, again using an OpenAI LLM in a similar manner. This is useful because the retrieval step may return chunks that are not relevant to the question; filtering these out before producing the final response improves its quality.
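A sketch of this per-chunk check, assuming the same provider. The selector path below is an assumption (the real path depends on how the wrapped app exposes its retrieval step), and the per-chunk scores are averaged into a single result.

```python
import numpy as np
from trulens_eval import Feedback, Select
from trulens_eval.feedback.provider.openai import OpenAI

openai_provider = OpenAI()

# Score each retrieved chunk against the question, then average the scores.
# NOTE: the selector path is hypothetical; it must point at wherever the
# wrapped app returns its retrieved documents.
f_context_relevance = (
    Feedback(openai_provider.qs_relevance)
    .on_input()  # the user question
    .on(Select.RecordCalls.retriever.get_relevant_documents.rets)
    .aggregate(np.mean)
)
```

Chunks that score poorly under this feedback are candidates for filtering before the final generation step, which is the improvement described above.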

linear[bot] commented 6 months ago

QRE-34 Evaluation metrics through feedback function criteria

Josephrp commented 6 months ago

this is a comment not an issue