Issue #, if available:

Description of changes:
This PR adds the Context Precision metric under the context quality evaluation algorithm. It is the first of three evaluation metrics under context quality.
Context Precision evaluates whether the chunks in the retrieved contexts that are relevant to the target output are ranked above the irrelevant ones; ideally, all relevant context chunks appear at the top ranks. The metric is computed from the `model_input`, `target_output`, and `retrieved_contexts`, and its values range between 0 and 1, where higher scores indicate better precision.
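For illustration, the score can be sketched from binary per-chunk relevance judgments (in the actual metric these judgments come from the judge model; the `context_precision` helper below is a hypothetical sketch, not the implementation in this PR):

```python
from typing import List

def context_precision(relevance: List[int]) -> float:
    """Sketch of context precision from per-chunk relevance judgments.

    `relevance[k]` is 1 if the (k+1)-th retrieved chunk, in rank order,
    is judged relevant to the target output, else 0. The score averages
    precision@k over the positions of the relevant chunks, so relevant
    chunks ranked near the top yield a higher score.
    """
    if not any(relevance):
        return 0.0
    score, hits = 0.0, 0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            score += hits / k  # precision@k at each relevant position
    return score / hits

# Relevant chunks at the top ranks score higher:
print(context_precision([1, 1, 0, 0]))  # 1.0
print(context_precision([0, 0, 1, 1]))  # ~0.42
```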
Notes:
There are no built-in datasets for context quality, since the retrieved contexts must be produced by the RAG system under evaluation.
The default judge model is currently set to a sample Bedrock model, as judge model selection is still in progress.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.