Issue #, if available:

Description of changes:
This PR adds the Context Precision metric under the context quality evaluation algorithm. It is the first of three evaluation metrics under context quality.
Context Precision evaluates whether the chunks in the retrieved contexts that are relevant to the target output are ranked above the irrelevant ones; ideally, all relevant context chunks appear at the top ranks. The metric is computed from the `model_input`, `target_output`, and `retrieved_contexts`, and its values range between 0 and 1, where higher scores indicate better precision.
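For illustration, the score can be sketched from binary per-chunk relevance judgments (in the actual metric these judgments come from the judge model; the `context_precision` helper below is a hypothetical sketch, not the implementation in this PR):

```python
from typing import List

def context_precision(relevance: List[int]) -> float:
    """Sketch of context precision from per-chunk relevance judgments.

    `relevance[k]` is 1 if the (k+1)-th retrieved chunk, in rank order,
    is judged relevant to the target output, else 0. The score averages
    precision@k over the positions of the relevant chunks, so relevant
    chunks ranked near the top yield a higher score.
    """
    if not any(relevance):
        return 0.0
    score, hits = 0.0, 0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            score += hits / k  # precision@k at each relevant position
    return score / hits

# Relevant chunks at the top ranks score higher:
print(context_precision([1, 1, 0, 0]))  # 1.0
print(context_precision([0, 0, 1, 1]))  # ~0.42
```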
Notes:
There are no built-in datasets for context quality, since the retrieved contexts must be produced by the RAG system under evaluation.
The default judge model is currently set to a sample Bedrock model, as judge model selection is still in progress.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.