We have a script which lets us run promptfoo to compare prompts. This is good. However, we need to write the actual tests which are done on the prompt output.
We want our model to produce a paraphrased claim, alongside the direct quote from the text that this was paraphrased from.
We should write a test which checks if the quotes pulled out by the model actually exist in the text.
For the test we'll need a python script, which matches a certain format. This is from the docs:
This file will be called with an output string and an AssertContext object (see above). It expects that either a bool (pass/fail), float (score), or GradingResult will be returned.
Requirements
Write a python script with a get_assert method, with the inputs/outputs specified in the documentation (see notes).
Write python tests to verify that the script correctly identifies the proportion of claims where the quote text appears in the text chunk.
Notes and additional information
Example of a script
from typing import Dict, TypedDict, Union
def get_assert(output: str, context) -> Union[bool, float, Dict[str, Any]]:
print('Prompt:', context['prompt'])
print('Vars', context['vars']['topic'])
# This return is an example GradingResult dict
return {
'pass': True,
'score': 0.6,
'reason': 'Looks good to me',
}
Overview
We have a script which lets us run promptfoo to compare prompts. This is good. However, we need to write the actual tests which are done on the prompt output.
We want our model to produce a paraphrased claim, alongside the direct quote from the text that this was paraphrased from.
We should write a test which checks if the quotes pulled out by the model actually exist in the text.
For the test we'll need a python script, which matches a certain format. This is from the docs:
Requirements
get_assert
method, with the inputs/outputs specified in the documentation (see notes).Notes and additional information
Example of a script