We currently have a command, `prompto_create_judge`, for creating a new experiment file from a template which can be run for evaluation purposes. However, it would be good to streamline this process and to generalise it in the following ways:
- Add this as an option to the standard `prompto_run_experiment` command.
- This would set off a "chain" of prompto experiment runs: (1) the usual sending of prompts to models and collecting responses, (2) using the obtained responses and a template to generate new prompts for a "judge" LLM, and sending those to obtain evaluations.
- Think about "chaining" prompto experiments more generally.
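The second step of the chain could be sketched roughly as below. This is a minimal illustration of turning completed (prompt, response) pairs into a new "judge" experiment, not prompto's actual API: the template placeholders, record fields (`api`, `model_name`), and judge model name are all assumptions for the sake of the example.

```python
import json

# Hypothetical judge template; the placeholder names are illustrative only
JUDGE_TEMPLATE = (
    "Given the following prompt and response, rate the response from 1 to 5.\n"
    "Prompt: {prompt}\nResponse: {response}"
)

def create_judge_experiment(completed, judge_model="judge-model"):
    """Turn completed (prompt, response) records from a prior run into
    a new experiment of evaluation prompts for a 'judge' LLM."""
    judge_prompts = []
    for i, record in enumerate(completed):
        judge_prompts.append({
            "id": f"judge-{i}",
            # assumption: each record carries the same API/model fields
            # that a prompto experiment file would use
            "api": record.get("api", "openai"),
            "model_name": judge_model,
            "prompt": JUDGE_TEMPLATE.format(
                prompt=record["prompt"], response=record["response"]
            ),
        })
    return judge_prompts

# Example: two completed responses from a prior experiment run
completed = [
    {"prompt": "What is 2+2?", "response": "4"},
    {"prompt": "Name a colour.", "response": "Blue"},
]
# Each dict would become one line of the new judge experiment file (JSONL)
for line in create_judge_experiment(completed):
    print(json.dumps(line))
```

Chaining would then amount to running this generated experiment through the usual `prompto_run_experiment` pipeline as a second stage.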