mikeldking opened 5 months ago
Implement a pairwise evaluator that uses an LLM as a judge to compare two generations against each other. In the case of experiments, this would presumably perform the judgement against the expected…
https://docs.llamaindex.ai/en/stable/examples/evaluation/pairwise_eval/
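A minimal sketch of the core judgement step. It assumes the LLM is exposed as a plain `llm(prompt) -> str` callable. The prompt wording and the `judge_pair` and `parse_verdict` names are hypothetical illustrations, not an existing Phoenix or LlamaIndex API. The optional `reference` field is one possible way to pass an experiment's expected output to the judge.

```python
from typing import Callable

# Hypothetical prompt template (an assumption, not taken from Phoenix or LlamaIndex).
PAIRWISE_TEMPLATE = """You are comparing two answers to the same question.
Question: {question}
Reference answer (may be empty): {reference}
Answer A: {answer_a}
Answer B: {answer_b}
Reply with exactly one word: A, B, or TIE."""


def parse_verdict(text: str) -> str:
    """Normalize the judge's raw reply to 'A', 'B', or 'TIE' ('UNKNOWN' if unparseable)."""
    token = text.strip().upper().rstrip(".")
    return token if token in {"A", "B", "TIE"} else "UNKNOWN"


def judge_pair(
    llm: Callable[[str], str],
    question: str,
    answer_a: str,
    answer_b: str,
    reference: str = "",
) -> str:
    """Ask the (hypothetical) LLM callable which of two answers is better."""
    prompt = PAIRWISE_TEMPLATE.format(
        question=question, reference=reference, answer_a=answer_a, answer_b=answer_b
    )
    return parse_verdict(llm(prompt))


# Usage with a stub LLM that always prefers the first answer.
print(judge_pair(lambda prompt: "A", "What is 2 + 2?", "4", "5"))
```

Keeping the LLM behind a simple callable keeps the sketch independent of any particular model client; a real evaluator would wrap whichever client the library already uses.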
Note that there should be a parameter for consensus, e.g. forcing the LLM to judge the same pair again with the answers flipped and checking whether its verdict stays the same.
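One way the consensus idea could work, sketched under assumptions: the judge is any callable taking `(question, first_answer, second_answer)` and returning `"A"`, `"B"`, or `"TIE"`. The `judge_with_consensus` function and its `require_consensus` flag are hypothetical names. The second pass swaps the answer order and maps the verdict back, so a judge with a positional bias gets caught.

```python
from typing import Callable

# (question, first_answer, second_answer) -> "A" | "B" | "TIE"
Judge = Callable[[str, str, str], str]


def _flip(verdict: str) -> str:
    """Map a verdict from the swapped ordering back to the original ordering."""
    return {"A": "B", "B": "A"}.get(verdict, verdict)


def judge_with_consensus(
    judge: Judge,
    question: str,
    answer_a: str,
    answer_b: str,
    require_consensus: bool = True,  # hypothetical parameter name
) -> str:
    forward = judge(question, answer_a, answer_b)
    if not require_consensus:
        return forward
    # Second pass with the answers presented in swapped order.
    backward = _flip(judge(question, answer_b, answer_a))
    # Accept the verdict only if both orderings agree.
    return forward if forward == backward else "INCONCLUSIVE"


# A judge that always picks whichever answer is shown first is position-biased.
always_first: Judge = lambda q, x, y: "A"
print(judge_with_consensus(always_first, "q", "long answer", "short"))
```

Reporting `"INCONCLUSIVE"` on disagreement is one design choice; collapsing it to a tie is an equally plausible alternative.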
Hi, @mikeldking. I'm Dosu, and I'm helping the Arize Phoenix team manage their backlog. I'm marking this issue as stale.
Issue Summary:
Next Steps:
Thank you for your understanding and contribution!