Arize-ai / phoenix

AI Observability & Evaluation
https://docs.arize.com/phoenix

[experiments] pairwise evaluator #3738

Open mikeldking opened 5 months ago

mikeldking commented 5 months ago

Implement a pairwise evaluator that uses an LLM as a judge to compare two generations against each other. In the case of experiments, this would presumably mean judging against the expected>

https://docs.llamaindex.ai/en/stable/examples/evaluation/pairwise_eval/

Note that there should be a parameter for consensus, e.g. force the LLM to judge the pair again with the answer order flipped and check whether its verdict stays the same.
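
A minimal sketch of what the consensus check could look like, to make the request concrete. This is not Phoenix's API: `Judge`, `PairwiseResult`, `pairwise_evaluate`, and the `consensus` flag are hypothetical names, and the judge is assumed to be any callable that takes a question plus two answers and returns `"A"`, `"B"`, or `"TIE"`. In practice that callable would wrap an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical judge signature (an assumption, not a Phoenix type):
# (question, answer_a, answer_b) -> "A", "B", or "TIE".
Judge = Callable[[str, str, str], str]


@dataclass
class PairwiseResult:
    # "output", "expected", "tie", or None when the two passes disagree.
    winner: Optional[str]
    # False when flipping the answer order changed the verdict.
    consistent: bool


def _label(verdict: str, first: str, second: str) -> str:
    """Map a positional verdict ("A"/"B") back to the answer it refers to."""
    return {"A": first, "B": second}.get(verdict.strip().upper(), "tie")


def pairwise_evaluate(
    judge: Judge,
    question: str,
    output: str,
    expected: str,
    consensus: bool = True,
) -> PairwiseResult:
    """Judge `output` against `expected`, optionally in both orders."""
    forward = _label(judge(question, output, expected), "output", "expected")
    if not consensus:
        return PairwiseResult(winner=forward, consistent=True)

    # Second pass with the answers swapped, to expose position bias.
    flipped = _label(judge(question, expected, output), "expected", "output")
    if forward == flipped:
        return PairwiseResult(winner=forward, consistent=True)
    return PairwiseResult(winner=None, consistent=False)
```

With this shape, a judge that always prefers whichever answer is shown first yields `consistent=False`, which an experiment could report as inconclusive rather than as a win for either side.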

dosubot[bot] commented 5 days ago

Hi, @mikeldking. I'm Dosu, and I'm helping the Arize Phoenix team manage their backlog. I'm marking this issue as stale.

Thank you for your understanding and contribution!