Really needs a "both wrong" "both ok" option

Contextualist / lone-arena

Self-hosted LLM chatbot arena, with yourself as the only judge

MIT License


Thanks for the feedback! The concern is valid. I will need to think about how ties should be handled, though.

I did not implement the "both wrong" and "both ok" options for two reasons:

A tie conflicts with the elimination-based tournament process, which lets you pick the better responses from each model first, before comparing responses across models.
Personally, a two-way decision feels less mentally taxing than a three- or four-way one.
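
To make the first reason concrete, here is a minimal Python sketch of a two-stage elimination process of the kind described above. It is an illustration only, not the actual lone-arena code: the names `eliminate`, `two_stage`, and `Judge` are hypothetical, and the judge is assumed to be a function that always returns one of the two responses it is shown.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical judge type: given two responses, return the preferred one.
# A strictly two-way decision means it must always return one of its inputs.
Judge = Callable[[str, str], str]


def eliminate(candidates: Sequence[str], judge: Judge) -> str:
    """Run single-elimination rounds until one candidate remains."""
    pool = list(candidates)
    if not pool:
        raise ValueError("need at least one candidate")
    while len(pool) > 1:
        next_round: List[str] = []
        # Pair up neighbours; each match sends exactly one winner forward.
        for i in range(0, len(pool) - 1, 2):
            next_round.append(judge(pool[i], pool[i + 1]))
        # With an odd pool, the last candidate advances without a match.
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])
        pool = next_round
    return pool[0]


def two_stage(responses_by_model: Dict[str, List[str]], judge: Judge) -> str:
    """Pick each model's best response, then compare those across models."""
    best = {m: eliminate(rs, judge) for m, rs in responses_by_model.items()}
    winner = eliminate(list(best.values()), judge)
    # Map the winning response back to the model that produced it.
    return next(m for m, r in best.items() if r == winner)


if __name__ == "__main__":
    # Stand-in judge for demonstration: prefer the longer response.
    prefer_longer: Judge = lambda a, b: max(a, b, key=len)
    demo = {"model_a": ["aa", "a", "aaa"], "model_b": ["bbbb", "b"]}
    print(two_stage(demo, prefer_longer))
```

In this sketch every match must send exactly one response to the next round. A "both wrong" or "both ok" verdict gives `eliminate` nothing unambiguous to advance, which is the conflict described in the first reason.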

I need to think about how to handle ties in elimination matches, or whether to replace elimination with something else.
