Elo for repair

I believe so (though I'm not an expert on the Elo system).

We would need:

A couple of magic numbers: the starting Elo rating, plus scaling parameters for how much each match moves a rating (the K-factor) and for computing the expected outcome
A winning criterion (e.g., given 10 non-deterministic answers per model, the model that generates more correct patches wins)
Simulated matches, run until the ratings converge
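A minimal sketch of these three pieces, under stated assumptions. START_ELO = 1500, K = 32 and SCALE = 400 are common Elo defaults assumed here, not numbers chosen in this thread; `match_outcome` and `simulate` are hypothetical names, and `results` is assumed to map each model to its per-bug count of correct patches. Instead of classic sequential Elo, this swaps in a batch variant: each round, every model's rating moves by K times its average (actual minus expected) score, computed from the ratings at the start of the round. That makes the outcome independent of match order, which plain sequential Elo is not, and lets "until convergence" mean "no rating moved more than `tol` points in a round".

```python
from itertools import combinations

START_ELO = 1500.0  # assumed starting rating for every model
K = 32.0            # assumed K-factor: max distance one round can move a rating
SCALE = 400.0       # assumed logistic scale for the expected outcome


def expected_score(r_a: float, r_b: float) -> float:
    """Elo expected score of A against B, between 0 and 1."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / SCALE))


def match_outcome(correct_a: int, correct_b: int) -> float:
    """Hypothetical winning criterion for one bug: the model whose sampled
    answers (e.g. 10 each) contain more correct patches wins; equal is a draw."""
    if correct_a == correct_b:
        return 0.5
    return 1.0 if correct_a > correct_b else 0.0


def simulate(results, ratings=None, tol=1e-3, max_rounds=10_000):
    """Batch Elo until convergence (or until max_rounds).

    results: {model: {bug_id: number of correct patches}}
    ratings: optional starting ratings (e.g. from a previous run)
    """
    ratings = dict(ratings or {})
    for model in results:
        ratings.setdefault(model, START_ELO)
    models = sorted(results)
    # Only bugs attempted by every model produce matches.
    bugs = sorted(set.intersection(*(set(results[m]) for m in models)))
    for _ in range(max_rounds):
        delta = {m: 0.0 for m in models}  # actual minus expected points
        games = {m: 0 for m in models}
        for a, b in combinations(models, 2):
            for bug in bugs:
                s_a = match_outcome(results[a][bug], results[b][bug])
                # Expected score uses ratings frozen at the start of the round.
                diff = s_a - expected_score(ratings[a], ratings[b])
                delta[a] += diff
                delta[b] -= diff
                games[a] += 1
                games[b] += 1
        # Each model moves by K times its average surprise this round.
        step = {m: K * delta[m] / max(games[m], 1) for m in models}
        for m in models:
            ratings[m] += step[m]
        if max(abs(s) for s in step.values()) < tol:
            break  # converged: no rating moved more than tol points
    return ratings


if __name__ == "__main__":
    # Toy data: correct patches out of 10 samples, per model and bug.
    results = {
        "model-a": {"b1": 8, "b2": 5, "b3": 2, "b4": 6},
        "model-b": {"b1": 6, "b2": 5, "b3": 3, "b4": 4},
        "model-c": {"b1": 3, "b2": 7, "b3": 1, "b4": 2},
    }
    for model, rating in sorted(simulate(results).items(), key=lambda kv: -kv[1]):
        print(f"{model}: {rating:.1f}")
```

With the toy data, model-a earns 5.5 of its 8 points, model-b 4.5 and model-c 2, and the converged ratings follow that order. If a model wins every match it plays, no finite rating fits its record, so its rating keeps drifting upward for as long as the loop runs.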

When a new LLM is added, we run the simulation again, potentially on top of the existing results.
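One possible reading of "on top of the existing results" (an assumption, not something settled here) is to keep the already-computed ratings frozen and fit only the newcomer's rating: the rating at which its expected score against the existing models equals the score it actually earned. This sketch finds that rating by bisection, which works because the expected score rises monotonically with the newcomer's rating. `rate_newcomer`, its arguments and the `lo`/`hi` search bounds are hypothetical, and the expected-score function is the standard Elo logistic curve with an assumed scale of 400.

```python
def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def rate_newcomer(fixed_ratings, record, lo=0.0, hi=3000.0, iters=100):
    """Hypothetical incremental step: fit one rating for a new model while
    the existing models' ratings stay frozen.

    fixed_ratings: {model: rating} from the previous simulation
    record: {model: (points, games)}, the newcomer's points (win = 1,
            draw = 0.5) over its games against each existing model
    """
    points = sum(p for p, _ in record.values())

    def gap(r: float) -> float:
        # Expected minus actual points; increases monotonically with r.
        return sum(g * expected_score(r, fixed_ratings[m])
                   for m, (_, g) in record.items()) - points

    for _ in range(iters):  # bisection on [lo, hi]
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid  # expected < actual: the rating must be higher
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    existing = {"model-a": 1600.0, "model-b": 1450.0}
    # Newcomer scored 3 of 4 against model-a and 2 of 4 against model-b.
    print(round(rate_newcomer(existing, {"model-a": (3, 4), "model-b": (2, 4)}), 1))
```

Freezing the old ratings keeps an existing leaderboard stable, at the cost of ignoring what the new matches say about the existing models; rerunning the full simulation with the old ratings as starting values is the alternative. If the newcomer wins (or loses) every match, no finite rating fits, and the result sits at the `hi` (or `lo`) bound.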

ASSERT-KTH / repairbench