SubstrateChess / pallet-chess

The Unlicense
11 stars 6 forks source link

Elo Ranking #2

Closed bernardoaraujor closed 1 year ago

bernardoaraujor commented 1 year ago

Implement an Elo Ranking system based on match outcomes.

References:

  1. https://www.chess.com/terms/elo-rating-chess
  2. https://en.wikipedia.org/wiki/Elo_rating_system
  3. https://metinmediamath.wordpress.com/2013/11/27/how-to-calculate-the-elo-rating-including-example/

From ref. 3:

Suppose two players or teams with the current ratings r(1) and r(2) compete in a match. What will be their updated rating r'(1) and r'(2) after said match? Let’s do this step by step, first in general terms and then in a numerical example.

The first step is to compute the transformed rating for each player or team:

  • R(1) = 10**(r(1)/400)
  • R(2) = 10**(r(2)/400)

This is just to simplify the further computations. In the second step we calculate the expected score for each player:

  • E(1) = R(1) / (R(1) + R(2))
  • E(2) = R(2) / (R(1) + R(2))

Now we wait for the match to finish and set the actual score in the third step:

  • S(1) = 1 if player 1 wins / 0.5 if draw / 0 if player 2 wins
  • S(2) = 0 if player 1 wins / 0.5 if draw / 1 if player 2 wins

Now we can put it all together and in a fourth step find out the updated Elo-rating for each player:

  • r'(1) = r(1) + K * (S(1) – E(1))
  • r'(2) = r(2) + K * (S(2) – E(2))

What about the K that suddenly popped up? This is called the K-factor and basically a measure of how strong a match will impact the players’ ratings. If you set K too low the ratings will hardly be impacted by the matches and very stable ratings (too stable) will occur. On the other hand, if you set it too high, the ratings will fluctuate wildly according to the current performance. Different organizations use different K-factors, there’s no universally accepted value. In chess the ICC uses a value of K = 32. Other approaches can be found here.

Now let’s do an example. We’ll adopt the value K = 32. Two chess players rated r(1) = 2400 and r(2) = 2000 (so player 2 is the underdog) compete in a single match. What will be the resulting rating if player 1 wins as expected? Let’s see. Here are the transformed ratings:

  • R(1) = 10**(2400/400) = 1.000.000
  • R(2) = 10**(2000/400) = 100.000

Onto the expected score for each player:

  • E(1) = 1.000.000 / (1.000.000 + 100.000) = 0.91
  • E(2) = 100.000 / (1.000.000 + 100.000) = 0.09

This is the actual score if player 1 wins:

  • S(1) = 1
  • S(2) = 0

Now we find out the updated Elo-rating:

  • r'(1) = 2400 + 32 * (1 – 0.91) = 2403
  • r'(2) = 2000 + 32 * (0 – 0.09) = 1997

Wow, that’s boring, the rating hardly changed. But this makes sense. By player 1 winning, both players performed according to their ratings. So no need for any significant changes.

—————————————–

What if player 2 won instead? Well, we don’t need to recalculate the transformed ratings and expected scores, these remain the same. However, this is now the actual score for the match:

  • S(1) = 0
  • S(2) = 1

Now onto the updated Elo-rating:

  • r'(1) = 2400 + 32 * (0 – 0.91) = 2371
  • r'(2) = 2000 + 32 * (1 – 0.09) = 2029

This time the rating changed much more strongly.

Moliholy commented 1 year ago

I created a PR in my own fork to address this issue: https://github.com/SubstrateChess/pallet-chess/pull/28