beyond-all-reason / teiserver

Middleware server for online gaming
https://www.beyondallreason.info/
MIT License

[Bug]: FFA rating is not zero sum #434

Open jauggy opened 2 weeks ago

jauggy commented 2 weeks ago

Describe the Bug

Three players who only play FFA against each other can gain rating seemingly out of nowhere. These players should each start with 25 skill, or 75 skill in total. If they play each other for many games, we would expect their total skill to stay at roughly 75.

In practice it does not.

Reproduce the bug

https://bar-rts.com/replays/b517af66db3fb90f3858f90d34bb06cd

Ratings in that replay: TheAnnihilator 36, nanobot 10, iqwert1717 41.

These three players only play each other and nobody else, and yet their ratings total 87 (36 + 10 + 41), which is well above the expected 75. TheAnnihilator's FFA games can be found via: https://www.beyondallreason.info/replays?page=1&limit=24&preset=ffa&hasBots=false&endedNormally=true&players=TheAnnihilator
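The zero-sum expectation can be checked with simple arithmetic. The sketch below is only an illustration: the module name RatingDrift is hypothetical, the starting value of 25 per player is the assumption stated in this issue, and the later values are the numbers shown in the replay above (which may be displayed ratings rather than raw skill).

```elixir
defmodule RatingDrift do
  # Sum of all values in a %{player => rating} map.
  def total(ratings), do: ratings |> Map.values() |> Enum.sum()

  # Positive result: rating appeared from nowhere.
  # Negative result: rating vanished. Zero: the group is zero-sum.
  def drift(earlier, later), do: total(later) - total(earlier)
end

# Assumed starting point from this issue: 25 each.
earlier = %{"TheAnnihilator" => 25, "nanobot" => 25, "iqwert1717" => 25}
# Values shown in the linked replay.
later = %{"TheAnnihilator" => 36, "nanobot" => 10, "iqwert1717" => 41}

IO.inspect(RatingDrift.drift(earlier, later))
# prints 12
```

A closed group under a zero-sum system would give a drift of 0 here.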

He only plays his friends.

Here is a game from July: https://www.beyondallreason.info/replays?gameId=8ec88866bfc5ab84c52f8643294bb53a where his rating is 0 and the other two players' ratings are 25 and 0.

Since then he has only played with his friends and now they all have higher ratings somehow. The ratings are coming from thin air.

Screenshots

No response

Additional context

See match_rating_lib


    # Build ratings into lists of tuples for the OpenSkill module to handle
    winner_ratings =
      winners
      |> Enum.map(fn membership ->
        rating = rating_lookup[membership.user_id] || BalanceLib.default_rating(rating_type_id)
        {membership.user_id, {rating.skill, rating.uncertainty}}
      end)

    # Now we want to get the best loser to use for the winner's win
    loser_ratings =
      losers
      |> Enum.group_by(
        fn %{team_id: team_id} -> team_id end,
        fn %{user_id: user_id} ->
          rating = rating_lookup[user_id] || BalanceLib.default_rating(rating_type_id)
          {user_id, {rating.skill, rating.uncertainty}}
        end
      )
      |> Map.values()

    # Run the winner calculation
    [win_result | _lose_result] = rate_with_ids([winner_ratings | loser_ratings])
    win_result = Map.new(win_result)

It seems here that the winner's rating is calculated as if the game were a 1v2, i.e. the winner's ratings versus all the losers' ratings. Teifion's comment in the code says "Now we want to get the best loser to use for the winner's win", but it is not the "best" loser - it is all the losers.

    # If you lose you just count as losing against the winner
    loss_ratings =
      loser_ratings
      |> Enum.map(fn team_ratings ->
        lose_results = rate_with_ids([winner_ratings, team_ratings], as_map: true)

        team_ratings
        |> Enum.map(fn {user_id, _old_rating} ->
          rating_update = lose_results[user_id]

          user_rating = rating_lookup[user_id] || BalanceLib.default_rating(rating_type_id)
          ratiod_rating_update = apply_change_ratio(user_rating, rating_update, opponent_ratio)
          do_update_rating(user_id, match, user_rating, ratiod_rating_update)
        end)
      end)
      |> List.flatten()

However, each loser's rating is calculated as a 1v1, i.e. as if that loser lost to the winner alone.

So if three friends A, B and C play and A wins, then:

Pretend A vs B+C and calculate A's new rating assuming a win.

Pretend B vs A and calculate B's new rating assuming a loss.

Pretend C vs A and calculate C's new rating assuming a loss.
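To make the three pretend matchups concrete, here is a minimal sketch of the list shapes that the quoted snippets build, using bare player ids instead of rating tuples. The module and function names are hypothetical; only the two list constructions, `[winner | losers]` and `[winner, team]`, mirror the quoted code. How rate_with_ids scores these lists is not shown in this issue, so nothing here computes a rating.

```elixir
defmodule FfaMatchupSketch do
  # winner_team: list of ids, loser_teams: list of lists of ids.
  # Returns {winner_calc, loser_calcs}: the team lists that would be
  # handed to the rating function for the winner and for each loser team.
  def pretend_matchups(winner_team, loser_teams) do
    # Same shape as [winner_ratings | loser_ratings] in the quoted code.
    winner_calc = [winner_team | loser_teams]

    # Same shape as [winner_ratings, team_ratings], once per losing team.
    loser_calcs = Enum.map(loser_teams, fn team -> [winner_team, team] end)

    {winner_calc, loser_calcs}
  end
end

{winner_calc, loser_calcs} = FfaMatchupSketch.pretend_matchups([:a], [[:b], [:c]])
IO.inspect(winner_calc)
# prints [[:a], [:b], [:c]]
IO.inspect(loser_calcs)
# prints [[[:a], [:b]], [[:a], [:c]]]
```

Note that the winner's list keeps B and C as separate entries rather than merging them into one team, while each loser's list contains only that loser and the winner. The winner and the losers are therefore rated from different matchups, which is why the individual changes need not cancel out.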

@geekingfrog is my interpretation of the code correct?

jauggy commented 2 weeks ago

iqwert1717 is currently ranked 28 on the leaderboard (https://www.beyondallreason.info/leaderboards) even though he only plays with his two friends.

jauggy commented 2 weeks ago

We first need to confirm whether my analysis is correct. Then we can think about solutions. One avenue to pursue is to ask Vivek how he would handle FFA.

L-e-x-o-n commented 2 weeks ago

FFA ratings are not used for balancing; they just give a rough estimate of how many games someone has played (and won). Because they don't affect balance, they can be calculated in any way: something as simple as counting wins, or something like the current approach where opponent ratings are used as well. FFA itself is very unbalanced, from the map's starting positions and resources to your own starting position and neighbours.
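As a rough illustration of the "just counting the wins" option, a tally could look like the sketch below. The module name WinCount and the input shape (one winner id per finished FFA game) are assumptions for the example, not something teiserver provides.

```elixir
defmodule WinCount do
  # winner_ids: one entry per finished FFA game, holding the winner's id.
  # Returns a %{player => number_of_wins} map.
  def tally(winner_ids), do: Enum.frequencies(winner_ids)
end

IO.inspect(WinCount.tally(["A", "B", "A"]))
# prints %{"A" => 2, "B" => 1}
```

A count like this cannot inflate within a closed group beyond the number of games played, but it also ignores opponent strength entirely.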