Update split_one_chevs balancemode v2

jauggy commented 3 months ago

Context

I played a game with split_one_chevs on and noticed a few issues. The 1Chevs were not split. This was because Macwhite was recognised by Teiserver as a 2Chev, whereas Chobby showed them as a 1Chev. This is likely because Chobby gets a player's rank on login and then never updates it, so it gets out of sync with Teiserver.

How to resolve?

To resolve this, my algorithm will now group both 1Chevs and 2Chevs into the "Noob" bucket. There is also a slight change to how we draft players in the "Noob" bucket, detailed below.
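A minimal sketch of the bucketing rule described above, written in Python purely for illustration (Teiserver itself is written in Elixir). The function name `bucket_for`, the `chevrons` argument, and the bucket labels are hypothetical and are not taken from the PR.

```python
NOOB_MAX_CHEVRONS = 2  # v2 rule: 1Chev and 2Chev both count as noobs


def bucket_for(chevrons: int) -> str:
    """Return the draft bucket for a player with the given chevron count."""
    if chevrons < 1:
        raise ValueError("chevron count must be at least 1")
    return "noob" if chevrons <= NOOB_MAX_CHEVRONS else "experienced"


# A player the server sees as 2Chev but the lobby shows as 1Chev
# lands in the same bucket either way, so the mismatch no longer matters.
server_view, lobby_view = 2, 1
same_bucket = bucket_for(server_view) == bucket_for(lobby_view)
```

Widening the bucket sidesteps the sync problem rather than fixing it: a one-chevron disagreement between Chobby and Teiserver no longer changes how the player is treated.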

Other Findings

When putting the replay

https://www.beyondallreason.info/replays?gameId=39096966518c40a66b2130b057864aa5

into https://openskill-test.web.app/, I noticed that the library expects Team 1 to win, even though most humans would bet on Team 2.

Since the library expects Team 1 to win, if my team were to win I would gain a lot of OS (+1). If my team were to lose, I would lose only 0.37.

From this we can conclude that if you were to choose from the three "Noobs" (_CindersFire, Macwhite, VictoriousDead), you would probably want to avoid the overrated players, which are likely the ones with the highest uncertainty. If an overrated player is on the other team, you stand to win more and lose less, and since they're overrated, you're also more likely to win. Therefore, for those in the "Noobs" category, we probably want to pick those with low uncertainty, as they are less likely to be overrated.

split_one_chevs Algorithm v2

Based on these findings, the algorithm will now draft players based on these criteria:

Always pick experienced (3Chev+) players over noobs (1-2Chevs).
Prefer higher OS for experienced players.
Prefer lower uncertainty for noobs.
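The three criteria above can be expressed as a single sort key. This is a sketch under stated assumptions, not the PR's Elixir implementation: the `Player` fields, the `draft_key` and `draft_order` names, and the sample players are all hypothetical, and how picks are then assigned to teams is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    name: str
    chevrons: int       # chevron level as seen by the server
    rating: float       # OpenSkill rating (OS)
    uncertainty: float  # OpenSkill uncertainty


def draft_key(p: Player):
    """Sort key: experienced players first, then noobs."""
    if p.chevrons <= 2:
        # Noobs (1-2Chevs): lower uncertainty is picked earlier.
        return (1, p.uncertainty)
    # Experienced (3Chev+): higher rating is picked earlier.
    return (0, -p.rating)


def draft_order(players):
    """Return players in the order they would be drafted."""
    return sorted(players, key=draft_key)


# Hypothetical lobby, made up for illustration.
players = [
    Player("noob_high_unc", 1, 20.0, 8.0),
    Player("vet_low", 4, 15.0, 3.0),
    Player("noob_low_unc", 2, 10.0, 4.0),
    Player("vet_high", 5, 30.0, 2.0),
]
order = [p.name for p in draft_order(players)]
# order == ["vet_high", "vet_low", "noob_low_unc", "noob_high_unc"]
```

Note that in this sketch a noob's rating is ignored entirely: `noob_high_unc` has a higher rating than `vet_low` but is still drafted last, because it is a noob with the highest uncertainty.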

This draft mimics how a human might draft players using the information visible in a lobby. It's not super mathematical. Players generally look at chevron level to judge how overrated someone might be. Someone did complain in chat about the lobby balance in the game mentioned above. They were obviously eyeballing the chevron levels and assuming those two players were overrated.

Further enhancements

Previously, when the balancer was called, it would check the result and, if the deviation in ratings between teams was too high, rerun the balancer with all parties split into solo players. Now the balance result has a new field: has_parties?. If this is false, we do not need to rerun the balancer.
Permissions have been reduced for fake users that are not admins. This makes it possible to test that the Balance tab does not appear for normal users.
Anyone with a Staff role (e.g. Tester, Contributor) can now see the Balance tab. The Balance tab now has a dropdown allowing you to switch between different balance modes.
fuzz_multiplier (randomness added to match rating) is now only enabled for Teifion's algos. This makes it easier to debug issues.
If there are no noobs, an alternate balancer (loser_picks) will be called. This is the default algo and it supports parties.
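Two of the points above, the has_parties? check and the loser_picks fallback, can be sketched as small decision functions. This is an illustrative Python sketch only: `BalanceResult`, `choose_algorithm`, `needs_solo_rerun`, and the `MAX_DEVIATION` threshold are hypothetical names and values, not Teiserver's actual API.

```python
from dataclasses import dataclass

MAX_DEVIATION = 10.0  # hypothetical threshold for "deviation too high"


@dataclass
class BalanceResult:
    algorithm: str
    deviation: float   # rating deviation between teams
    has_parties: bool  # mirrors the new has_parties? field


def choose_algorithm(requested: str, chevrons: list) -> str:
    """Fall back to loser_picks when split_one_chevs has no noobs to split."""
    has_noobs = any(c <= 2 for c in chevrons)
    if requested == "split_one_chevs" and not has_noobs:
        return "loser_picks"
    return requested


def needs_solo_rerun(result: BalanceResult,
                     max_deviation: float = MAX_DEVIATION) -> bool:
    """Rerun with parties broken up only if there were parties to break up."""
    return result.has_parties and result.deviation > max_deviation
```

For example, `needs_solo_rerun(BalanceResult("loser_picks", 25.0, False))` is false even though the deviation is high, since splitting parties cannot change a result that contained none.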

Known Bugs

Teiserver doesn't know the rank shown to the user in Chobby. Chobby gets the rank on login and then never updates. Teiserver, therefore, may classify a user as 2Chev but they might be shown as 1Chev in Chobby.

Unit Tests

Run the following to execute the unit tests that relate to balance:

  mix test --only balance_test

Local Dev Tests

See comment here for test steps.

Theoretical Testing on past replays

Go to https://balance-algo-web.web.app/ and enter a past replay. Change the algorithm to Split One Chevs v2.

jauggy commented 3 months ago

I have updated the PR, and you can now see a dropdown to change the balancer. I also removed the admin pages (they now redirect) to be consistent with the previous PR made by Perfi to remove them.

jauggy commented 3 months ago

Sample video of testing the balancer tab on the integration server: https://www.youtube.com/watch?v=KUqzvU6GBug

Note that the balancer tab only appears for rated games. Also, the chevron level of players is always based on current data (since we don't store a history of this).

jauggy commented 3 months ago

Local Dev Tests

You must rerun the fake data task. If you ran it previously, the fake users will have permissions that are too high. I modified the fakedata task so that fake users have normal permissions. The task now also adds fake playtime data.

mix teiserver.fakedata

Launch the website

mix phx.server

Log in to the website using root@localhost. Now go to Admin > Matches > Select a Match > Balance Tab. You will see the logs for the loser_picks algo.

There is a dropdown with the label "Balance Algorithm" near the top. Change this to split_one_chevs. You will now see the logs for split_one_chevs.

Testing permissions of normal users

Log in as root admin and find a user. Copy their email, which should be a GUID. Log in again as that user with that email; the password is password. Check that they cannot see the balance tab.
Log in again as admin and give that user a contributor/tester role. Log in again as the user; they should now have access.

jauggy commented 3 months ago

Teiserver doesn't know the rank shown to the user in Chobby. Chobby gets the rank on login and then never updates. Teiserver, therefore, may classify a user as 2Chev but they might be shown as 1Chev in Chobby.

I thought rank was only calculated on login, when/where else does Teiserver calculate rank?

To be honest this bug really baffled me. I still have no idea why they are out of sync. From my searching it only gets calculated on login.

L-e-x-o-n commented 3 months ago

Teiserver doesn't know the rank shown to the user in Chobby. Chobby gets the rank on login and then never updates. Teiserver, therefore, may classify a user as 2Chev but they might be shown as 1Chev in Chobby.

I thought rank was only calculated on login, when/where else does Teiserver calculate rank?

To be honest this bug really baffled me. I still have no idea why they are out of sync. From my searching it only gets calculated on login.

That was my understanding as well. I checked again, not sure what we are missing...

jauggy commented 3 months ago

There is now an issue for it, so the bug is at least recorded: https://github.com/beyond-all-reason/teiserver/issues/332

jauggy commented 3 months ago

@L-e-x-o-n I have updated the PR now with the following changes:

fakedata task now also calls task to add playtime stats
balancer renamed to be more consistent with other usage
