Better rating logs - GitHub Issues

jauggy commented 1 month ago

Context

The purpose of this PR is to add more information into rating logs (teiserver_game_rating_logs) so that we can run balance algorithms using past information. Currently we store the following information in rating logs:

skill
uncertainty

My split_one_chevs algo also looks at rank, and Marek's algo looks at playtime: https://discord.com/channels/549281623154229250/1262186207673192530/1263880002563346463

When we run a balance algorithm on a past match, it should ideally pull information from the rating logs table to get data at the time of the match.

Improvements

This PR will add the following information to rating logs whenever a match is rated

rank
play_hours
spectator_hours

Testing locally

Run the following task, replacing <username> with the username of a user.

mix teiserver.rerate_my_matches <username>

This will re-rate all the matches of a single user.

Open up some sort of database viewer and check that the rating logs have new information (replace 'spadsbot' with the name of the user you want to look at)

SELECT tgrl.value
FROM teiserver_battle_match_memberships tbmm
INNER JOIN teiserver_battle_matches tbm ON tbm.id = tbmm.match_id
    AND tbmm.user_id = (
        SELECT au.id
        FROM account_users au
        WHERE au.name = 'spadsbot'
        )
INNER JOIN teiserver_game_rating_logs tgrl ON tgrl.match_id = tbmm.match_id;

Check that the value column has the new fields (rank, play_hours, spectator_hours), e.g.

{"rank": 2, "skill": 24.46916062276121, "play_hours": 76, "uncertainty": 7.18185060647911, "rating_value": 17.2873100162821, "skill_change": 0.6665220793110613, "spectator_hours": 0, "uncertainty_change": -0.04568334603121116, "rating_value_change": 0.7122054253422725}
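If you would rather not eyeball each row, the same check can be scripted. This is only a rough sketch in Python (not part of the PR, which is Elixir): the three field names come from the list above, while the helper name missing_new_fields and the shortened sample values are made up for illustration.

```python
import json

# Field names taken from the PR description; the helper itself is hypothetical.
NEW_FIELDS = ("rank", "play_hours", "spectator_hours")


def missing_new_fields(value_json: str) -> list[str]:
    """Return the new rating-log fields that are absent from a `value` JSON string."""
    value = json.loads(value_json)
    return [field for field in NEW_FIELDS if field not in value]


# Shortened version of the example row above (a log written after this PR).
new_style = '{"rank": 2, "skill": 24.46916062276121, "play_hours": 76, "uncertainty": 7.18185060647911, "spectator_hours": 0}'
# Illustrative log that only has the fields stored before this PR.
old_style = '{"skill": 24.4, "uncertainty": 7.1}'

print(missing_new_fields(new_style))
print(missing_new_fields(old_style))
```

An empty list means the row already carries all three new fields; anything else names the fields that are still missing.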

Log in to the website and select a match. View the balance tab. Change the dropdown to split_one_chevs and ensure it doesn't crash. It will pull past data from the teiserver_game_rating_logs table if possible.

L-e-x-o-n commented 1 month ago

Hmm, I see some issues with this. I think rating logs should be used for rating only, not balance. I also think playtime shouldn't be considered when balancing (but I do understand the issues and the reasons for the split_one_chevs balance mode). Why store both rank and play + spec time? Recalculating just the team game matches took a day and still wasn't finished; doing it for all matches would take even longer. For one user it would take less, but I am not sure how useful that would be. Wouldn't you need to recalculate all players in a match?

jauggy commented 1 month ago

We wouldn't rerate all matches. The mix task is just there for testing and making sure the rating function hasn't broken. We would not run the mix task in production.

The use case is this: say Marek creates a new balance algorithm and it is then deployed to production. Someone uses it and gets a weird result. Or someone uses split_one_chevs and gets a weird result. To understand what went wrong I need the data at the time. So I would ask them to give me the URL of the match, e.g. https://server4.beyondallreason.info/battle/2297200

Then I would go to the balance tab and select the appropriate balance algorithm so I can inspect what happened. This would pull information from past data so it runs exactly how it would have run in the past. My balance algorithm looks at rank, so I need to know what your rank was in the past. We currently don't store that information.

At minimum I want to store rank for my own balance algorithm and examine if I get weird results. I could remove playtime, spectime from the logs since I don't use it.

As an example, say I want to check how my balance algo worked in the past, and I see the result below. Note that if a player was one chev in the past but is now three chev, I don't get the correct information, since their rank was not stored and I just pull their current rank. Ideally the "Based on data at the time" section pulls past data.
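To make that fallback concrete, here is a rough sketch in Python (the real code is Elixir, and this is not it). The function name rank_for_balance and the "source" label are hypothetical; the only thing taken from this PR is that a rating log's value may or may not contain a rank field.

```python
from typing import Optional


def rank_for_balance(rating_log_value: Optional[dict], current_rank: int) -> tuple[int, str]:
    """Pick the rank to feed a balance algorithm when replaying a past match.

    Prefers the rank stored in the match's rating log; falls back to the
    player's current rank when the log is missing or predates the new field.
    Returns the rank plus a label saying where it came from.
    """
    if rating_log_value is not None and "rank" in rating_log_value:
        return rating_log_value["rank"], "rating_log"
    return current_rank, "current"


# Player was one chev (rank 0 here, purely illustrative) at match time, three chev now.
print(rank_for_balance({"rank": 0, "skill": 12.5}, current_rank=2))
# Old log without a rank field: only the current rank is available.
print(rank_for_balance({"skill": 12.5}, current_rank=2))
```

Checking for the key rather than truthiness matters here, since a stored rank of 0 is a valid value and should not trigger the fallback.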

L-e-x-o-n commented 1 month ago

Ah I see. So this would only be useful for future matches?

jauggy commented 1 month ago

Yes just future matches.

beyond-all-reason / teiserver

Better rating logs #367