atomflunder / skillratings

Rust library for popular skill rating algorithms like Elo, Glicko-2, TrueSkill and many more.
https://docs.rs/skillratings/
Apache License 2.0
41 stars 4 forks source link

Implement Weng-Lin for more than two teams #6

Closed asyncth closed 1 year ago

asyncth commented 1 year ago

This implements Weng-Lin rating system for games with more than two teams, as well as free-for-all games by introducing a new function, weng_lin_multi_team and a new struct MultiTeamOutcome, which is used for specifying what place did a team take (a rank), with lower rank being better for the team.

Closes #5.

codecov-commenter commented 1 year ago

Codecov Report

Base: 100.00% // Head: 99.86% // Decreases project coverage by -0.13% :warning:

Coverage data is based on head (6c4cffd) compared to base (ca3b219). Patch coverage: 97.05% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #6 +/- ## =========================================== - Coverage 100.00% 99.86% -0.14% =========================================== Files 13 13 Lines 7147 7381 +234 =========================================== + Hits 7147 7371 +224 - Misses 0 10 +10 ``` | [Impacted Files](https://codecov.io/gh/atomflunder/skillratings/pull/6?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) | Coverage Δ | | |---|---|---| | [src/lib.rs](https://codecov.io/gh/atomflunder/skillratings/pull/6/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c3JjL2xpYi5ycw==) | `93.66% <30.76%> (-6.34%)` | :arrow_down: | | [src/weng\_lin.rs](https://codecov.io/gh/atomflunder/skillratings/pull/6/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c3JjL3dlbmdfbGluLnJz) | `99.89% <99.69%> (-0.11%)` | :arrow_down: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

asyncth commented 1 year ago

First commit factors out calculations of some values like small delta and eta into separate functions, previously weng_lin_teams would calculate them in a player-iterating loop, but this wasn't necessary because only team-specific values are used in those calculations (e.g. team rating, team uncertainty, not team player ratings). This is supposed to be a preparation for adding multi-team Weng-Lin function since it will have to calculate those values for every team pair. Do you like the direction this PR is going or do you consider some changes to be unnecessary?

atomflunder commented 1 year ago

Do you like the direction this PR is going or do you consider some changes to be unnecessary?

This looks very good to me. Thank you again for your contributions, this is much appreciated!

Regarding the performance you mentioned, I ran the benchmarks with your changes and the WengLin 4v4 benchmark saw a slight improvement: Screenshot.

asyncth commented 1 year ago

I wonder if the multi-team function should have a cache for c values? As far as I'm aware c_iq and c_qi are the same values, so the loop in the new function would be essentially calculating c value for each team twice. I could use a cache to remember previous c values or simply for any given team i and team q calculate everything for both of those teams immediately and then when loop get to team q and team i skip that iteration. Maybe the cost of calculating c twice isn't that large to do either of those things?

atomflunder commented 1 year ago

Had a look at the paper again, and I agree in thinking c_iq is the same as c_qi. I'm not sure if the performance gain (and the likely loss in readability I'm assuming?) is worth caching the values, but I am fine with either implementation. I would trust you what you think is the better option.

asyncth commented 1 year ago

I wonder if there is any point in converting rank array with ties to an array with d difference between tied positions and next rank like in the paper, e.g [1, 2, 3, 3, 3, 4, 5] to [1, 2, 3, 3, 3, 6, 7] because it seems like the only thing we care about is whether a team gained a larger or a smaller rank than other teams, not the actual number.

atomflunder commented 1 year ago

I wonder if there is any point in converting rank array with ties to an array with d difference between tied positions and next rank like in the paper, e.g [1, 2, 3, 3, 3, 4, 5] to [1, 2, 3, 3, 3, 6, 7] because it seems like the only thing we care about is whether a team gained a larger or a smaller rank than other teams, not the actual number.

Yes, I think the only thing that really matters is the relative placement of the teams compared to each other.

Edit: Also, please ignore the coverage check for now, can always fix that later.

asyncth commented 1 year ago

Had this idea of putting team ratings, team uncertainties and c values for each team in a Vec of tuples, but I didn't do it in the end.

atomflunder commented 1 year ago

Other than the small change above, this looks very good to me! Thank you again, I would be ready to merge it. When bumping the Version I will probably add an FFA example to the Readme, and maybe work on the coverage in the future. But everything important looks to be tested and ready to go.

asyncth commented 1 year ago

Well there isn't a test that tests this function with more than 2 teams, except the one that is used as an example in documentation.

atomflunder commented 1 year ago

Well there isn't a test that tests this function with more than 2 teams, except the one that is used as an example in documentation.

I guess so, but the good thing is that doc-tests are always included in cargo test. We can obviously add another one if you want to. But since you double checked the new test with the old one, I am inclined to believe it is working as intended.

asyncth commented 1 year ago

Well I guess you can merge it now

asyncth commented 1 year ago

I'm not too sure how to fix codecov check though.

atomflunder commented 1 year ago

I'm not too sure how to fix codecov check though.

Don't worry about that, I will look into it with the other stuff like the version bump later today, I have to leave in a bit.
It is most likely complaining about the MultiTeamOutcome struct methods not having any tests.
Thanks again!