atomflunder / skillratings

Rust library for popular skill rating algorithms like Elo, Glicko-2, TrueSkill and many more.
https://docs.rs/skillratings/
Apache License 2.0
41 stars 4 forks source link

Perform step 1 calculations in Glicko functions #4

Closed asyncth closed 1 year ago

asyncth commented 1 year ago

This fixes an issue which caused Glicko functions such as glicko and glicko_rating_period not to perform calculations from the step 1 of the original Glicko paper. Those calculations are used to increase the rating deviation at the beginning of every rating period.

Fixes #3.

The results between test_glicko_rating_period and test_glicko are not exactly as similar as the comment on line 523 says they should be, someone probably should check carefully to make sure I didn't mess anything up.

Formatting changes were done by rustfmt.

codecov-commenter commented 1 year ago

Codecov Report

Base: 100.00% // Head: 100.00% // No change to project coverage :thumbsup:

Coverage data is based on head (a651509) compared to base (eeaf0fd). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #4 +/- ## ========================================= Coverage 100.00% 100.00% ========================================= Files 13 13 Lines 7052 7081 +29 ========================================= + Hits 7052 7081 +29 ``` | [Impacted Files](https://codecov.io/gh/atomflunder/skillratings/pull/4?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) | Coverage Δ | | |---|---|---| | [src/glicko.rs](https://codecov.io/gh/atomflunder/skillratings/pull/4/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-c3JjL2dsaWNrby5ycw==) | `100.00% <100.00%> (ø)` | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

atomflunder commented 1 year ago

Thanks again for the PR!

If I understood the glicko paper correctly, the pre_deviation as you called it needs to be inserted into the g_value functions in both the glicko and glicko_rating_period functions, instead of the old deviation.

asyncth commented 1 year ago

I believe RD in g function refers to that RD argument that you see in g's definition, you can see opponent's RD RD_j being passed into g function in d^2's and r''s definitions as well.

asyncth commented 1 year ago

Oh wait, nevermind, I was thinking about glicko_rating_period, which doesn't even take player's deviation, sorry about that.

You're right about it being necessary in glicko function, but I don't think it's necessary in glicko_rating_period since both times g_value is called in glicko_rating_period, the opponents' RD is passed in and not player's. I imagine this function would be called for every player in the game when a new rating period starts, so everyone's RD will be adjusted by calculations from step 1 eventually.

asyncth commented 1 year ago

The paper doesn't seem to make it clear whether opponents' RD values are supposed to be pre-step1 or post-step1. If it's the latter, this would be more complicated that I thought. If that's the case, glicko_rating_period probably should calculate pre_deviation for opponents just to use it in those g_value calculations and then throw away, which kind of seems like a hack but should work.

atomflunder commented 1 year ago

If I understand this line in the paper correctly:

Let the pre-period ratings of the m opponents (again from Step 1) be r1, r2, . . . , rm and the ratings deviations be RD1, RD2, . . . , RDm.

This should mean the RDj in the calculations are the pre-period RD of the opponent, and so we do not need to calculate the pre_deviation of the opponent, but only the player. But I am not 100% sure. What do you think?

Edit: To be clear what I mean, I think the "(from Step 1)" is referring the rating only, and not the deviation. But I am honestly kind of confused now haha.
I also looked online to see how other people did it, but resources on the original Glicko are very sparse, most don't bother and just use Glicko-2.

asyncth commented 1 year ago

This is actually one of my first confusions that I had after I first read this paper. After reading the text before the step 1 I was left with an impression that rating and deviation are calculated twice per rating period, once in the beginning and once in the end. After reading both of the steps, I wasn't sure if those calculations are supposed to be at the beginning or at the end of a rating period, so I decided to re-read both of the steps and pay more attentions to phrases like "at the end of previous period" and "pre-period". However, this didn't resolve my confusion and at that time for some reason I thought the information introduced by those phrases was conflicting with each other. After looking at some Glicko-2 implementations I concluded that rating and deviation are probably calculated only once in the rating period, when one rating period ends and another one begins.

I'm still not really sure what pre-period actually means, so I can't really tell if that means we need to calculate pre_deviation of opponents, I'm just going to do what you said. That means passing pre_deviation into g_value in glicko, but not in glicko_rating_period, right?

asyncth commented 1 year ago

Since Glicko-2 doesn't use modified opponent RDs, I'm inclined to believe what you said is correct.

atomflunder commented 1 year ago

That means passing pre_deviation into g_value in glicko, but not in glicko_rating_period, right?

I think we should keep both functions "compatible" so that if you were to pass in a single game into glicko_rating_period it would produce the same result as glicko. And I looked into a Java implementation of Glicko/Glicko-2 and they do it like the way it is right now in your PR, having the "old deviation" in the g-calculations instead of the "pre deviation". So to be honest I think leaving it like this is fine.

Edit: I also looked into the Glicko-Boost paper but only got more confused. Then I looked at Wikipedia and the English Version basically only copies the paper, but the German Version actually distinguishes both RDs, calling the "pre deviation" RD. And there is no mention of RDj, but only RDj. So I think passing in the old value to the g_value function should be correct?

asyncth commented 1 year ago

This should mean the RDj in the calculations are the pre-period RD of the opponent, and so we do not need to calculate the pre_deviation of the opponent, but only the player.

Actually, if this is correct, doesn't that mean that there is no need to calculate pre_deviation in both glicko and glicko_rating_period because in both of them only opponent deviations are passed to g_value?

asyncth commented 1 year ago

So I guess we're leaving this as is? The difference between results on lines 670 and 671 seems to be pretty large, this is normal, right?

atomflunder commented 1 year ago

So I guess we're leaving this as is? The difference between results on lines 670 and 671 seems to be pretty large, this is normal, right?

Might be because the value of 41 could be just unrealistically low, with the proper calculations, and it adjusts upwards.

After reading the paper carefully again, I think this should look correct. What do you think? If you are fine with it I would merge the PR and do a new release.
Sorry that this was much more confusing than expected.

asyncth commented 1 year ago

I think it's correct, but the difference might be large because of default c = 63.2, after all in the paper this number was calculated with an assumptions that typical RD is 50, ratings periods last two months and it takes 5 years for player's rating to become as unreliable as initial rating, while my value 23.75 that I usually use was calculated with assumptions 50, one week and 3 years respectively.

Merge if you're OK with this PR, I'm fine with it.