atomflunder / skillratings

Rust library for popular skill rating algorithms like Elo, Glicko-2, TrueSkill and many more.
https://docs.rs/skillratings/
Apache License 2.0

Doc comment of `WengLinConfig::beta` is probably incorrect #7

Closed: asyncth closed this issue 1 year ago

asyncth commented 1 year ago
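```rust
use skillratings::weng_lin::{self, WengLinConfig, WengLinRating};

fn main() {
    // Default config (beta = 25/6, following the Weng-Lin paper).
    let config = WengLinConfig::default();

    // Player one is rated exactly one beta above player two.
    // Uncertainties are (almost) zero, so only beta should influence the result.
    let player = WengLinRating {
        rating: 25.0 + config.beta,
        uncertainty: f64::EPSILON,
    };
    let player_2 = WengLinRating {
        rating: 25.0,
        uncertainty: f64::EPSILON,
    };

    // `.0` is the expected score (win probability) of `player` against `player_2`.
    println!(
        "{}",
        weng_lin::expected_score(&player, &player_2, &config).0
    );
}
```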

Output:

```
0.6697615493266569
```

That's not the 80% the doc comment of `WengLinConfig::beta` claims.
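A minimal, dependency-free sketch of where 0.6698 comes from, assuming `weng_lin::expected_score` uses the Bradley-Terry full-pair form from the Weng-Lin paper, P = e^(r1/c) / (e^(r1/c) + e^(r2/c)) with c = sqrt(2β² + σ1² + σ2²). The helper names below are hypothetical, not part of the skillratings API. With near-zero uncertainties c ≈ √2·β, so a gap of exactly one β gives 1 / (1 + e^(-1/√2)) ≈ 0.67 regardless of the value of β, and an 80% win probability would need a gap of √2·ln(4)·β ≈ 1.96β.

```rust
/// Hypothetical helper: Weng-Lin (Bradley-Terry full-pair) win probability of
/// player one, assuming c = sqrt(2*beta^2 + sigma1^2 + sigma2^2).
fn weng_lin_win_probability(r1: f64, s1: f64, r2: f64, s2: f64, beta: f64) -> f64 {
    let c = (2.0 * beta * beta + s1 * s1 + s2 * s2).sqrt();
    // exp(r1/c) / (exp(r1/c) + exp(r2/c)), rewritten as a logistic of the rating gap.
    1.0 / (1.0 + ((r2 - r1) / c).exp())
}

/// Hypothetical helper: rating gap needed for win probability `p` when both
/// uncertainties are ~0, i.e. the inverse of the logistic above with c = sqrt(2)*beta.
fn weng_lin_gap_for(p: f64, beta: f64) -> f64 {
    std::f64::consts::SQRT_2 * beta * (p / (1.0 - p)).ln()
}

fn main() {
    let beta = 25.0 / 6.0; // default beta discussed in this issue
    let p = weng_lin_win_probability(25.0 + beta, f64::EPSILON, 25.0, f64::EPSILON, beta);
    println!("one-beta gap -> {p:.4}"); // ~0.6698, same as the library output above

    let gap = weng_lin_gap_for(0.8, beta);
    println!("gap for 80% -> {:.3} ({:.3} betas)", gap, gap / beta);
}
```

Because the curve is logistic, the inverse has a closed form, so no numerical search is needed to find the gap for a given probability.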

atomflunder commented 1 year ago

Hey, thanks for the report!

The original Weng-Lin paper states that the default beta value of 25/6 was chosen because it follows the TrueSkill algorithm:

> Below we discuss initial values and parameters. Generally we follow the setting in Herbrich et al. (2007). [...] The additional variance of performance β² = (25/6)².

The "Math behind TrueSkill" paper, which I mainly followed for the TrueSkill implementation, states the following:

> In (2), TrueSkill co-inventor Ralf Herbrich gives a good definition of β as defining the length of the “skill chain.” If a game has a wide range of skills, then β will tell you how wide each link is in the skill chain. This can also be thought of how wide (in terms of skill points) each skill class.
> Similarly, β tells us the number of skill points a person must have above someone else to identify an 80% probability of win against that person.
> For example, if β is 4 then a player Alice with a skill of “30” will tend to win against Bob who has a skill of “26” approximately 80% of the time.

So that is what I went with in the explanation, without double-checking it. Since the actual probability is clearly lower, we should probably fix the doc comment.
I also double-checked this with the TrueSkill algorithm: there it yields ~76%, also not quite 80%.
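A sketch that reproduces the ~76% TrueSkill figure, assuming the usual two-player win probability without a draw margin, P = Φ((μ1 - μ2) / sqrt(2β² + σ1² + σ2²)). Rust's standard library has no `erf`, so the block substitutes the Abramowitz & Stegun 7.1.26 approximation (absolute error around 1.5e-7); all function names are hypothetical. With zero uncertainties, a one-β gap gives Φ(1/√2) ≈ 0.76 for any β, and an 80% win probability needs a gap of about 1.19β.

```rust
use std::f64::consts::SQRT_2;

/// erf via Abramowitz & Stegun formula 7.1.26 (absolute error below ~1.5e-7).
fn erf(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.327_591_1 * x);
    let poly = t
        * (0.254_829_592
            + t * (-0.284_496_736 + t * (1.421_413_741 + t * (-1.453_152_027 + t * 1.061_405_429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// Standard normal CDF, Phi(x).
fn phi(x: f64) -> f64 {
    0.5 * (1.0 + erf(x / SQRT_2))
}

/// Hypothetical helper: two-player TrueSkill win probability with no draw margin,
/// Phi((mu1 - mu2) / sqrt(2*beta^2 + sigma1^2 + sigma2^2)).
fn trueskill_win_probability(mu1: f64, s1: f64, mu2: f64, s2: f64, beta: f64) -> f64 {
    let c = (2.0 * beta * beta + s1 * s1 + s2 * s2).sqrt();
    phi((mu1 - mu2) / c)
}

/// Hypothetical helper: rating gap giving win probability `p` when both
/// uncertainties are 0, found by bisection (the probability grows with the gap).
fn trueskill_gap_for(p: f64, beta: f64) -> f64 {
    let (mut lo, mut hi) = (0.0_f64, 10.0 * beta);
    for _ in 0..200 {
        let mid = 0.5 * (lo + hi);
        if trueskill_win_probability(mid, 0.0, 0.0, 0.0, beta) < p {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    0.5 * (lo + hi)
}

fn main() {
    let beta = 25.0 / 6.0;
    // A one-beta gap gives Phi(1/sqrt(2)), about 0.76, independent of beta.
    let p = trueskill_win_probability(25.0 + beta, 0.0, 25.0, 0.0, beta);
    println!("one-beta gap -> {p:.4}");

    let gap = trueskill_gap_for(0.8, beta);
    println!("gap for 80% -> {:.3} ({:.3} betas)", gap, gap / beta);
}
```

Bisection is used only because the normal CDF has no closed-form inverse; for the logistic Weng-Lin curve the gap can be solved for directly.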

asyncth commented 1 year ago

I only found this out because I was trying to pick a beta value for TrueSkill that matches Glicko's scale, and then decided to check whether that works with Weng-Lin as well, lol.
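For the Weng-Lin half of that idea, a rough sketch under stated assumptions: if both Glicko rating deviations are near zero, Glicko's g(RD) factor is about 1 and its expected score reduces to the Elo-style 1 / (1 + 10^(-Δ/400)). Weng-Lin with negligible uncertainty (assuming c = √2·β as above) is 1 / (1 + e^(-Δ/(√2·β))), so the two logistic curves coincide exactly when √2·β = 400 / ln(10), i.e. β ≈ 122.8 on the Glicko scale. This only holds in the zero-uncertainty limit, and TrueSkill's normal CDF can only approximate a logistic curve, so any TrueSkill β chosen this way is an approximation. The function names are hypothetical.

```rust
use std::f64::consts::{LN_10, SQRT_2};

/// Hypothetical helper: Elo-style expected score on the Glicko scale, assuming
/// both rating deviations are ~0 so Glicko's g(RD) factor is ~1.
fn glicko_limit_expected(r1: f64, r2: f64) -> f64 {
    1.0 / (1.0 + 10f64.powf((r2 - r1) / 400.0))
}

/// Hypothetical helper: Weng-Lin win probability with both uncertainties ~0
/// (c = sqrt(2) * beta).
fn weng_lin_limit_expected(r1: f64, r2: f64, beta: f64) -> f64 {
    1.0 / (1.0 + ((r2 - r1) / (SQRT_2 * beta)).exp())
}

fn main() {
    // Hypothetical choice: the beta that makes both logistic curves identical.
    let beta = 400.0 / (SQRT_2 * LN_10);
    println!("matching beta ~ {beta:.2}");

    // Compare the two curves at a few rating gaps around a 1500 baseline.
    for gap in [0.0, 100.0, 200.0, 400.0] {
        println!(
            "gap {gap}: glicko {:.4}, weng-lin {:.4}",
            glicko_limit_expected(1500.0 + gap, 1500.0),
            weng_lin_limit_expected(1500.0 + gap, 1500.0, beta)
        );
    }
}
```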