How, that is the question. The Skill Factor ß is described in the literature as the difference in µ values that maps to approximately an 80% chance that the holder of the higher µ wins.
Meaning: the TrueSkill authors argue that this reflects the balance of skill and luck in a game.
If ß is high, more luck is modeled; if it is low, more skill. The default value is one sixth of the initial mean, or 25 ÷ 6, which is 4 1/6 or 4.1666... if you prefer.
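That mapping from a µ gap to a win probability can be checked directly. Under the TrueSkill Gaussian model (ignoring draws), each player's performance is their rating plus noise of variance ß² (and rating uncertainty σ²), so P(A beats B) = Φ(Δµ / √(2ß² + σ_a² + σ_b²)). A minimal sketch in plain Python, with no TrueSkill library assumed:

```python
import math

def win_probability(mu_a, mu_b, beta, sigma_a=0.0, sigma_b=0.0):
    """P(A beats B) under the TrueSkill Gaussian model, ignoring draws.

    Each player's performance is Normal(mu, sigma^2 + beta^2); the winner
    is the player with the higher sampled performance.
    """
    denom = math.sqrt(2 * beta**2 + sigma_a**2 + sigma_b**2)
    # Standard normal CDF via erf.
    return 0.5 * (1 + math.erf((mu_a - mu_b) / (denom * math.sqrt(2))))

BETA = 25 / 6  # the default: one sixth of the initial mean of 25

# Two perfectly known players (sigma = 0) whose means differ by exactly beta:
print(round(win_probability(25 + BETA, 25, BETA), 3))  # → 0.76
```

Note the exact figure for a one-ß gap comes out near 76%, which is what the literature rounds to "approximately 80%".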
This again is completely arbitrary but a value that worked empirically for the TrueSkill developers in their initial environment (the Xbox matchmaking system).
It turns out that they did some subsequent research comparing actual game outcomes against TrueSkill predictions to tune ß. They arrived at the following values as guidelines for Xbox games:
3.33 for Golf (a game of almost pure skill)
5.00 for Car racing
20.8 for UNO (a game of chance)
The challenge we have is to do the same. To use the data already collected to tune ß.
Conceptually this is not difficult and amounts to asking:
For a given game, what value of ß would come closest to predicting the actual recorded results?
The devil is in the detail: namely, how to find that ß, and which part of the result tree to use. Conceptually all of it, or maybe only the latest ratings of all players in the game (as opposed to all the ratings they held historically, i.e. after the outcome of each game).
The site already implements TrueSkill predictions and presents them on request for the player ratings as they stood before a given game session was played/recorded and after. These are currently displayed on the leaderboards page as two selectable options, Show TrueSkill Predictions and Show Post-Session TrueSkill Predictions. Both report a confidence and a prediction accuracy (the calculation of which is recorded in the Literature directory).
For any session we could ask whether there is a value of ß that would maximise the prediction accuracy of the prior ratings. The general tuning problem asks the same across all the recorded sessions of a game: is there a ß that maximises the net predictive power of the model?
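One candidate objective for that tuning problem can be sketched concretely, under stated assumptions: suppose we can extract, per recorded session, each player's prior rating (µ, σ) and final rank (the data layout and names below are illustrative, not the site's actual code). A bare did-the-higher-µ-player-win score is the same for every ß, since the higher µ always gets the higher predicted win probability; a ß-sensitive score, such as the mean log-likelihood of the observed pairwise outcomes, is needed instead:

```python
import math
from itertools import combinations

def win_prob(mu_a, sig_a, mu_b, sig_b, beta):
    # P(A beats B) under the TrueSkill Gaussian model, ignoring draws.
    denom = math.sqrt(2 * beta**2 + sig_a**2 + sig_b**2)
    return 0.5 * (1 + math.erf((mu_a - mu_b) / (denom * math.sqrt(2))))

def mean_log_likelihood(sessions, beta):
    """sessions: list of sessions, each a list of (mu, sigma, rank)
    triples holding prior ratings and the recorded finishing rank."""
    total, n = 0.0, 0
    for session in sessions:
        for (ma, sa, ra), (mb, sb, rb) in combinations(session, 2):
            if ra == rb:
                continue  # skip ties in this sketch
            p = win_prob(ma, sa, mb, sb, beta)
            # Probability the model assigned to what actually happened.
            total += math.log(p if ra < rb else 1 - p)
            n += 1
    return total / n

# Hypothetical data: prior (mu, sigma) and final rank per player.
sessions = [
    [(28.0, 3.0, 1), (25.0, 4.0, 2), (22.0, 5.0, 3)],
    [(26.0, 3.0, 2), (27.0, 3.5, 1)],
]
print(mean_log_likelihood(sessions, 25 / 6))
```

The score is always negative; values closer to zero mean the prior ratings assigned more probability to the results that actually occurred, and as ß grows very large every prediction degrades toward a coin flip (log 0.5 per pair).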
Given TrueSkill is a numeric model, this lends itself to analysis with standard maximisation/minimisation approaches. The first step would be to ask whether A = f(ß) (accuracy of prediction as a function of skill factor) is a well-behaved function (that is, one that exhibits a maximum).
It's wholly conceivable it's not, and that it has a plurality of local maxima.
This is a basic research question in the first instance.
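Probing that question needn't require anything cleverer than a coarse grid sweep in the first instance: evaluate the chosen objective at many candidate ß values and count the interior local maxima. A self-contained sketch, where the sessions, the pairwise log-likelihood objective, and the grid are all illustrative assumptions rather than the site's actual data or code:

```python
import math
from itertools import combinations

# Synthetic stand-in for the recorded sessions: (mu, sigma, rank) triples
# of prior ratings and the recorded finishing rank.
sessions = [
    [(28.0, 3.0, 1), (25.0, 4.0, 2), (22.0, 5.0, 3)],
    [(26.0, 3.0, 2), (27.0, 3.5, 1)],
    [(24.0, 6.0, 1), (24.5, 2.0, 2)],
]

def objective(beta):
    # Mean log-likelihood of each observed pairwise outcome.
    total, n = 0.0, 0
    for session in sessions:
        for (ma, sa, ra), (mb, sb, rb) in combinations(session, 2):
            if ra == rb:
                continue
            p = 0.5 * (1 + math.erf(
                (ma - mb)
                / (math.sqrt(2 * beta**2 + sa**2 + sb**2) * math.sqrt(2))))
            total += math.log(p if ra < rb else 1 - p)
            n += 1
    return total / n

betas = [0.25 * i for i in range(1, 121)]  # sweep ß from 0.25 to 30
scores = [objective(b) for b in betas]

best_beta = betas[scores.index(max(scores))]
# Count interior local maxima to see whether f looks unimodal on this grid.
local_maxima = sum(
    1 for i in range(1, len(scores) - 1)
    if scores[i - 1] < scores[i] >= scores[i + 1]
)
print(best_beta, local_maxima)
```

If the sweep shows a single interior peak, a standard one-dimensional optimiser can then refine ß; if it shows several local maxima, or the best value sits at a grid boundary, that is exactly the "not well-behaved" outcome the research question anticipates.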