glandfried / TrueSkillThroughTime

TrueSkill Through Time: the Julia, Python and R packages.

Serializing at scale #3

Open zass30 opened 2 years ago

zass30 commented 2 years ago

When a sequence of events is processed through the History class, the Gaussians of the Player objects are not themselves updated. Rather, it appears that one has to call the History class's learning_curves() method to get the updated Gaussians.

If we are running games at scale (tens of millions of games), we will eventually run out of memory using the History class. There should be a function that updates the original Player objects with their new Gaussians, so that after running some number of games, the updated Gaussians (and any other info, such as time) can be stored in new Player objects, and a new History instance can be created for upcoming games.

In other words, I would like a serialized object where we can recompute player rankings from a loaded state with each player's mu, sigma, and other parameters saved. This object's size should not grow as a function of games played -- it should be the same size whether ten games have been played or ten million games have been played.
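To make the request concrete, here is a minimal sketch of the kind of fixed-size state I have in mind. It does not use the package at all: `PlayerState`, `dump_snapshot`, and `load_snapshot` are hypothetical names of my own, and the choice of JSON is just an assumption for illustration. The point is only that the stored object has one entry per player, regardless of how many games were played.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PlayerState:
    """Hypothetical per-player record; not a class from the package."""
    mu: float         # last known skill mean
    sigma: float      # last known skill standard deviation
    last_time: float  # time of the player's most recent game


def dump_snapshot(states):
    """Serialize {player name: PlayerState} to a JSON string."""
    return json.dumps({name: asdict(s) for name, s in states.items()},
                      sort_keys=True)


def load_snapshot(text):
    """Rebuild {player name: PlayerState} from a JSON string."""
    return {name: PlayerState(**fields)
            for name, fields in json.loads(text).items()}


states = {
    "alice": PlayerState(mu=1.2, sigma=3.4, last_time=100.0),
    "bob": PlayerState(mu=-0.5, sigma=2.8, last_time=97.0),
}
text = dump_snapshot(states)
restored = load_snapshot(text)
```

The snapshot grows with the number of players, not with the number of games.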

glandfried commented 2 years ago

Under the TrueSkill Through Time model, updating the latest estimates requires the entire history of events, not just the last posterior. Currently, millions of games can be processed on any low-end computer using Julia or Python. One of the most important pending tasks is to improve the implementation so that tens of millions of games can be analyzed.

zass30 commented 2 years ago

Perhaps as a workaround, every N million games we can create new Player objects with the last known mu and sigma, and then start a new History from scratch? It's not perfect as it loses the entire history, but it gives a placeholder of best guess estimates for a new set of N million games.

The biggest challenge I see with this is that we lose the time elapsed since each player's last game. Perhaps this could be solved by storing, along with each new Player object, a time variable recording how long ago that player's last game was. Then, when creating a new History, the first game played can take this time interval into account?
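A rough sketch of the bookkeeping I mean, keeping the last game time next to the last estimate. I am assuming here that learning_curves() returns a mapping from player name to a chronological list of (time, Gaussian) pairs whose Gaussians expose mu and sigma; I have not verified that shape, so the `Estimate` tuple below is a stand-in and `last_estimates` is a hypothetical helper, not part of the package.

```python
from collections import namedtuple

# Stand-in for the package's Gaussian; only mu and sigma are assumed.
Estimate = namedtuple("Estimate", ["mu", "sigma"])


def last_estimates(curves):
    """Return {player: (last_time, mu, sigma)} from chronological curves.

    `curves` is assumed to look like {name: [(time, estimate), ...]},
    ordered from oldest to newest.
    """
    out = {}
    for name, curve in curves.items():
        if not curve:
            continue  # player with no recorded games
        last_time, estimate = curve[-1]
        out[name] = (last_time, estimate.mu, estimate.sigma)
    return out


# Toy data in the assumed shape.
curves = {
    "a": [(1, Estimate(0.5, 5.0)), (4, Estimate(1.0, 4.0))],
    "b": [(4, Estimate(-1.0, 4.2))],
}
snapshot = last_estimates(curves)
```

Each entry keeps the time of the last game, so the gap to the first game of the next batch can still be computed after the old History is discarded.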

glandfried commented 2 years ago

We must solve this point properly. In the meantime, we can use the workaround you proposed, adding the uncertainty accumulated since each player's last game before defining the new priors that will be used in the new History() instance.
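As a sketch of that step, assuming skill drifts as a Gaussian random walk whose variance grows by gamma squared per unit of time, the prior for the new batch keeps the last mean and widens the standard deviation according to the elapsed time. The function names are hypothetical, and wiring the result into Player and History objects is left out on purpose.

```python
import math


def inflate_sigma(sigma, gamma, elapsed):
    """Widen sigma after `elapsed` time units of random-walk drift.

    Assumes the variance grows linearly in time: sigma^2 + elapsed * gamma^2.
    """
    if elapsed < 0:
        raise ValueError("elapsed time must be non-negative")
    return math.sqrt(sigma ** 2 + elapsed * gamma ** 2)


def next_prior(mu, sigma, last_time, start_time, gamma):
    """Return (mu, sigma) to use as the prior at `start_time`."""
    return mu, inflate_sigma(sigma, gamma, start_time - last_time)


# Last game at t=90, the new batch starts at t=106, and gamma=1.0.
prior = next_prior(mu=1.0, sigma=3.0, last_time=90.0,
                   start_time=106.0, gamma=1.0)
```

This is only an approximation: the old games can no longer be revised by later evidence, which is exactly what the full model would do.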