EbTech / Elo-MMR

Skill estimation systems for multiplayer competitions
https://arxiv.org/abs/2101.00400
MIT License

Integrating the library #14

Open Wulfheart opened 2 years ago

Wulfheart commented 2 years ago

I would like to integrate the algorithm into an existing web API. For that, I thought I could send the data that currently lives in a subfolder of the cache directory directly as JSON. Is there a way to integrate it seamlessly as a library?

EbTech commented 2 years ago

Sure, you can make an API that accepts JSON files. What exactly do you want this library to do?

If you want to implement it in Rust, the worldrank-api directory might give you ideas. It's still under construction, but it should work. It doesn't directly use the raw data under cache/, but rather, loads the ratings history under data/. These data files are not in the repository, but must be generated by running multi-skill. The serde library automatically handles the conversion between in-memory structs and JSON files.

Wulfheart commented 2 years ago

I currently have a PHP application that generates player rankings for contests in the format defined in the cache directory. Now I would like to integrate the rating calculation into the current system as easily as possible. I think the simplest way to achieve this is to build a release, deploy it to the system, and run it from there. You don't happen to have a reference implementation of MMR in Go, Python, or PHP (or generally any language other than Rust)? I'm still having a hard time writing Rust productively 😢

EbTech commented 2 years ago

Oh if you just want MMR itself, it's not a lot of code: https://github.com/EbTech/Elo-MMR/blob/master/multi-skill/src/systems/simple_elo_mmr.rs

Would you be able to translate this to your preferred language? Feel free to ask questions.
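As a porting aid, here is a minimal, hypothetical Rust sketch of the per-contest "performance" step at the heart of such a system: under a logistic pairwise model, a player's performance is the root of a monotone sum of tanh terms, found here by bisection. It is not a port of simple_elo_mmr.rs. The function name, the per-player `scale` values, the bisection bracket, and counting each player against themselves as a tie are assumptions; the library derives its own scales from sigma and beta, and then folds the performance back into mu and sigma in a step not shown here.

```rust
/// Hypothetical sketch: maximum-likelihood performance of player `i` in one
/// contest under a logistic pairwise model (not the library's exact code).
/// `rank[j]`: placement of player j (lower is better, equal means tied).
/// `mu[j]`: current rating mean. `scale[j]`: assumed logistic scale (> 0).
pub fn performance(rank: &[usize], mu: &[f64], scale: &[f64], i: usize) -> f64 {
    debug_assert!(rank.len() == mu.len() && mu.len() == scale.len());
    // f(p) equals -2 times the derivative of the log-likelihood in p.
    // Every term is increasing in p, so f has a single root.
    let f = |p: f64| -> f64 {
        let mut sum = 0.0;
        for j in 0..mu.len() {
            let t = ((p - mu[j]) / (2.0 * scale[j])).tanh();
            if rank[j] >= rank[i] {
                // j placed at or below i: counts as a win (or tie) for i.
                sum += (t - 1.0) / scale[j];
            }
            if rank[j] <= rank[i] {
                // j placed at or above i: counts as a loss (or tie) for i.
                sum += (t + 1.0) / scale[j];
            }
        }
        sum
    };
    // Bisection on an assumed bracket wide enough for typical ratings.
    let (mut lo, mut hi) = (-10_000.0_f64, 10_000.0_f64);
    for _ in 0..200 {
        let mid = 0.5 * (lo + hi);
        if f(mid) < 0.0 {
            lo = mid; // root lies above mid
        } else {
            hi = mid; // root lies at or below mid
        }
    }
    0.5 * (lo + hi)
}
```

With a single player the estimate equals their own mu; with two equally rated players, the winner's performance lands as far above their shared mu as the loser's lands below it.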

Wulfheart commented 2 years ago

I think I should be able to do this. However, is there a difference between EloMMR and SimpleEloMMR? How much do they differ?

EbTech commented 2 years ago

They're the same. The bigger Elo-MMR just has a bunch of (for most purposes) unnecessary features that can be turned on, like approximations that would make it run even faster.

Wulfheart commented 2 years ago

I decided to use it as a web API for now, since that is simpler.

Are specific values needed for drift_per_second? Above a certain threshold (~0.5/(24 × 60 × 60), i.e. about 0.5 per day), inactive players are favored.
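Since the library parameter is per second (drift_per_sec) while the thresholds here are discussed per day, converting between the two is plain arithmetic over the 24 × 60 × 60 = 86,400 seconds in a day. A small sketch; the helper names are hypothetical, not part of Elo-MMR.

```rust
/// Seconds in one day: 24 * 60 * 60 = 86,400.
pub const SECS_PER_DAY: f64 = 24.0 * 60.0 * 60.0;

/// Hypothetical helper: per-day drift to the per-second value that a
/// drift_per_sec-style parameter expects.
pub fn drift_per_sec_from_per_day(drift_per_day: f64) -> f64 {
    drift_per_day / SECS_PER_DAY
}

/// Hypothetical helper: the inverse, for reading a configured value back.
pub fn drift_per_day_from_per_sec(drift_per_sec: f64) -> f64 {
    drift_per_sec * SECS_PER_DAY
}
```

For example, 0.5 per day corresponds to about 5.79e-6 per second.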

EbTech commented 2 years ago

Inactive players get a boost when the drift is above 0.5 per day? That's unexpected.

Wulfheart commented 2 years ago

I made a repo to reproduce it (including steps) with the dataset I am using here.

Below you can see the top-ranked player, their display ranking, the date of their last contest, and the drift per day. For drifts between 0 and 0.25 (the threshold may be higher), the results seem reasonable, but from 0.5 upward they seem off: a player who hasn't been active for several years shouldn't be rated the top player.

{'player': 'Sheath', 'display_ranking': 1910} 2018-02-04 14:40:59 0
======
{'player': 'Sheath', 'display_ranking': 1909} 2018-02-04 14:40:59 0.01
======
{'player': 'Sheath', 'display_ranking': 1907} 2018-02-04 14:40:59 0.05
======
{'player': 'Sheath', 'display_ranking': 1903} 2018-02-04 14:40:59 0.1
======
{'player': 'Sheath', 'display_ranking': 1888} 2018-02-04 14:40:59 0.25
======
{'player': 'johnny_low', 'display_ranking': 1929} 2012-07-03 10:36:01 0.5
======
{'player': 'johnny_low', 'display_ranking': 2024} 2012-07-03 10:36:01 1
======
{'player': 'johnny_low', 'display_ranking': 2088} 2012-07-03 10:36:01 1.4285714285714286
======

Do you have any idea why? Did I do something wrong in the Rust API, or is my methodology wrong?

Thank you in advance.

EbTech commented 2 years ago

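Oh hey, I just remembered something that should explain your situation (sorry I've been a bit ill). The decay feature, which I think was pioneered by Glicko, comes with a couple caveats:

  • In the current implementation, rating updates are postponed until the user competes again, so you won't see the decay right away.
  • Since the decay works by increasing a player's sigma, it actually gives future contests (after the period of inactivity) higher weight. The rationale for this is that a player who takes a break from the system, much like a newcomer, has unknown skill. Their decayed rating is the result of the lower bound on their skill having decreased, but behind the scenes the upper bound will also have increased. In practice, you'll probably want to cap sigma to the starting value of 350. You might also decide to decay a player's rating back to the default of 1500, so that a long-inactive player would asymptotically revert to newcomer status. A sample implementation of this is provided in https://github.com/EbTech/Elo-MMR/blob/master/multi-skill/src/systems/common/mod.rs#L26 which might eventually make it into the core update code.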

If you disagree with this uncertainty-based approach altogether, you can hack in a rating penalty instead of messing with sigma.
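One way to hack in a rating penalty without touching sigma is to leave the rating system's output alone and subtract a penalty only when the rating is displayed, based on the time since the player's last contest. A minimal sketch; the function, its parameters, the grace period, and the linear-then-capped shape are all assumptions for illustration, not part of Elo-MMR.

```rust
/// Hypothetical display-only inactivity penalty. mu and sigma stay exactly
/// as the rating system left them; only the number shown to users drops.
/// `grace_days`, `points_per_day`, and `max_penalty` are illustrative knobs.
pub fn penalized_display(
    display_rating: f64,
    last_contest_secs: i64,
    now_secs: i64,
    grace_days: f64,
    points_per_day: f64,
    max_penalty: f64,
) -> f64 {
    // Whole and fractional days since the last contest (never negative).
    let idle_days = (now_secs - last_contest_secs).max(0) as f64 / 86_400.0;
    // No penalty during the grace period, then linear growth up to a cap.
    let penalty = ((idle_days - grace_days).max(0.0) * points_per_day).min(max_penalty);
    display_rating - penalty
}
```

Because the stored rating is untouched, a returning player's next update is computed exactly as if no penalty had been shown.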

Wulfheart commented 2 years ago

Thanks for your thorough explanation. I don't completely disagree with the uncertainty-based approach, but I don't think it's viable in my situation, because I want the rating to decay automatically after a time t even if the player hasn't participated in a contest.

Currently I use experiment.eval() to get the rankings. Where can I hack in the rating penalty?

EbTech commented 2 years ago

You can use the uncertainty-based decay too, applying the same formula to update the rating for any given time. It's just that the current implementation doesn't have a way to ask for an updated rating at a particular time (I may change the design to support that later). If you want to add that yourself, I can imagine several places to do it and haven't thought carefully about which is best. Whenever you retrieve the ratings, if you also retrieve the player's last update time, you can compare it to the current time and compute an adjusted rating that way.
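Following that suggestion, here is a minimal sketch of computing an adjusted rating at query time from the player's last update time. Assumptions, none of which is the library's API: drift is treated as variance added per second (as in Glicko), sigma is capped at the starting value of 350 mentioned in this thread, the display rating is taken as mu minus 3 sigma, and the function names are hypothetical. The library's own display formula lives in player.rs.

```rust
/// Assumed newcomer uncertainty, used here as a hard cap on sigma.
pub const SIG_NEWCOMER: f64 = 350.0;

/// Hypothetical: inflate sigma for `elapsed_secs` of inactivity, assuming
/// variance grows linearly with time (Glicko-style), then cap it.
pub fn decayed_sigma(sigma: f64, drift_per_sec: f64, elapsed_secs: f64) -> f64 {
    let var = sigma * sigma + drift_per_sec * elapsed_secs.max(0.0);
    var.sqrt().min(SIG_NEWCOMER)
}

/// Hypothetical: conservative display rating at time `now_secs`, assuming
/// display = mu - 3 * sigma. mu itself is left unchanged.
pub fn display_at(
    mu: f64,
    sigma: f64,
    drift_per_sec: f64,
    last_update_secs: i64,
    now_secs: i64,
) -> f64 {
    let elapsed = (now_secs - last_update_secs) as f64;
    mu - 3.0 * decayed_sigma(sigma, drift_per_sec, elapsed)
}
```

Computing this on every read, rather than storing it, keeps the stored ratings identical to what the library produces while still showing decay before the player competes again.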

stephankokkas commented 2 years ago

Oh hey, I just remembered something that should explain your situation (sorry I've been a bit ill). The decay feature, which I think was pioneered by Glicko, comes with a couple caveats:

  • In the current implementation, rating updates are postponed until the user competes again, so you won't see the decay right away.
  • Since the decay works by increasing a player's sigma, it actually gives future contests (after the period of inactivity) higher weight. The rationale for this is that a player who takes a break from the system, much like a newcomer, has unknown skill. Their decayed rating is the result of the lower bound on their skill having decreased, but behind the scenes the upper bound will also have increased. In practice, you'll probably want to cap sigma to the starting value of 350. You might also decide to decay a player's rating back to the default of 1500, so that a long-inactive player would asymptotically revert to newcomer status. A sample implementation of this is provided in https://github.com/EbTech/Elo-MMR/blob/master/multi-skill/src/systems/common/mod.rs#L26 which might eventually make it into the core update code.

If you disagree with this uncertainty-based approach altogether, you can hack in a rating penalty instead of messing with sigma.

I tend to dislike this approach, only because if a player is absent from competition for a while, it is unfair to assume that their skill has decreased. A competitor may have stopped competing in order to practise a particular skill or to improve some aspect of the game or task. I am still thinking of a solution to this problem.

EbTech commented 2 years ago

@stephankokkas in that case, wouldn't the uncertainty-based approach be more suitable? That way, the display rating is temporarily lowered, but is quickly raised when the player returns. Maybe you specifically dislike mu returning to 1500? In that case, you might leave mu untouched, but gradually increase sigma towards an asymptotic cap such as 350. If the community gravitates towards a particular approach, I might make that the default; for now, there appears to be room for competing philosophies.
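The variant suggested here, leaving mu untouched while sigma drifts towards an asymptotic cap, can be sketched with an exponential approach of sigma^2 towards 350^2. The exponential form, the time constant `tau_secs`, and the function name are assumptions for illustration. For contrast, a hypothetical `revert_mu` flag also pulls mu towards the default of 1500 at the same rate, which is the behaviour that may be disliked.

```rust
/// Default rating and starting uncertainty mentioned in this thread.
pub const MU_DEFAULT: f64 = 1500.0;
pub const SIG_CAP: f64 = 350.0;

/// Hypothetical asymptotic decay over `elapsed_secs` of inactivity.
/// sigma^2 approaches SIG_CAP^2 exponentially with time constant `tau_secs`;
/// mu is unchanged unless `revert_mu` is set, in which case it approaches
/// MU_DEFAULT at the same rate.
pub fn asymptotic_decay(
    mu: f64,
    sigma: f64,
    elapsed_secs: f64,
    tau_secs: f64,
    revert_mu: bool,
) -> (f64, f64) {
    // Fraction of the remaining gap to the asymptote that is kept.
    let keep = (-elapsed_secs.max(0.0) / tau_secs).exp();
    let cap_sq = SIG_CAP * SIG_CAP;
    let var = cap_sq - (cap_sq - sigma * sigma) * keep;
    let new_mu = if revert_mu {
        MU_DEFAULT + (mu - MU_DEFAULT) * keep
    } else {
        mu
    };
    (new_mu, var.sqrt())
}
```

Unlike a hard cap, sigma never quite reaches 350 here, so very long absences keep lowering the display rating, but by ever smaller amounts.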

Wulfheart commented 2 years ago

@stephankokkas I am unable to grasp the concept @EbTech suggests. However, lowering the score manually incentivizes users to play more often.

Wulfheart commented 2 years ago

@stephankokkas have you come up with a better solution?

stephankokkas commented 2 years ago

Yes, without changing the code: I was able to acquire training data for my competitors and incorporated it into each player's rating that way.

Wulfheart commented 2 years ago

I don’t understand it completely. How can I integrate it? Is there a rating decay?

where-is-paul commented 2 years ago

To expand on Aram's solution:

The "display rating" in our system is calculated as true_rating - 3 * uncertainty; in statistical terms, the true rating is the mean of some distribution and the uncertainty is its standard deviation.

Aram is suggesting steadily increasing the "uncertainty" over time. On the front-end, this looks like the rating is gradually going down over time.

Increasing the uncertainty means that the "true rating" in our system stays the same, but the system is less certain about the spread around the true rating. When users participate in a contest, the system decreases their uncertainty, because it gets more information about them. This means that if their skill has not decayed, they will quickly return to their original rating.
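A worked example of the display formula above, with illustrative numbers only: the true rating stays at 2000 while the uncertainty grows during a break, so the displayed rating drops; once a contest shrinks the uncertainty back, the display recovers without the true rating having moved.

$$
\begin{aligned}
r_{\text{display}} &= \mu - 3\sigma \\
\mu = 2000,\ \sigma = 80 &:\quad r_{\text{display}} = 2000 - 3 \cdot 80 = 1760 \\
\mu = 2000,\ \sigma = 150 &:\quad r_{\text{display}} = 2000 - 3 \cdot 150 = 1550
\end{aligned}
$$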

Hope this helps, -- Paul

stephankokkas commented 2 years ago

@where-is-paul, would you be able to explain how to enable this feature when computing ratings? Is it a flag that can be used on the command line, or something that needs adjusting in the source code?

Thanks

Wulfheart commented 2 years ago

There have been some recent commits to this repository, but I don't know exactly what they do. At least some APIs have changed, for example some in SimpleEloMMR. @where-is-paul, were you referring to these changes?

EbTech commented 2 years ago

@stephankokkas conservative display ratings are computed by https://github.com/EbTech/Elo-MMR/blob/b513efe/multi-skill/src/systems/common/player.rs#L15, and appear in the all_players.csv file that's produced if you follow the README instructions. Keeping in mind the caveats I mentioned above, if you still want to enable time-based decay, the relevant parameter is EloMMR::drift_per_sec.

@Wulfheart Unless I'm forgetting something, the old APIs should still work. A bit of backward-compatibility was lost when minor changes were made to the file format. We did add a new way to run the rating system using config files (briefly mentioned in README), but that's not quite ready for public primetime yet. We'll document it better when we intend for more people to use it.

Wulfheart commented 2 years ago

@EbTech just to be sure: the drift per second is only applied when the player participates in another game, isn't it? So the easiest solution would be to add a user-facing decay and let the library handle the calculation without this decay?