Implement TrueSkill for more than two teams

asyncth commented 3 months ago

Disclaimer: this is my attempt to blindly rewrite the trueskill Python package in Rust into this library, I barely understand what's going on there.

The idea here is to have a separate implementation for multi-team scenarios, but still have the 1v1 and 2 teams shortcuts that are currently implemented in master branch since they're likely to be faster. I, however, temporarily removed those shortcuts just so that I can run tests without modifying them. The multi-team implementation is supposed to be gated behind the disabled-by-default trueskill-full feature since this implementation introduces dependencies, while the 1v1 and 2 teams implementations are supposed to be available regardless of what features are enabled.

match_quality and expected_score functions are not yet implemented. Tests seem to be failing due to rating update functions returning results that are different from both values that are expected in the tests and from values returned by Python trueskill and TypeScript ts-trueskill packages, but only slightly (not sure what the issue is?).

Performance is likely horrible, not very happy about Rc<RefCell<Variable>> usage either, which probably could be avoided if I came up with some other way to implement these factor graphs.

In general this implementation feels rather inappropriate for this library right now as it is claimed that this crate is supposed to be lightweight and blazingly fast.

Closes #8.

asyncth commented 3 months ago

Also dependencies really could be removed too, I just didn't want to bother implementing matrix stuff.

atomflunder commented 3 months ago

Hi @asyncth ,

thank you very much for your contribution! I can't take a look right now but will do so later today.

As for the match_quality and Matrix stuff, I have already implemented those in a separate branch, only the Multi team function itself was missing. See PR #9 and Issue #8.

Maybe we could merge both of these branches? Anyways thank you very much again for your contribution and your time 😊

asyncth commented 3 months ago

Thanks, I would appreciate if you could give some kind of a guess on why rate functions seem to be around ±0.000001 off from the trueskill Python package, specifically I'm wondering if I did something wrong or if that's some kind of rounding issue. Pretty weird issue, considering that values returned from ts-trueskill match values returned from python trueskill package exactly, according to your calculator web app at least.

Maybe we could merge both of these branches?

Sure, probably should refactor that factor graph mess first though.

atomflunder commented 3 months ago

Thanks, I would appreciate if you could give some kind of a guess on why rate functions seem to be around ±0.000001 off from the trueskill Python package, specifically I'm wondering if I did something wrong or if that's some kind of rounding issue.

I do not have any clue at the moment, I read through your code and it looks (to me at least) to be the same as the original. I did have a kinda similar issue earlier in this crate's lifespan where the Glicko-2 calculations were slightly off and I put it down to being a rounding error - turns out I did in fact make a mistake and after fixing it the calculations were on point 😅.
That being said, I do think the difference is very small. I would not rule out a legit floating point rounding issue here.

Sure, probably should refactor that factor graph mess first though.

Sounds good. I do think it would be easier to integrate the multi team stuff into the other branch, rather than the other way around, but feel free to do whatever you like. I'm busy until the weekend, but after that I'll do my best to help.

In general this implementation feels rather inappropriate for this library right now as it is claimed that this crate is supposed to be lightweight and blazingly fast.

On that note, I have thought about the idea of this crate having feature flags for the specific rating algorithm to reduce bloat. Compilers will still just ignore the dead code so it's not a speed-up, but it will reduce compile times etc.
Stuff like:

[dependencies]
skillratings = {version = "0.26", features = ["serde", "trueskill", "glicko-2", "all"]}

But I don't know if its very user-friendly. Very open to suggestions here. I might open a separate issue for that discussion.

asyncth commented 3 months ago

Do you happen to understand factor graph message passing as a concept in general? I feel like it would be easier to refactor factor_graph if I did understand it, but I don't, so I'll take a look at the implementation in Skills repo tomorrow as well.

asyncth commented 3 months ago

Btw the reason I'm doing this now and not back when I did previous commits is that somehow it never came to my mind that I could just copy the other library without understanding anything, I don't know why didn't I think of that immediately... I wonder if we need to credit the trueskill package authors?

atomflunder commented 3 months ago

Do you happen to understand factor graph message passing as a concept in general?

Not really to be honest.

I wonder if we need to credit the trueskill package authors?

That's a good point, they have a weird license. We could add a credit for them in the doc comment for the trueskill module.

atomflunder commented 3 months ago

Hi again @asyncth ,

I have played around with the code some more and I found out why the calculations were off: The Normal Distribution Library differs slightly from the math implementations of the TrueSkill Python package, the Python package uses some shortcuts and approximations. After changing the math functions to the replicated ones (I have already implemented those earlier), the calculations line up (The rest diff I would say is a rounding error):

Screenshot from 2024-06-08 14-27-19

I have pushed these changes to the other branch, I hope that is okay 🙂
Need to clean up some stuff still and add tests. Also I think there will be merge conflicts due to the branch being pretty outdated.

asyncth commented 3 months ago

Any suggestions on how to get rid of Rc<RefCell<Variable>> usage? Unfortunately I still haven't taken a look at Skills repo like I said I would :(

atomflunder commented 3 months ago

Any suggestions on how to get rid of Rc<RefCell> usage?

Not at the moment no. I think the best course of action going forward is to merge the other branch into main, and then refactor later. We also probably should implement the partial play weighting, but that is not too difficult I think.

asyncth commented 3 months ago

Thank you for writing the doc comment and making a proper test, so what do you want me to do next?

atomflunder commented 3 months ago

Thank you for writing the doc comment and making a proper test, so what do you want me to do next?

Not sure 😅

I am kinda confused myself with the two branches right now, and the merge conflicts look like a headache to me. If you want to, you could copy the trueskill folder of the other branch into here, delete the dependencies, and then we just merge your branch and close the other.

Thoughts?

asyncth commented 3 months ago

Yeah, that might work, I'm not totally sure how to copy the commits and keep your authorship though

atomflunder commented 3 months ago

Since you did most of the work, I think it's only fair you get the credit

asyncth commented 3 months ago

Why though, I really appreciate you writing a doc comment, I hate doing it personally, especially since my English tends to be very inarticulate in non-casual styles

atomflunder commented 3 months ago

You could add the co-authored-by tag if you wanted 😊

asyncth commented 3 months ago

That Implement trueskill_multi_team commit got merged somewhat weirdly, it didn't have the MultiTeamOutcome import despite it being in mod.rs in your branch, not sure what the issue was, hopefully nothing else got affected by this

asyncth commented 3 months ago

@atomflunder What's the issue with the Coverage check? Does rustfmt need to be run?

atomflunder commented 3 months ago

That Implement trueskill_multi_team commit got merged somewhat weirdly, it didn't have the MultiTeamOutcome import despite it being in mod.rs in your branch, not sure what the issue was, hopefully nothing else got affected by this

I'm looking at the complete diff from this PR, and it looks fine to me. Also I am not sure what the Coverage Check is complaining about lol.

I will do some of the smaller, annoying stuff later, like adding tests for some of the edge cases, fixing some clippy lints, more docs etc. but all of that is not really critical, just as a heads up. :+1:
After that, I'll do a version bump and a new release.

If you want to tackle the performance improvements you talked about, feel free to open another PR in the future. In preparation for that, I will also probably add a benchmark, so we have a baseline to compare the performance to.

And I just wanna thank you again for the contribution, it's greatly appreciated. 🙏

atomflunder commented 3 months ago

The coverage check ran fine when I merged it, maybe it has something to do with the repository secret API key, I genuinely don't know.

atomflunder / skillratings

Implement TrueSkill for more than two teams #11