jeffreywescott closed 8 years ago
First attempt at integration: https://github.com/LearnersGuild/game-prototype/pull/40
Not yet reflected in Playbook, so this is not ready for review
Moved to review for @shereefb and @LearnersGuild/los
(see stat description in https://github.com/LearnersGuild/playbook/pull/48)
@shereefb -- it seems #13 has a hard dependency on this, so this will need to be tagged as RFI
at the same time as #13, no?
@jeffreywescott yep. #13 definitely depends on this.
/cc @LearnersGuild/software this is RFI
We don't yet have the exact K-factor, but are tracking that issue here: #55
@shereefb awesome. With @jeffreywescott out I'm not sure exactly how we want to get these things onto the implementation board. My preference would be for game mechanics to create a new ticket there and include the specs in the description (likely cut & pasted from the game mechanics issue). The description of this issue contains a long comment thread, so it's not exactly clear what the end result is. WDYT?
I suck at vacation.
I am working a bit on Monday and could move an issue or two then.
@jeffreywescott we need to figure this out either way. Go back to vacation!
@bundacia would you be willing to fill the Prod. Dev. Flow role while Jeffrey's on vacay? I.e. you'd just do the moving of the cards from RFI?
I intend to do this work while on vacay, FWIW, but not every day, more like every week.
If things need a quicker turnaround, then it would be best for someone else to take on.
@shereefb commented on Mon Aug 01 2016
Context
To solve for this, the amount of XP a player gains on a project should be a function of their rank compared to the rank of the players they play with, in addition to their relative contribution and hours.
To do this, we need a stat that embodies rating and is separate from XP.
Having a separate stat that represents rating allows us to:
Proposed Solution
The Elo Rating System is widely used (chess, FIFA, online gaming, etc.) as a system for rating two-team competitive games.
Every project retrospective can be seen as multiple two-player 'contests', with each player competing to contribute the most per hour spent on the project.
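For reference, the standard Elo update for a single two-player contest can be sketched as below. This is a minimal sketch, not the script used in this thread: the function names are made up for illustration, and the default K-factor of 16 is an assumption (the exact K-factor is still an open question here).

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B (0..1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating: float, opponent: float, score: float, k: float = 16.0) -> float:
    """New rating after one contest.

    score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k (the K-factor) is an assumed placeholder value.
    """
    return rating + k * (score - expected_score(rating, opponent))


if __name__ == "__main__":
    # Two equally rated players: a draw changes nothing, a win moves k/2 points.
    print(updated_rating(1000, 1000, 0.5))
    print(updated_rating(1000, 1000, 1.0))
```

With equal ratings the expected score is 0.5, so a draw leaves both ratings untouched, and only a result that differs from the expectation moves them.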
For example, assume a project where the following players contribute the following:
Each player's relative contribution per hour is calculated by dividing their rc by hours:
The retrospective is now broken down into three two-player contests:
As a result of this example, none of the player ratings would change, but all of their XP would increase, with Jeffrey and Tanner's XP increasing by double what Shereef's XP increased by.
Taking another example, where the players have different skill levels and different contributions:
Breaking down the retrospective to three contests:
Shereef wins two contests, and Tanner wins one.
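The breakdown of a retrospective into pairwise contests could be sketched roughly like this. The player names and numbers are hypothetical placeholders rather than the figures from the examples above, and treating equal contribution per hour as a draw is an assumption.

```python
from itertools import combinations


def contests(players: dict) -> list:
    """Split one retrospective into two-player contests.

    players maps a name to (relative_contribution, hours).
    Returns (player_a, player_b, score_a) tuples, where score_a is
    1.0 if A contributed more per hour, 0.0 if less, 0.5 if equal.
    """
    rates = {name: rc / hours for name, (rc, hours) in players.items()}
    results = []
    for a, b in combinations(rates, 2):
        if rates[a] > rates[b]:
            score_a = 1.0
        elif rates[a] < rates[b]:
            score_a = 0.0
        else:
            score_a = 0.5  # assumption: equal rates count as a draw
        results.append((a, b, score_a))
    return results


if __name__ == "__main__":
    # Hypothetical data: same hours, different contributions.
    print(contests({"a": (50, 20), "b": (30, 20), "c": (20, 20)}))
    # Hypothetical data: same contribution per hour, so every contest is a draw.
    print(contests({"a": (40, 40), "b": (40, 40), "c": (20, 20)}))
```

Three players always yield three contests. In the first hypothetical set, one player wins two contests and another wins one; in the second, every contest is a draw.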
Considerations
a. Order games sequentially, and adjust player rankings after each contest.
b. Adjust player rankings "in parallel"
Choosing a. gives us a more accurate distribution of rank, but disadvantages players based on how the games were ordered sequentially. Choosing b. is more fair, but does not give us as accurate a distribution.
For example, in the second example above, if Jeffrey loses to Tanner after Tanner loses to Shereef, then Jeffrey's final rating will be lower than if he loses to Tanner before Tanner loses to Shereef.
If we end up using a margin-based Elo (similar to Go), it might make more sense to run the games "in parallel".
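The difference between options a. and b. can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the K-factor of 16, the starting ratings of 1000, and the short player keys.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def apply_sequential(ratings: dict, games: list, k: float = 16.0) -> dict:
    """Option a: update ratings after every contest, in the order given."""
    ratings = dict(ratings)
    for a, b, score_a in games:
        delta = k * (score_a - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta
        ratings[b] -= delta
    return ratings


def apply_parallel(ratings: dict, games: list, k: float = 16.0) -> dict:
    """Option b: compute every change from the starting ratings, then apply."""
    deltas = {name: 0.0 for name in ratings}
    for a, b, score_a in games:
        delta = k * (score_a - expected_score(ratings[a], ratings[b]))
        deltas[a] += delta
        deltas[b] -= delta
    return {name: ratings[name] + deltas[name] for name in ratings}


if __name__ == "__main__":
    start = {"s": 1000.0, "t": 1000.0, "j": 1000.0}
    s_beats_t, t_beats_j = ("s", "t", 1.0), ("t", "j", 1.0)
    print(apply_sequential(start, [s_beats_t, t_beats_j]))
    print(apply_sequential(start, [t_beats_j, s_beats_t]))
    print(apply_parallel(start, [s_beats_t, t_beats_j]))
```

In the sequential version, the player who loses to an opponent who has already lost a contest ends up lower than if the order were reversed. The parallel version gives the same result whichever way the contests are listed.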
@shereefb commented on Mon Aug 01 2016
@prattsj , @jeffreywescott , @bundacia heads up.
One of my main takeaways from introducing stats to learners is how "God stats" like XP, which attempt to roll up many sub-stats, may be less useful than I previously thought they would be. XP is trying to do too much, and we're using it for at least two different purposes.
I've been thinking since Friday about using a rating/ranking algorithm along with XP, and wanted to capture my thoughts in a game design issue.
Wanted to give you guys an early heads up about this so that you can weigh in early and frequently as this comes down the pipeline.
Next step for me is to discuss with @tannerwelsh and, if he's up for it, start to run some Elo ranking simulations with the current data set we have and see whether or not it gives us a more accurate/higher resolution picture of our learners as they stand (by comparing to XP, and to Jarred's, Jrobs, and Mihai's rankings).
There are a lot of tweaks to how Elo can be used (K-factor, parallel vs. sequential, discrete vs. range, etc.), so I suspect it will take a bit of playing around with the data before we have a sense of whether or not this is a useful stat.
I can't imagine this hitting engineering backlog in the next two or three weeks, but I wanted to get in the habit of including you folks as early as possible in potential paradigm-changing issues. If you have the bandwidth, I would love your feedback, thoughts, ideas, etc.
@prattsj commented on Mon Aug 01 2016
Thanks, @shereefb. Feels really good to get a heads up on and have access to the convo about something this significant and complex so early. Will probably put off my own personal dive into the details here until after this week given our tight timeline, but I'm looking forward to coming up to speed. Super interesting stuff.
@shereefb commented on Tue Aug 02 2016
Running a quick and dirty script to calculate Elo ratings. Ordered games by cycle and ran them sequentially, with an initial K-factor of 80 for the first 10 games, dropping to 16 afterwards:
https://gist.github.com/shereefb/5b7e707b439b66aa5079a8326fc1052b
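A K-factor schedule like the one described can be sketched as follows. This is not the linked gist, just a minimal illustration under some assumptions: each player's K-factor depends on their own number of games played, "first 10 games" means fewer than 10 played so far, everyone starts at 1000, and all function and parameter names are hypothetical.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def k_factor(games_played: int, provisional_games: int = 10,
             provisional_k: float = 80.0, settled_k: float = 16.0) -> float:
    """High K while a player is new, lower K once they have settled."""
    return provisional_k if games_played < provisional_games else settled_k


def run_history(games: list, initial_rating: float = 1000.0) -> dict:
    """Process (player_a, player_b, score_a) games in order, sequentially."""
    ratings = defaultdict(lambda: initial_rating)
    played = defaultdict(int)
    for a, b, score_a in games:
        expected_a = expected_score(ratings[a], ratings[b])
        # Each player moves by their own K-factor, so a contest between a
        # new player and a settled one is not zero-sum.
        ratings[a] += k_factor(played[a]) * (score_a - expected_a)
        ratings[b] += k_factor(played[b]) * (expected_a - score_a)
        played[a] += 1
        played[b] += 1
    return dict(ratings)


if __name__ == "__main__":
    print(run_history([("a", "b", 1.0)]))
```

Other schedules fit the same shape by changing the parameters, for example `k_factor(n, provisional_games=20, provisional_k=200.0)`.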
On face value this is a better ranking than XP based on these initial observations:
Other observations
@shereefb commented on Tue Aug 02 2016
Running a margin-based Elo shows VERY different results.
game history
@tannerwelsh commented on Tue Aug 02 2016
Really interesting stuff here @shereefb, thanks for putting it together!
Side-note: please don't say "relative contribution" when you really mean "contribution" :)
@jeffreywescott commented on Tue Aug 02 2016
This seems far superior to how we've been using XP. OND, etc.
@shereefb commented on Tue Aug 02 2016
@tannerwelsh what's the difference between relative contribution and contribution? I've been using them interchangeably. What am I missing?
@shereefb commented on Tue Aug 02 2016
Guessing at Jared's, Jrobs, and Mihai's initial ratings (setting them at 1500, 1500, 1400) gets us slightly better results. Players lose fewer rating points than they did when we 'thought' really advanced players were at 1000 to start with.
K-factor = 200 for the first 20 games, then moves to 16