LearnersGuild / learning-os-software

Learning OS software best practices, global requirements, and architecture.

Players have an ELO-based ranking #164

Closed shereefb closed 8 years ago

shereefb commented 8 years ago

Context

To solve for this, the amount of XP a player gains on a project should be a function of their rank compared to the rank of the players they play with, in addition to their relative contribution and hours.

To do this, we need a separate stat that embodies rating, that's separate from XP.

Having a separate stat that represents rating allows us to:

  1. Form teams based on people's skill level (rating) not based on how much they've played the game.
  2. Better calculate XP distribution and relative contribution by using players' rating to adjust expected relative contribution.
  3. Weigh feedback more heavily for higher rated players (e.g. project completion and quality)
  4. Introduce players of different experience levels to the game at any time and have the team formation algorithm adjust for them quicker.

Proposed Solution

The Elo rating system is widely used (chess, FIFA, online gaming, etc.) as a system for rating two-team competitive games.

Every project retrospective can be seen as multiple two-player 'contests', with each player competing to contribute the most per hour spent on the project.

For example, assume a project where the following players contribute the following:

| player | hours | relative contribution |
| --- | --- | --- |
| shereef | 20 | 20% |
| tanner | 40 | 40% |
| jeffrey | 40 | 40% |

Each player's relative contribution per hour (rc/h) is calculated by dividing their relative contribution (rc, in percent) by their hours:

| player | hours | rc | rc/h |
| --- | --- | --- | --- |
| shereef | 20 | 20% | 1 |
| tanner | 40 | 40% | 1 |
| jeffrey | 40 | 40% | 1 |

The retrospective is now broken down into three two-player contests:

| contest | rc/h : rc/h | winner |
| --- | --- | --- |
| shereef vs. tanner | 1:1 | tie |
| shereef vs. jeffrey | 1:1 | tie |
| jeffrey vs. tanner | 1:1 | tie |

In this example, assuming the three players start with equal ratings, none of the player ratings would change, but all of their XP would increase, with Jeffrey's and Tanner's XP increasing by double what Shereef's XP increased by.
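To make the "ties leave ratings unchanged" point concrete, here is a minimal sketch of the standard Elo update. The starting rating of 1000 and K of 16 are borrowed from numbers that appear later in this thread; the function names are illustrative and not taken from any existing script.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 16.0):
    """Return the new (rating_a, rating_b) after one contest.

    score_a is 1.0 for a win by A, 0.5 for a tie, 0.0 for a loss.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# A tie between equally rated players changes nothing...
equal_tie = elo_update(1000.0, 1000.0, 0.5)
# ...but a tie against a higher-rated player still moves both ratings.
uneven_tie = elo_update(1000.0, 1200.0, 0.5)
```

Note that the second case shows a tie only leaves ratings untouched when both players start level; otherwise the lower-rated player gains a little.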

Taking another example, where the players have different skill levels and different contributions:

| player | hours | rc | rc/h |
| --- | --- | --- | --- |
| shereef | 20 | 40% | 2 |
| tanner | 40 | 40% | 1 |
| jeffrey | 40 | 20% | 0.5 |

Breaking down the retrospective to three contests:

| contest | rc/h : rc/h | winner |
| --- | --- | --- |
| shereef vs. tanner | 2:1 | shereef |
| shereef vs. jeffrey | 2:0.5 | shereef |
| jeffrey vs. tanner | 0.5:1 | tanner |

Shereef wins two contests, Tanner wins one, and Jeffrey wins none.
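The breakdown above can be sketched in a few lines of Python. The data is the second example; the helper names (`rc_per_hour`, `pairwise_contests`) are made up for illustration, and exact equality is used for ties here, which is a simplification.

```python
from itertools import combinations

# (hours, relative contribution in percent) from the second example above
project = {
    "shereef": (20, 40),
    "tanner": (40, 40),
    "jeffrey": (40, 20),
}


def rc_per_hour(hours: float, rc_percent: float) -> float:
    """Relative contribution per hour spent on the project."""
    return rc_percent / hours


def pairwise_contests(players):
    """Yield (player_a, player_b, winner) for every pair; winner is None on a tie."""
    rates = {name: rc_per_hour(h, rc) for name, (h, rc) in players.items()}
    for a, b in combinations(rates, 2):
        if rates[a] > rates[b]:
            winner = a
        elif rates[b] > rates[a]:
            winner = b
        else:
            winner = None
        yield a, b, winner


contests = list(pairwise_contests(project))
wins = {name: sum(1 for _, _, w in contests if w == name) for name in project}
```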

Considerations

  1. There should be a range within which two players' rc/h is considered a tie. For example, if Tanner and Jeffrey have rc/h of 0.9 and 0.92 respectively, it doesn't make sense to consider this a "win" for Jeffrey.
  2. Related to 1, if someone's rc/h is three times another player's, they should get more of a rating bump than if it were just 1.5 times another player's. In other words, perhaps we should adjust the Elo rating to consider 'degrees' of winning and losing. The way Elo rating is used in Go could offer a good starting point for tackling this. Also see margin of victory adjustments.
  3. The K factor needs to be adjusted intelligently. It needs to start off large and decrease over time (total hours played? total retros? XP?), allowing player rankings to swing more wildly earlier and settle later in the game. In Go, K is varied depending on the rating of the players, because of the low confidence in lower ratings (high fluctuation in the outcome) but high confidence in pro ratings (stable, consistent play): K is 116 at rating 100 and 10 at rating 2700.
  4. In chess (or other ranked games) the games are played sequentially, and each ranking is adjusted before the next contest starts. In our case, multiple games are played in parallel. Here we have a tough choice, either:

     a. Order games sequentially, and adjust player rankings after each contest.
     b. Adjust player rankings "in parallel".

Choosing a. gives us a more accurate distribution of rank, but disadvantages players based on how the games were ordered sequentially. Choosing b. is more fair, but does not give us as accurate a distribution.

For example, in the second example above, if Jeffrey loses to Tanner after Tanner loses to Shereef, then Jeffrey's final rating will be lower than if he loses to Tanner before Tanner loses to Shereef.
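The ordering effect can be checked with a small simulation. This is a sketch under assumptions: all three players start at 1000, K is fixed at 16, and each contest is an outright win or loss. The "parallel" variant shown here scores every contest against the starting ratings and applies the summed changes at the end, which is one possible reading of option b.

```python
PLAYERS = ("shereef", "tanner", "jeffrey")


def expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def run_sequential(contests, start=1000.0, k=16.0):
    """Apply (winner, loser) contests one at a time, updating after each."""
    ratings = dict.fromkeys(PLAYERS, start)
    for winner, loser in contests:
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    return ratings


def run_parallel(contests, start=1000.0, k=16.0):
    """Score every contest against the starting ratings, then apply all deltas."""
    ratings = dict.fromkeys(PLAYERS, start)
    deltas = dict.fromkeys(PLAYERS, 0.0)
    for winner, loser in contests:
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        deltas[winner] += delta
        deltas[loser] -= delta
    return {name: ratings[name] + deltas[name] for name in PLAYERS}


# Same three results, two different orderings.
tanner_loses_first = run_sequential(
    [("shereef", "tanner"), ("shereef", "jeffrey"), ("tanner", "jeffrey")]
)
jeffrey_loses_first = run_sequential(
    [("tanner", "jeffrey"), ("shereef", "tanner"), ("shereef", "jeffrey")]
)
parallel = run_parallel(
    [("shereef", "tanner"), ("shereef", "jeffrey"), ("tanner", "jeffrey")]
)
```

With these assumptions Jeffrey ends slightly lower when Tanner loses to Shereef first, while the parallel version gives the same result regardless of order.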

If we end up using a margin-based Elo (similar to Go), it might make more sense to run the games "in parallel".
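Considerations 1 and 3 above could look something like the sketch below. Both functions are hypothetical: the 5% tie band is an arbitrary placeholder rather than a tuned value, and the straight line between the two Go data points quoted above (K=116 at rating 100, K=10 at rating 2700) is an assumption for illustration, not the formula Go rating systems actually use.

```python
def contest_score(rate_a: float, rate_b: float, tie_band: float = 0.05) -> float:
    """Score for player A: 1.0 win, 0.5 tie, 0.0 loss.

    Rates whose relative difference is within tie_band count as a tie.
    tie_band=0.05 (5%) is a placeholder, not a tuned value.
    """
    larger = max(rate_a, rate_b)
    if larger == 0 or abs(rate_a - rate_b) / larger <= tie_band:
        return 0.5
    return 1.0 if rate_a > rate_b else 0.0


def k_for_rating(rating: float) -> float:
    """Interpolate K linearly between (100, 116) and (2700, 10).

    Ratings outside that range are clamped to the nearest end point.
    """
    low_rating, high_rating = 100.0, 2700.0
    low_k, high_k = 116.0, 10.0
    clamped = min(max(rating, low_rating), high_rating)
    fraction = (clamped - low_rating) / (high_rating - low_rating)
    return low_k + fraction * (high_k - low_k)
```

With this band, rc/h values of 0.9 and 0.92 score as a tie, while 2 against 1 is still a clear win.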

shereefb commented 8 years ago

@prattsj , @jeffreywescott , @bundacia heads up.

One of my main takeaways from introducing stats to learners is how "God stats" like XP that attempt to roll up many sub-stats may be less useful than I previously thought they would be. XP is trying to do too much, and we're using it for at least two different purposes.

I've been thinking since Friday about using a rating/ranking algorithm along with XP, and wanted to capture my thoughts in a game design issue.

Wanted to give you guys an early heads up about this so that you can weigh in early and frequently as this comes down the pipeline.

Next step for me is to discuss with @tannerwelsh and, if he's up for it, start to run some ELO ranking simulations with the current data set we have and see whether or not it gives us a more accurate/higher resolution picture of our learners as they stand (by comparing to XP, and to Jared's, Jrob's, and Mihai's rankings).

There are a lot of tweaks to how ELO can be used (K-factor, parallel vs. sequential, discrete vs. range, etc.) so I suspect it will take a bit of playing around with the data before we have a sense of whether or not this is a useful stat.

I can't imagine this hitting engineering backlog in the next two or three weeks but I wanted to get in the habit of including you folks as early as possible in potential paradigm-changing issues. If you have the bandwidth would love your feedback/thoughts/ideas...etc.

heyheyjp commented 8 years ago

Thanks, @shereefb. Feels really good to get a heads up on and have access to the convo about something this significant and complex so early. Will probably put off my own personal dive into the details here until after this week given our tight timeline, but I'm looking forward to coming up to speed. Super interesting stuff.

shereefb commented 8 years ago

Running a quick and dirty script to calculate ELO ratings. Ordered games by cycle, and ran sequentially, initial K-factor of 80 for first 10 games, and dropping to 16 afterwards:

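| Name | Elo Rating | XP | Elo Pod Rank | XP Pod Rank |
| --- | --- | --- | --- | --- |
| Jared Grippe | 1271 | 217 | 1 | 2 |
| Mihai Banulescu | 1232 | 154 | 2 | 3 |
| John Roberts | 1227 | 227 | 3 | 1 |
| Nico | 1040 | 57 | 4 | 12 |
| Rachel | 1034 | 71 | 5 | 8 |
| Devon Wesley | 1022 | 80 | 6 | 4 |
| Phillip Lorenzo | 1015 | 54 | 7 | 13 |
| EthanJStark | 981 | 76 | 8 | 6 |
| Ej | 974 | 23 | 9 | 20 |
| Aileen Santos | 971 | 73 | 10 | 7 |
| Majid Rahimi | 951 | 78 | 11 | 5 |
| James D Stewart | 946 | 62 | 12 | 10 |
| Shaka Lee | 928 | 61 | 13 | 11 |
| John Hopkins | 921 | 66 | 14 | 9 |
| anasauce | 902 | 30 | 15 | 18 |
| Yaseen Hussain | 890 | 49 | 16 | 15 |
| Harman Singh | 874 | 52 | 17 | 14 |
| Syd Rothman | 855 | 37 | 18 | 17 |
| Moniarchy | 828 | 40 | 19 | 16 |
| Thomas W. Smith | 825 | 29 | 20 | 19 |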

https://gist.github.com/shereefb/5b7e707b439b66aa5079a8326fc1052b
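For readers who don't want to open the gist, the run described above (games ordered by cycle, processed sequentially, K of 80 for a player's first 10 games and 16 afterwards) might be sketched as follows. This is not the gist's code: the function names, the starting rating of 1000, and the choice to give each player their own K are assumptions.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def k_factor(games_played, provisional_k=80.0, settled_k=16.0, provisional_games=10):
    """Large K while a player is new, smaller K once they have settled."""
    return provisional_k if games_played < provisional_games else settled_k


def rate_games(games, start=1000.0):
    """Rate an iterable of (player_a, player_b, score_a), already ordered by cycle."""
    ratings = defaultdict(lambda: start)
    played = defaultdict(int)
    for a, b, score_a in games:
        expected_a = expected_score(ratings[a], ratings[b])
        # Each player moves by their own K, so an update need not be zero-sum.
        change_a = k_factor(played[a]) * (score_a - expected_a)
        change_b = k_factor(played[b]) * (expected_a - score_a)
        ratings[a] += change_a
        ratings[b] += change_b
        played[a] += 1
        played[b] += 1
    return dict(ratings)


# Hypothetical single game between two new players, won outright by "a".
example = rate_games([("a", "b", 1.0)])
```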

At face value this is a better ranking than XP, based on these initial observations:

  1. Ej is no longer last. She only contributed to 1 project in the first cycle, and contributed as much as James did. If she had been sick (instead of dropping out) for the next two weeks, she would be ranked at the bottom based on XP, which doesn't make sense.
  2. Anasauce moved up three places from third to last. She missed the first 2.5 days of LG so didn't gain as much XP, but her ranking shouldn't suffer from that. She held her own with Nico on instinctive-nyala and contributed 1.5 times as much as Thomas on unusual-woodpecker.
  3. Yaseen and Harman move down the ranking. They are high on XP, but they were with each other on 2 of 3 cycles.
  4. Jared, Mihai and Jrob rose to the top without considering any "hacked" XP value from the past.

Other observations

shereefb commented 8 years ago

Running a margin-based ELO shows VERY different results:

| Name | Rating |
| --- | --- |
| Jared Grippe | 1156 |
| John Roberts | 1128 |
| Mihai Banulescu | 1098 |
| Devon Wesley | 997 |
| Rachel | 990 |
| John Hopkins | 987 |
| Nico | 986 |
| Ej | 985 |
| EthanJStark | 985 |
| James D Stewart | 982 |
| Phillip Lorenzo | 980 |
| Majid Rahimi | 979 |
| Aileen Santos | 975 |
| Shaka Lee | 965 |
| anasauce | 956 |
| Harman Singh | 951 |
| Yaseen Hussain | 932 |
| Syd Rothman | 931 |
| Thomas W. Smith | 913 |
| Moniarchy | 913 |
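A margin-based update replaces the win/tie/loss score with a fractional one, which is what the Result column in the game history appears to hold. The sketch below is a guess at the mechanics, not the script that produced these numbers: it assumes the result is player 1's share of the combined rc/h and that K is 80.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def margin_score(rate_a: float, rate_b: float) -> float:
    """Player A's share of the combined rc/h, a value between 0 and 1.

    Assumed form; the thread does not spell out how the result is derived.
    """
    total = rate_a + rate_b
    return 0.5 if total == 0 else rate_a / total


def margin_update(rating_a, rating_b, score_a, k=80.0):
    """Elo update where score_a is a fractional result instead of 0, 0.5 or 1."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two players at 1000 with a result of 0.19 for player 1.
first_row = margin_update(1000.0, 1000.0, 0.19)
```

Under these assumptions the first game-history row (both players at 1000, result 0.19) comes out at roughly 975.2 and 1024.8, close to the listed 975 and 1024. Later rows may differ slightly because the displayed results and ratings look rounded.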

game history

Player 1 Player 2 Result P1 New Rating P2 New Rating
Devon Wesley(1000) Jared Grippe(1000) 0.19 975 1024
Devon Wesley(975) Shaka Lee(1000) 0.53 979 995
Jared Grippe(1024) Shaka Lee(995) 0.82 1046 972
Jared Grippe(1046) Phillip Lorenzo(1000) 0.8 1064 981
Jared Grippe(1064) Thomas W. Smith(1000) 0.83 1083 980
Phillip Lorenzo(981) Thomas W. Smith(980) 0.55 985 975
Ej(1000) James D Stewart(1000) 0.5 1000 1000
Ej(1000) Jared Grippe(1083) 0.2 985 1097
James D Stewart(1000) Jared Grippe(1097) 0.2 986 1110
Aileen Santos(1000) John Roberts(1000) 0.25 979 1020
Aileen Santos(979) Majid Rahimi(1000) 0.5 981 997
John Roberts(1020) Majid Rahimi(997) 0.75 1037 979
Harman Singh(1000) John Roberts(1037) 0.18 978 1058
Harman Singh(978) Yaseen Hussain(1000) 0.47 978 999
John Roberts(1058) Yaseen Hussain(999) 0.81 1075 981
Jared Grippe(1110) Moniarchy(1000) 0.81 1122 987
Jared Grippe(1122) Nico(1000) 0.81 1133 988
Moniarchy(987) Nico(988) 0.51 988 986
anasauce(1000) John Hopkins(1000) 0.34 986 1013
anasauce(986) Mihai Banulescu(1000) 0.25 967 1018
anasauce(967) Rachel(1000) 0.31 955 1011
John Hopkins(1013) Mihai Banulescu(1018) 0.39 1004 1026
John Hopkins(1004) Rachel(1011) 0.46 1001 1013
Mihai Banulescu(1026) Rachel(1013) 0.57 1030 1008
EthanJStark(1000) John Roberts(1075) 0.25 988 1086
EthanJStark(988) Syd Rothman(1000) 0.61 998 989
John Roberts(1086) Syd Rothman(989) 0.83 1101 973
Jared Grippe(1133) Nico(986) 0.84 1144 974
Jared Grippe(1144) Phillip Lorenzo(985) 0.85 1146 974
Jared Grippe(1146) Syd Rothman(973) 0.9 1148 959
Jared Grippe(1148) Yaseen Hussain(981) 0.91 1150 966
Nico(974) Phillip Lorenzo(974) 0.51 974 973
Nico(974) Syd Rothman(959) 0.63 982 950
Nico(982) Yaseen Hussain(966) 0.65 991 956
Phillip Lorenzo(973) Syd Rothman(950) 0.62 979 943
Phillip Lorenzo(979) Yaseen Hussain(956) 0.64 987 947
Syd Rothman(943) Yaseen Hussain(947) 0.52 944 945
Devon Wesley(979) James D Stewart(986) 0.65 991 973
Devon Wesley(991) Jared Grippe(1150) 0.23 986 1150
Devon Wesley(986) Moniarchy(988) 0.78 1008 965
Devon Wesley(1008) Rachel(1008) 0.56 1013 1002
James D Stewart(973) Jared Grippe(1150) 0.14 962 1152
James D Stewart(962) Moniarchy(965) 0.65 974 952
James D Stewart(974) Rachel(1002) 0.41 970 1005
Jared Grippe(1152) Moniarchy(952) 0.92 1154 939
Jared Grippe(1154) Rachel(1005) 0.81 1155 996
Moniarchy(939) Rachel(996) 0.27 927 1007
EthanJStark(998) Jared Grippe(1155) 0.24 994 1155
EthanJStark(994) Majid Rahimi(979) 0.52 994 978
Jared Grippe(1155) Majid Rahimi(978) 0.77 1155 975
Aileen Santos(981) Mihai Banulescu(1030) 0.31 971 1039
Aileen Santos(971) Shaka Lee(972) 0.51 971 971
Mihai Banulescu(1039) Shaka Lee(971) 0.7 1047 962
Harman Singh(978) John Hopkins(1001) 0.37 970 1008
Harman Singh(970) John Roberts(1101) 0.24 963 1107
John Hopkins(1008) John Roberts(1107) 0.35 1007 1107
anasauce(955) Mihai Banulescu(1047) 0.23 943 1058
anasauce(943) Thomas W. Smith(975) 0.63 956 961
Mihai Banulescu(1058) Thomas W. Smith(961) 0.85 1075 943
Harman Singh(963) John Roberts(1107) 0.15 950 1119
Harman Singh(950) Yaseen Hussain(945) 0.52 951 943
John Roberts(1119) Yaseen Hussain(943) 0.86 1121 932
John Roberts(1121) Syd Rothman(944) 0.89 1123 931
anasauce(956) Jared Grippe(1155) 0.22 954 1155
anasauce(954) Nico(991) 0.47 955 989
anasauce(955) Phillip Lorenzo(987) 0.48 956 985
Jared Grippe(1155) Nico(989) 0.76 1155 986
Jared Grippe(1155) Phillip Lorenzo(985) 0.77 1155 981
Nico(986) Phillip Lorenzo(981) 0.51 986 980
James D Stewart(970) Mihai Banulescu(1075) 0.31 966 1078
James D Stewart(966) Thomas W. Smith(943) 0.74 982 926
Mihai Banulescu(1078) Thomas W. Smith(926) 0.86 1090 913
Devon Wesley(1013) EthanJStark(994) 0.51 1011 995
Devon Wesley(1011) Majid Rahimi(975) 0.48 1005 980
Devon Wesley(1005) Mihai Banulescu(1090) 0.29 997 1097
EthanJStark(995) Majid Rahimi(980) 0.48 991 983
EthanJStark(991) Mihai Banulescu(1097) 0.28 985 1098
Majid Rahimi(983) Mihai Banulescu(1098) 0.3 979 1098
Aileen Santos(971) Jared Grippe(1155) 0.24 969 1155
Aileen Santos(969) John Hopkins(1007) 0.53 976 999
Aileen Santos(976) Shaka Lee(962) 0.51 975 962
Jared Grippe(1155) John Hopkins(999) 0.78 1156 993
Jared Grippe(1156) Shaka Lee(962) 0.77 1156 960
John Hopkins(993) Shaka Lee(960) 0.48 987 965
John Roberts(1123) Moniarchy(927) 0.91 1125 914
John Roberts(1125) Rachel(1007) 0.86 1128 990
Moniarchy(914) Rachel(990) 0.38 913 990

tannerwelsh commented 8 years ago

Really interesting stuff here @shereefb, thanks for putting it together!

Side-note: please don't say "relative contribution" when you really mean "contribution" :)

jeffreywescott commented 8 years ago

This seems far superior to how we've been using XP. OND, etc.

shereefb commented 8 years ago

@tannerwelsh what's the difference between relative contribution and contribution? I've been using them interchangeably. What am I missing?

shereefb commented 8 years ago

Guessing at Jared's, Jrob's, and Mihai's initial ratings (setting them at 1500, 1500, and 1400) gets us slightly better results: other players lose fewer rating points than they did when we 'thought' the really advanced players were at 1000 to start with.

K factor = 200 for first 20 games then moves to 16
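A quick calculation shows why seeding matters. This sketch assumes a newcomer at 1000 who loses outright, uses the standard Elo expected score, and takes K=200 from the setting above; `points_lost` is a made-up helper.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def points_lost(loser_rating: float, winner_rating: float, k: float = 200.0) -> float:
    """Rating points a player drops after an outright loss (score 0)."""
    return k * expected_score(loser_rating, winner_rating)


loss_vs_unseeded = points_lost(1000.0, 1000.0)  # opponent assumed to be at 1000
loss_vs_seeded = points_lost(1000.0, 1500.0)    # same opponent seeded at 1500
```

Losing to an opponent believed to be at 1000 costs 100 points here, while the same loss to an opponent seeded at 1500 costs only about a tenth of that.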

| Player | Elo Rating |
| --- | --- |
| John Roberts | 1319 |
| Jared Grippe | 1287 |
| Mihai Banulescu | 1222 |
| Devon Wesley | 1079 |
| Majid Rahimi | 1076 |
| John Hopkins | 1066 |
| Aileen Santos | 1066 |
| EthanJStark | 1065 |
| Shaka Lee | 1064 |
| Nico | 1061 |
| James D Stewart | 1061 |
| Rachel | 1056 |
| Phillip Lorenzo | 1052 |
| anasauce | 1026 |
| Ej | 1021 |
| Harman Singh | 1005 |
| Yaseen Hussain | 976 |
| Syd Rothman | 964 |
| Moniarchy | 946 |
| Thomas W. Smith | 928 |

jeffreywescott commented 8 years ago

Issue moved to LearnersGuild/game-prototype #9 via ZenHub