Closed shereefb closed 8 years ago
@prattsj , @jeffreywescott , @bundacia heads up.
One of my main takeaways from introducing stats to learners is that "God stats" like XP, which attempt to roll up many sub-stats, may be less useful than I previously thought they would be. XP is trying to do too much, and we're using it for at least two different purposes.
I've been thinking since Friday about using a rating/ranking algorithm along with XP, and wanted to capture my thoughts in a game design issue.
Wanted to give you guys an early heads up about this so that you can weigh in early and frequently as this comes down the pipeline.
Next step for me is to discuss with @tannerwelsh and, if he's up for it, start running some Elo ranking simulations on the current data set we have to see whether or not it gives us a more accurate, higher-resolution picture of our learners as they stand (by comparing to XP, and to Jared's, Jrobs's, and Mihai's rankings).
There are a lot of tweaks to how Elo can be used (K-factor, parallel vs. sequential, discrete vs. range, etc.), so I suspect it will take a bit of playing around with the data before we have a sense of whether or not this is a useful stat.
I can't imagine this hitting the engineering backlog in the next two or three weeks, but I wanted to get in the habit of including you folks as early as possible in potentially paradigm-changing issues. If you have the bandwidth, I would love your feedback, thoughts, ideas, etc.
Thanks, @shereefb. Feels really good to get a heads up on and have access to the convo about something this significant and complex so early. Will probably put off my own personal dive into the details here until after this week given our tight timeline, but I'm looking forward to coming up to speed. Super interesting stuff.
I ran a quick-and-dirty script to calculate Elo ratings. Games are ordered by cycle and run sequentially, with an initial K-factor of 80 for the first 10 games, dropping to 16 afterwards:
Name | Rating | XP | Elo Pod Rank | XP Pod Rank |
---|---|---|---|---|
Jared Grippe | 1271 | 217 | 1 | 2 |
Mihai Banulescu | 1232 | 154 | 2 | 3 |
John Roberts | 1227 | 227 | 3 | 1 |
Nico | 1040 | 57 | 4 | 12 |
Rachel | 1034 | 71 | 5 | 8 |
Devon Wesley | 1022 | 80 | 6 | 4 |
Phillip Lorenzo | 1015 | 54 | 7 | 13 |
EthanJStark | 981 | 76 | 8 | 6 |
Ej | 974 | 23 | 9 | 20 |
Aileen Santos | 971 | 73 | 10 | 7 |
Majid Rahimi | 951 | 78 | 11 | 5 |
James D Stewart | 946 | 62 | 12 | 10 |
Shaka Lee | 928 | 61 | 13 | 11 |
John Hopkins | 921 | 66 | 14 | 9 |
anasauce | 902 | 30 | 15 | 18 |
Yaseen Hussain | 890 | 49 | 16 | 15 |
Harman Singh | 874 | 52 | 17 | 14 |
Syd Rothman | 855 | 37 | 18 | 17 |
Moniarchy | 828 | 40 | 19 | 16 |
Thomas W. Smith | 825 | 29 | 20 | 19 |
https://gist.github.com/shereefb/5b7e707b439b66aa5079a8326fc1052b
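For anyone who wants to follow along without opening the gist, the sequential run described above (games ordered by cycle, K-factor 80 early on, then 16) can be sketched roughly like this. This is not the gist's code. It assumes the standard Elo expected-score formula, a starting rating of 1000, win/draw/loss scores of 1/0.5/0, and that the "first 10 games" threshold is counted per player; the names `run_sequential`, `k_factor`, and the players "A" and "B" are made up for illustration.

```python
from collections import defaultdict

START_RATING = 1000.0    # assumed starting rating
PROVISIONAL_GAMES = 10   # a player's first 10 games use the higher K-factor
K_PROVISIONAL = 80
K_SETTLED = 16


def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def k_factor(games_played):
    """Assumption: the K-factor schedule is tracked per player."""
    return K_PROVISIONAL if games_played < PROVISIONAL_GAMES else K_SETTLED


def run_sequential(games):
    """games: iterable of (player_a, player_b, score_a), already ordered by cycle.

    score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses.
    Ratings are updated after every single game.
    """
    ratings = defaultdict(lambda: START_RATING)
    played = defaultdict(int)
    for a, b, score_a in games:
        exp_a = expected_score(ratings[a], ratings[b])
        # Compute both changes from the pre-game ratings before mutating.
        delta_a = k_factor(played[a]) * (score_a - exp_a)
        delta_b = k_factor(played[b]) * ((1.0 - score_a) - (1.0 - exp_a))
        ratings[a] += delta_a
        ratings[b] += delta_b
        played[a] += 1
        played[b] += 1
    return dict(ratings)


# Two hypothetical new players, A beats B:
print(run_sequential([("A", "B", 1.0)]))  # {'A': 1040.0, 'B': 960.0}
```

Because each player carries their own K-factor, a game between a new player and an established one moves the new player's rating much more than the established player's.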
At face value, this is a better ranking than XP, based on these initial observations:
Other observations
Running a margin-based Elo shows VERY different results:
Name | Rating |
---|---|
Jared Grippe | 1156 |
John Roberts | 1128 |
Mihai Banulescu | 1098 |
Devon Wesley | 997 |
Rachel | 990 |
John Hopkins | 987 |
Nico | 986 |
Ej | 985 |
EthanJStark | 985 |
James D Stewart | 982 |
Phillip Lorenzo | 980 |
Majid Rahimi | 979 |
Aileen Santos | 975 |
Shaka Lee | 965 |
anasauce | 956 |
Harman Singh | 951 |
Yaseen Hussain | 932 |
Syd Rothman | 931 |
Thomas W. Smith | 913 |
Moniarchy | 913 |
game history
Player 1 | Player 2 | Result | P1 New Rating | P2 New Rating |
---|---|---|---|---|
Devon Wesley(1000) | Jared Grippe(1000) | 0.19 | 975 | 1024 |
Devon Wesley(975) | Shaka Lee(1000) | 0.53 | 979 | 995 |
Jared Grippe(1024) | Shaka Lee(995) | 0.82 | 1046 | 972 |
Jared Grippe(1046) | Phillip Lorenzo(1000) | 0.8 | 1064 | 981 |
Jared Grippe(1064) | Thomas W. Smith(1000) | 0.83 | 1083 | 980 |
Phillip Lorenzo(981) | Thomas W. Smith(980) | 0.55 | 985 | 975 |
Ej(1000) | James D Stewart(1000) | 0.5 | 1000 | 1000 |
Ej(1000) | Jared Grippe(1083) | 0.2 | 985 | 1097 |
James D Stewart(1000) | Jared Grippe(1097) | 0.2 | 986 | 1110 |
Aileen Santos(1000) | John Roberts(1000) | 0.25 | 979 | 1020 |
Aileen Santos(979) | Majid Rahimi(1000) | 0.5 | 981 | 997 |
John Roberts(1020) | Majid Rahimi(997) | 0.75 | 1037 | 979 |
Harman Singh(1000) | John Roberts(1037) | 0.18 | 978 | 1058 |
Harman Singh(978) | Yaseen Hussain(1000) | 0.47 | 978 | 999 |
John Roberts(1058) | Yaseen Hussain(999) | 0.81 | 1075 | 981 |
Jared Grippe(1110) | Moniarchy(1000) | 0.81 | 1122 | 987 |
Jared Grippe(1122) | Nico(1000) | 0.81 | 1133 | 988 |
Moniarchy(987) | Nico(988) | 0.51 | 988 | 986 |
anasauce(1000) | John Hopkins(1000) | 0.34 | 986 | 1013 |
anasauce(986) | Mihai Banulescu(1000) | 0.25 | 967 | 1018 |
anasauce(967) | Rachel(1000) | 0.31 | 955 | 1011 |
John Hopkins(1013) | Mihai Banulescu(1018) | 0.39 | 1004 | 1026 |
John Hopkins(1004) | Rachel(1011) | 0.46 | 1001 | 1013 |
Mihai Banulescu(1026) | Rachel(1013) | 0.57 | 1030 | 1008 |
EthanJStark(1000) | John Roberts(1075) | 0.25 | 988 | 1086 |
EthanJStark(988) | Syd Rothman(1000) | 0.61 | 998 | 989 |
John Roberts(1086) | Syd Rothman(989) | 0.83 | 1101 | 973 |
Jared Grippe(1133) | Nico(986) | 0.84 | 1144 | 974 |
Jared Grippe(1144) | Phillip Lorenzo(985) | 0.85 | 1146 | 974 |
Jared Grippe(1146) | Syd Rothman(973) | 0.9 | 1148 | 959 |
Jared Grippe(1148) | Yaseen Hussain(981) | 0.91 | 1150 | 966 |
Nico(974) | Phillip Lorenzo(974) | 0.51 | 974 | 973 |
Nico(974) | Syd Rothman(959) | 0.63 | 982 | 950 |
Nico(982) | Yaseen Hussain(966) | 0.65 | 991 | 956 |
Phillip Lorenzo(973) | Syd Rothman(950) | 0.62 | 979 | 943 |
Phillip Lorenzo(979) | Yaseen Hussain(956) | 0.64 | 987 | 947 |
Syd Rothman(943) | Yaseen Hussain(947) | 0.52 | 944 | 945 |
Devon Wesley(979) | James D Stewart(986) | 0.65 | 991 | 973 |
Devon Wesley(991) | Jared Grippe(1150) | 0.23 | 986 | 1150 |
Devon Wesley(986) | Moniarchy(988) | 0.78 | 1008 | 965 |
Devon Wesley(1008) | Rachel(1008) | 0.56 | 1013 | 1002 |
James D Stewart(973) | Jared Grippe(1150) | 0.14 | 962 | 1152 |
James D Stewart(962) | Moniarchy(965) | 0.65 | 974 | 952 |
James D Stewart(974) | Rachel(1002) | 0.41 | 970 | 1005 |
Jared Grippe(1152) | Moniarchy(952) | 0.92 | 1154 | 939 |
Jared Grippe(1154) | Rachel(1005) | 0.81 | 1155 | 996 |
Moniarchy(939) | Rachel(996) | 0.27 | 927 | 1007 |
EthanJStark(998) | Jared Grippe(1155) | 0.24 | 994 | 1155 |
EthanJStark(994) | Majid Rahimi(979) | 0.52 | 994 | 978 |
Jared Grippe(1155) | Majid Rahimi(978) | 0.77 | 1155 | 975 |
Aileen Santos(981) | Mihai Banulescu(1030) | 0.31 | 971 | 1039 |
Aileen Santos(971) | Shaka Lee(972) | 0.51 | 971 | 971 |
Mihai Banulescu(1039) | Shaka Lee(971) | 0.7 | 1047 | 962 |
Harman Singh(978) | John Hopkins(1001) | 0.37 | 970 | 1008 |
Harman Singh(970) | John Roberts(1101) | 0.24 | 963 | 1107 |
John Hopkins(1008) | John Roberts(1107) | 0.35 | 1007 | 1107 |
anasauce(955) | Mihai Banulescu(1047) | 0.23 | 943 | 1058 |
anasauce(943) | Thomas W. Smith(975) | 0.63 | 956 | 961 |
Mihai Banulescu(1058) | Thomas W. Smith(961) | 0.85 | 1075 | 943 |
Harman Singh(963) | John Roberts(1107) | 0.15 | 950 | 1119 |
Harman Singh(950) | Yaseen Hussain(945) | 0.52 | 951 | 943 |
John Roberts(1119) | Yaseen Hussain(943) | 0.86 | 1121 | 932 |
John Roberts(1121) | Syd Rothman(944) | 0.89 | 1123 | 931 |
anasauce(956) | Jared Grippe(1155) | 0.22 | 954 | 1155 |
anasauce(954) | Nico(991) | 0.47 | 955 | 989 |
anasauce(955) | Phillip Lorenzo(987) | 0.48 | 956 | 985 |
Jared Grippe(1155) | Nico(989) | 0.76 | 1155 | 986 |
Jared Grippe(1155) | Phillip Lorenzo(985) | 0.77 | 1155 | 981 |
Nico(986) | Phillip Lorenzo(981) | 0.51 | 986 | 980 |
James D Stewart(970) | Mihai Banulescu(1075) | 0.31 | 966 | 1078 |
James D Stewart(966) | Thomas W. Smith(943) | 0.74 | 982 | 926 |
Mihai Banulescu(1078) | Thomas W. Smith(926) | 0.86 | 1090 | 913 |
Devon Wesley(1013) | EthanJStark(994) | 0.51 | 1011 | 995 |
Devon Wesley(1011) | Majid Rahimi(975) | 0.48 | 1005 | 980 |
Devon Wesley(1005) | Mihai Banulescu(1090) | 0.29 | 997 | 1097 |
EthanJStark(995) | Majid Rahimi(980) | 0.48 | 991 | 983 |
EthanJStark(991) | Mihai Banulescu(1097) | 0.28 | 985 | 1098 |
Majid Rahimi(983) | Mihai Banulescu(1098) | 0.3 | 979 | 1098 |
Aileen Santos(971) | Jared Grippe(1155) | 0.24 | 969 | 1155 |
Aileen Santos(969) | John Hopkins(1007) | 0.53 | 976 | 999 |
Aileen Santos(976) | Shaka Lee(962) | 0.51 | 975 | 962 |
Jared Grippe(1155) | John Hopkins(999) | 0.78 | 1156 | 993 |
Jared Grippe(1156) | Shaka Lee(962) | 0.77 | 1156 | 960 |
John Hopkins(993) | Shaka Lee(960) | 0.48 | 987 | 965 |
John Roberts(1123) | Moniarchy(927) | 0.91 | 1125 | 914 |
John Roberts(1125) | Rachel(1007) | 0.86 | 1128 | 990 |
Moniarchy(914) | Rachel(990) | 0.38 | 913 | 990 |
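A rough sketch of how a single row of the game history above could be computed. The assumptions here are mine: the Result column is read as Player 1's fractional score between 0 and 1, the standard Elo expected-score formula is used, each player has their own K-factor, and new ratings are truncated to whole numbers. The function name `margin_update` is hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def margin_update(rating_1, rating_2, result, k_1=80, k_2=80):
    """Margin-based update for one contest.

    result: Player 1's fractional score in [0, 1] (assumed meaning of the
    Result column), instead of a hard 1 / 0.5 / 0 outcome.
    Returns both new ratings, truncated to integers (an assumption).
    """
    exp_1 = expected_score(rating_1, rating_2)
    new_1 = rating_1 + k_1 * (result - exp_1)
    new_2 = rating_2 + k_2 * ((1.0 - result) - (1.0 - exp_1))
    return int(new_1), int(new_2)


# First row of the table: Devon Wesley(1000) vs Jared Grippe(1000), result 0.19
print(margin_update(1000, 1000, 0.19))  # (975, 1024)
```

With these assumptions the first row comes out as 975 and 1024, matching the table. The Result values shown appear to be rounded to two decimals, so some other rows may land one point off when recomputed from the displayed numbers.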
Really interesting stuff here @shereefb, thanks for putting it together!
Side-note: please don't say "relative contribution" when you really mean "contribution" :)
This seems far superior to how we've been using XP. OND, etc.
@tannerwelsh what's the difference between relative contribution and contribution? I've been using them interchangeably. What am I missing?
Guessing at Jared's, Jrobs's, and Mihai's initial ratings (setting them at 1500, 1500, and 1400) gets us slightly better results: other players lose fewer rating points than they did when we 'thought' the really advanced players were at 1000 to start with.
K factor = 200 for first 20 games then moves to 16
Player | Elo Rating |
---|---|
John Roberts | 1319 |
Jared Grippe | 1287 |
Mihai Banulescu | 1222 |
Devon Wesley | 1079 |
Majid Rahimi | 1076 |
John Hopkins | 1066 |
Aileen Santos | 1066 |
EthanJStark | 1065 |
Shaka Lee | 1064 |
Nico | 1061 |
James D Stewart | 1061 |
Rachel | 1056 |
Phillip Lorenzo | 1052 |
anasauce | 1026 |
Ej | 1021 |
Harman Singh | 1005 |
Yaseen Hussain | 976 |
Syd Rothman | 964 |
Moniarchy | 946 |
Thomas W. Smith | 928 |
Issue moved to LearnersGuild/game-prototype #9 via ZenHub
Context
To solve for this, the amount of XP a player gains on a project should be a function of their rank compared to the rank of the players they play with, in addition to their relative contribution and hours.
To do this, we need a separate stat that embodies rating, that's separate from XP.
Having a separate stat that represents rating allows us to:
Proposed Solution
The Elo Rating System is widely used (chess, FIFA, online gaming, etc.) as a system for rating two-team competitive games.
Every project retrospective can be seen as multiple two-player 'contests', with each player competing to contribute the most per hour spent on the project.
For example, assume a project where the following players contribute the following:
Each player's relative contribution per hour is calculated by dividing their rc by hours:
The retrospective is now broken down into three two-player contests:
As a result of this example, none of the player ratings would change, but all of their XP would increase, with Jeffrey and Tanner's XP increasing by double what Shereef's XP increased by.
Taking another example, where the players have different skill levels and different contributions:
Breaking down the retrospective to three contests:
Shereef wins two contests, and Tanner wins one.
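To make the breakdown concrete, here is a small sketch of turning one retrospective into pairwise contests on relative contribution per hour. The numbers below are invented for illustration and are not the figures from the examples in this issue; the `project` structure and the `contests` helper are likewise hypothetical.

```python
from itertools import combinations

# Hypothetical data: relative contribution (percent) and hours on the project.
project = {
    "Shereef": {"rc": 50, "hours": 20},
    "Tanner": {"rc": 30, "hours": 20},
    "Jeffrey": {"rc": 20, "hours": 20},
}


def rc_per_hour(project):
    """Relative contribution per hour: rc divided by hours."""
    return {name: p["rc"] / p["hours"] for name, p in project.items()}


def contests(project):
    """Yield (player_a, player_b, winner) for every pair of players.

    The winner is whoever has the higher rc per hour; None means a draw.
    """
    rate = rc_per_hour(project)
    for a, b in combinations(project, 2):
        if rate[a] > rate[b]:
            winner = a
        elif rate[b] > rate[a]:
            winner = b
        else:
            winner = None
        yield a, b, winner


for a, b, winner in contests(project):
    print(f"{a} vs {b}: {winner or 'draw'}")
```

With these made-up numbers, Shereef wins both of his contests and Tanner wins the one against Jeffrey. A three-player project always yields three contests, and an n-player project yields n(n-1)/2.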
Considerations
a. Order games sequentially, and adjust player rankings after each contest.
b. Adjust player rankings "in parallel".
Choosing a. gives us a more accurate distribution of rank, but disadvantages players based on how the games were ordered sequentially. Choosing b. is more fair, but does not give us as accurate a distribution.
For example, in the second example above, if Jeffrey loses to Tanner after Tanner loses to Shereef, then Jeffrey's final rating will be lower than if he loses to Tanner before Tanner loses to Shereef.
If we end up using a margin-based Elo (similar to Go), it might make more sense to run the games "in parallel".
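A small sketch of the two options, to show the ordering effect. It assumes the standard Elo expected-score formula, a single shared K-factor of 16, and hypothetical starting ratings of 1000 for everyone; `apply_sequential` and `apply_parallel` are illustrative names, not existing code.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def apply_sequential(ratings, games, k=16):
    """Option a: update ratings after every contest, in the given order."""
    ratings = dict(ratings)
    for a, b, score_a in games:
        delta = k * (score_a - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta
        ratings[b] -= delta
    return ratings


def apply_parallel(ratings, games, k=16):
    """Option b: score every contest against the pre-retrospective ratings,
    then apply the summed changes in one step."""
    deltas = {name: 0.0 for name in ratings}
    for a, b, score_a in games:
        delta = k * (score_a - expected_score(ratings[a], ratings[b]))
        deltas[a] += delta
        deltas[b] -= delta
    return {name: ratings[name] + deltas[name] for name in ratings}


start = {"Shereef": 1000.0, "Tanner": 1000.0, "Jeffrey": 1000.0}  # hypothetical
tanner_loses_first = [("Shereef", "Tanner", 1.0), ("Tanner", "Jeffrey", 1.0)]
tanner_wins_first = list(reversed(tanner_loses_first))

# Sequential: Jeffrey's final rating depends on the order of the contests.
print(apply_sequential(start, tanner_loses_first)["Jeffrey"])
print(apply_sequential(start, tanner_wins_first)["Jeffrey"])

# Parallel: the order does not matter.
print(apply_parallel(start, tanner_loses_first)["Jeffrey"])
print(apply_parallel(start, tanner_wins_first)["Jeffrey"])
```

In the sequential run, Jeffrey ends up lower when he loses to Tanner after Tanner has already lost to Shereef, because Tanner's rating has dropped by then and losing to a lower-rated player costs more. The parallel version trades that sensitivity for order independence.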