helendduncan opened this issue 3 years ago
@triangle-man's suggestion for rating multiple game players is TrueSkill: https://en.wikipedia.org/wiki/TrueSkill (Python implementation here).
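As a quick illustration of the suggestion, here is a minimal sketch using the `trueskill` package on PyPI; the team names and the single match result are made up, and treating a whole team as one TrueSkill player is a simplification for the sketch.

```python
# Minimal TrueSkill sketch, assuming the `trueskill` PyPI package.
# Team names and the match outcome are made up for illustration.
from trueskill import Rating, rate_1vs1

shock = Rating()   # default prior: mu=25, sigma=25/3
fusion = Rating()

# Suppose Shock beat Fusion in one match (the winner is passed first):
shock, fusion = rate_1vs1(shock, fusion)

print(shock)   # winner's mu rises and sigma shrinks
print(fusion)  # loser's mu falls
```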
A few other things to mention:
In the future we might look at 'player level' analysis. Possibly relevant is Eric's story on baseball player replacement value: https://alan-turing-institute.github.io/TuringDataStories-fastpages/baseball/bayesian%20modeling/sql/2021/07/21/Baseball-Replacement-Level.html
We also discussed today who we might get as a reviewer. In general we want one 'data science specialist' (I believe @jack89roberts has volunteered!) and one 'subject matter specialist'. For the subject matter specialist, we've discussed going on Twitter or Reddit to find an Overwatch analyst of some kind.
TODO: Add me as a contributor
So currently we're experiencing some difficulty. We have (rolling) ELO ratings, which by themselves can be used to reach an accuracy of around 63%, and we also have rolling wins/losses over the season.
In theory, combining the ELO ratings and the rolling wins/losses should lead to higher overall accuracy, but most of our models right now are peaking at 61%, which is actually lower than the ELO ratings by themselves.
It's possible that we've made a technical error somewhere in the code, or that something else has gone wrong. If we can't fix this, we're left with a somewhat unsatisfactory conclusion. There are three clear ways to progress from there:
Some technical suggestions from @mhauru
Things to try to improve the results with ELO + map rates:
- Use resampling to check statistical robustness.
- Try different models, and pay attention to whether the data should be normalised.
- Try a different train/validation split. Maybe the end of the season is different from the bulk?
- Think about the parameters for the ELO stuff: what should k be? Are we sure that the starting value (1500) and the normalisation coefficient (400) are arbitrary and independent? (See the sketch after this list.)
- Maybe try ELO ratings at the match level and see how it differs.
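For reference, here is a minimal, generic sketch of the rolling Elo update being discussed, with the k-factor, starting value (1500), and normalisation coefficient (400) exposed as parameters. This is a textbook Elo implementation, not the notebook's actual code, and the default k=32 is just the conventional chess value.

```python
# Generic Elo update; k=32, start=1500, norm=400 are conventional defaults,
# not necessarily the values used in the notebook.
def expected_score(rating_a, rating_b, norm=400):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / norm))

def update_elo(rating_a, rating_b, a_won, k=32, norm=400):
    """Return updated (rating_a, rating_b) after one match."""
    e_a = expected_score(rating_a, rating_b, norm)
    score_a = 1.0 if a_won else 0.0
    rating_a = rating_a + k * (score_a - e_a)
    rating_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return rating_a, rating_b

# Example: both teams start at 1500 and team A wins the first match.
r_a, r_b = 1500.0, 1500.0
r_a, r_b = update_elo(r_a, r_b, a_won=True)
```

One way to see how these parameters interact: the normalisation coefficient sets the rating scale (a 400-point gap corresponds to roughly 10:1 expected odds), so a sensible k depends on that scale, while the starting value is an additive offset that cancels out of the rating difference when every team starts at the same value.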
We went through the code and realised that our 'dumb predictor' is being assessed differently from the other predictors, essentially allowing it to cheat.
Currently the dumb predictor goes through all 800 matches (the entire dataset) and, for every triple (team A, team B, map), computes the overall match-up record. If, for example, team A has the better match-up record on the given map, it then says something along the lines of 'for any arbitrary match between A and B on this map, we think A will win'.
The problem is that we then evaluate it by running it over all the data again. That is, the dumb predictor iterates through all the matches trying to predict who will win, using historic knowledge of the overall match-up results! No wonder it gets such a high accuracy.
Instead, we need to come up with a different measure. The logical one is to apply our training/validation split, i.e. every model is given information about 80% of the data and then has to predict the remaining 20%.
For the historical win/loss predictor, this means the model gets match-up results up until e.g. April and needs to use those to predict games in e.g. June.
For the ELO calculation, this means the model gets ELO ratings computed up until e.g. April and then uses them to predict games in e.g. June.
This gets a bit messy when we're training a linear function. The simple way is to train the weights of the linear function on data up until April, and then use April's data to predict June. However, what would also make sense (though it follows a different metric) is to train the function on data up until April but let it use June data as inputs. The intuition here is that we're building a machine in April that will take in different inputs: we can't change the machine, but we're saying that on the day of the June game we will run the machine on data up until June (this second scheme is sketched below).
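A minimal sketch of that second scheme with scikit-learn, to make the distinction concrete. The DataFrame and its columns (`date`, `elo_diff`, `winrate_diff`, `team_a_won`) are hypothetical stand-ins rather than the notebook's actual schema, and the April cutoff date is illustrative.

```python
# Sketch of the "freeze the machine in April, feed it up-to-date inputs" scheme.
# The filename and column names (date, elo_diff, winrate_diff, team_a_won) are
# hypothetical; the cutoff date is illustrative.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

matches = pd.read_csv("matches_with_features.csv", parse_dates=["date"])

cutoff = pd.Timestamp("2021-04-30")
train = matches[matches["date"] <= cutoff]
test = matches[matches["date"] > cutoff]

features = ["elo_diff", "winrate_diff"]

# Train the linear model ("the machine") only on matches up to the cutoff...
model = LogisticRegression()
model.fit(train[features], train["team_a_won"])

# ...but the feature values for a June match may themselves have been computed
# from all data up to that match, since the ratings keep rolling forward.
predictions = model.predict(test[features])
print(f"Held-out accuracy: {accuracy_score(test['team_a_won'], predictions):.2%}")
```

Either way the weights are fixed at the cutoff; the two schemes differ only in whether the inputs fed to the frozen model are themselves frozen at the cutoff or keep updating up to match day.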
Looking at going for the final draft now. Here's the current story structure:
Storyline:
1. Introduction: Overwatch, how seasons work, how games work. Set the problem statement here.
2. Let's just explore the data to see what's going on. What are the relevant pieces of data (columns)?
3. Dumb predictors (optional): a) the pure guessing strategy, b) the win-rate strategy.
4. Elo model: what Elo is and what its success rate is.
   - Have we just lucked into this?
   - Resampling and variance (semi-short; see the sketch after this outline).
   - Varying the k in Elo.
5. We haven't used the fact that maps are different yet. Does map win rate make any difference? Do teams have any specialty? A useful predictor, but not very useful if we already know Elo.
6. End, closing remarks, next steps.
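To make the 'have we just lucked into this' check concrete, here is a bootstrap-resampling sketch of the accuracy estimate. The `correct` array below is a random placeholder; in the story it would be the per-match hit/miss record of the Elo predictor.

```python
# Bootstrap the accuracy to see how wide the uncertainty around ~63% is.
# `correct` is a placeholder; in the story it would be the real per-match
# prediction outcomes (True = predicted winner actually won).
import numpy as np

rng = np.random.default_rng(42)
correct = rng.random(800) < 0.63  # stand-in for 800 matches at ~63% accuracy

n_boot = 10_000
boot_acc = np.array([
    rng.choice(correct, size=correct.size, replace=True).mean()
    for _ in range(n_boot)
])
lo, hi = np.percentile(boot_acc, [2.5, 97.5])
print(f"Accuracy {correct.mean():.1%}, 95% bootstrap CI [{lo:.1%}, {hi:.1%}]")
```

With only a few hundred matches the interval spans several percentage points, which is worth keeping in mind when comparing, say, 61% against 63%.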
Story description
Please provide a high level description of the Turing Data Story
This is an idea suggested by @teapowell at the SeptembRSE session. Google Colab:
Overwatch is a 6v6 team game played across different maps with different objectives.
The idea is that we can first have a look at the map data, which contains stats about maps and team wins/losses, and predict (based on historic data) the result of any match between two (perhaps mid-tier) teams. Potentially Bayesian analysis?
Which datasets will you be using in this Turing Data Story?
Initially we thought there would be two sources of data: one from a fan-based API and one from the official Overwatch page. The fan-based content seems to be abandoned. Blizzard do not have an Overwatch API, so @teapowell suggests a web scraper will be needed to pull the data.
Getting the data: there is a stats page that contains downloadable CSV files for some selected datasets between 2018 and 2021. We are using the map data first.
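Once one of those CSV files has been downloaded, loading it for the exploration step is straightforward; a minimal sketch follows, where the filename is an assumption rather than the confirmed name of the map-level file on the stats page.

```python
# Sketch of loading a downloaded map-level CSV; the filename is an
# assumption, not the confirmed name of the file on the stats page.
import pandas as pd

maps = pd.read_csv("match_map_stats.csv")

# First look: how many map results there are and which columns are available.
print(maps.shape)
print(maps.columns.tolist())
print(maps.head())
```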
[ ] Need to check that we can use this data: Their website states: "DOWNLOAD THE DATA Fans are welcome to play with the data. We are also providing a list of approved player/hero statistics aggregated at the map level, as well as basic match/map/score statistics across attack/defense splits, for the league to date. These files and dashboards will be updated regularly throughout the 2021 season."
[ ] Need to check the openness and license of the IBM Watson data
[ ] Want to pull in power rankings of the top 100. Specifically looking at the stats of their particular hero usage and the effect of a potential banning of that character.
Teams also have an overall power ranking which is available on the main Overwatch league website. We will likely want to scrape this data.
Two years ago content moved from Twitch to YouTube; perhaps this change of platform explains the loss of stats.
Additional context
PoC Work Packages
Brackets indicate stretch goal functionality
PoC Stretch Goals
Ethical guideline
Ideally, a Turing Data Story has these properties and follows the Five Safes framework.