helendduncan opened this issue 3 years ago
@triangle-man's suggestion for rating multiple game players is TrueSkill: https://en.wikipedia.org/wiki/TrueSkill (Python implementation here).
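As a quick illustration of the suggestion, here is a minimal sketch using the `trueskill` package on PyPI; the team names and the single match result are made up, and treating a whole team as one TrueSkill player is a simplification for the sketch.

```python
# Minimal TrueSkill sketch, assuming the `trueskill` PyPI package.
# Team names and the match outcome are made up for illustration.
from trueskill import Rating, rate_1vs1

shock = Rating()   # default prior: mu=25, sigma=25/3
fusion = Rating()

# Suppose Shock beat Fusion in one match (the winner is passed first):
shock, fusion = rate_1vs1(shock, fusion)

print(shock)   # winner's mu rises and sigma shrinks
print(fusion)  # loser's mu falls
```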
A few other things to mention:
In the future we might look at 'player level' analysis. Possibly relevant is Eric's story on baseball player replacement value: https://alan-turing-institute.github.io/TuringDataStories-fastpages/baseball/bayesian%20modeling/sql/2021/07/21/Baseball-Replacement-Level.html
We also discussed today who we might get as a reviewer. In general we want one 'data science specialist' (I believe @jack89roberts has volunteered!) and one 'subject matter specialist'. For the subject matter specialist, we've discussed going on Twitter or Reddit to find an Overwatch analyst of some kind.
TODO: Add me as a contributor
So currently we're experiencing some difficulty. We have (rolling) ELO ratings, which by themselves can be used to reach an accuracy of around 63%, and we also have rolling wins/losses over the season.
In theory, combining the ELO ratings and the rolling wins/losses should lead to higher overall accuracy, but most of our models right now are peaking at 61%, which is actually lower than the ELO ratings by themselves.
It's possible that we've made a technical error somewhere in the code, or that something else has gone wrong. If we can't fix this, we're left with a somewhat unsatisfactory conclusion. There are three clear ways to progress from there:
Some technical suggestions from @mhauru
Things to try to improve the results with ELO + map rates:
- Use resampling to check statistical robustness.
- Try different models, and pay attention to whether the data should be normalised.
- Try a different train/validation split. Maybe the end of the season is different from the bulk?
- Think about the parameters for the ELO stuff: what should k be? Are we sure that the starting value (1500) and the normalisation coefficient (400) are arbitrary and independent? (See the sketch after this list.)
- Maybe try ELO ratings at the match level and see how it differs.
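For reference, here is a minimal, generic sketch of the rolling Elo update being discussed, with the k-factor, starting value (1500), and normalisation coefficient (400) exposed as parameters. This is a textbook Elo implementation, not the notebook's actual code, and the default k=32 is just the conventional chess value.

```python
# Generic Elo update; k=32, start=1500, norm=400 are conventional defaults,
# not necessarily the values used in the notebook.
def expected_score(rating_a, rating_b, norm=400):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / norm))

def update_elo(rating_a, rating_b, a_won, k=32, norm=400):
    """Return updated (rating_a, rating_b) after one match."""
    e_a = expected_score(rating_a, rating_b, norm)
    score_a = 1.0 if a_won else 0.0
    rating_a = rating_a + k * (score_a - e_a)
    rating_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return rating_a, rating_b

# Example: both teams start at 1500 and team A wins the first match.
r_a, r_b = 1500.0, 1500.0
r_a, r_b = update_elo(r_a, r_b, a_won=True)
```

One way to see how these parameters interact: the normalisation coefficient sets the rating scale (a 400-point gap corresponds to roughly 10:1 expected odds), so a sensible k depends on that scale, while the starting value is an additive offset that cancels out of the rating difference when every team starts at the same value.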
We went through the code and realised that our 'dumb predictor' is being assessed differently from the other predictors, essentially allowing it to cheat.
Currently the dumb predictor goes through all 800 matches (the entire dataset) and, for every triple (team A, team B, map), computes the overall match-up record. If, for example, team A has the better match-up record on the given map, it then says something along the lines of 'for any arbitrary match between A and B on this map, we think A will win'.
The problem is that we then evaluate it by running it over all the data again. That is, the dumb predictor iterates through all the matches trying to predict who will win, using historic knowledge of the overall match-up results! No wonder it gets such a high accuracy.
Instead, we need to come up with a different measure. The logical one is to apply our training/validation split, i.e. every model is given information about 80% of the data and then has to predict the remaining 20%.
For the historical win/loss predictor, this means the model gets match-up results up until e.g. April and needs to use those to predict games in e.g. June.
For the ELO calculation, this means the model gets ELO ratings computed up until e.g. April and then uses them to predict games in e.g. June.
This gets a bit messy when we're training a linear function. The simple way is to train the weights of the linear function on data up until April, and then use April's data to predict June. However, what would also make sense (though it follows a different metric) is to train the function on data up until April but let it use June data as inputs. The intuition here is that we're building a machine in April that will take in different inputs: we can't change the machine, but we're saying that on the day of the June game we will run the machine on data up until June (this second scheme is sketched below).
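A minimal sketch of that second scheme with scikit-learn, to make the distinction concrete. The DataFrame and its columns (`date`, `elo_diff`, `winrate_diff`, `team_a_won`) are hypothetical stand-ins rather than the notebook's actual schema, and the April cutoff date is illustrative.

```python
# Sketch of the "freeze the machine in April, feed it up-to-date inputs" scheme.
# The filename and column names (date, elo_diff, winrate_diff, team_a_won) are
# hypothetical; the cutoff date is illustrative.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

matches = pd.read_csv("matches_with_features.csv", parse_dates=["date"])

cutoff = pd.Timestamp("2021-04-30")
train = matches[matches["date"] <= cutoff]
test = matches[matches["date"] > cutoff]

features = ["elo_diff", "winrate_diff"]

# Train the linear model ("the machine") only on matches up to the cutoff...
model = LogisticRegression()
model.fit(train[features], train["team_a_won"])

# ...but the feature values for a June match may themselves have been computed
# from all data up to that match, since the ratings keep rolling forward.
predictions = model.predict(test[features])
print(f"Held-out accuracy: {accuracy_score(test['team_a_won'], predictions):.2%}")
```

Either way the weights are fixed at the cutoff; the two schemes differ only in whether the inputs fed to the frozen model are themselves frozen at the cutoff or keep updating up to match day.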
Looking at going for the final draft now. Here's the current story structure:
Storyline:
1. Introduction: Overwatch, how seasons work, how games work. Set the problem statement here.
2. Let's just explore the data to see what's going on. What are the relevant pieces of data (columns)?
3. Dumb predictors (optional): a) the pure guessing strategy, b) the win-rate strategy.
4. Elo model: what Elo is and what its success rate is.
   - Have we just lucked into this?
   - Resampling and variance (semi-short; see the sketch after this outline).
   - Varying the k in Elo.
5. We haven't used the fact that maps are different yet. Does map win rate make any difference? Do teams have any specialty? A useful predictor, but not very useful if we already know Elo.
6. End, closing remarks, next steps.
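To make the 'have we just lucked into this' check concrete, here is a bootstrap-resampling sketch of the accuracy estimate. The `correct` array below is a random placeholder; in the story it would be the per-match hit/miss record of the Elo predictor.

```python
# Bootstrap the accuracy to see how wide the uncertainty around ~63% is.
# `correct` is a placeholder; in the story it would be the real per-match
# prediction outcomes (True = predicted winner actually won).
import numpy as np

rng = np.random.default_rng(42)
correct = rng.random(800) < 0.63  # stand-in for 800 matches at ~63% accuracy

n_boot = 10_000
boot_acc = np.array([
    rng.choice(correct, size=correct.size, replace=True).mean()
    for _ in range(n_boot)
])
lo, hi = np.percentile(boot_acc, [2.5, 97.5])
print(f"Accuracy {correct.mean():.1%}, 95% bootstrap CI [{lo:.1%}, {hi:.1%}]")
```

With only a few hundred matches the interval spans several percentage points, which is worth keeping in mind when comparing, say, 61% against 63%.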
Story description
Please provide a high level description of the Turing Data Story
This is an idea suggested by @teapowell at the SeptembRSE session. Google Colab:
Overwatch is a 6v6 team game played across different maps with different objectives.
The idea is that we can first have a look at the map data, which contains stats about maps and team wins/losses, and predict (based on historic data) the result of any match between two (perhaps mid-tier) teams. Potentially Bayesian analysis?
Which datasets will you be using in this Turing Data Story?
Initially we thought there would be two sources of data: one from a fan-based API and one from the official Overwatch page. The fan-based content seems to be abandoned. Blizzard do not have an Overwatch API, so @teapowell suggests a web scraper will be needed to pull the data.
Getting the data: there is a stats page that contains downloadable CSV files for some selected datasets between 2018 and 2021. We are using the map data first.
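Once one of those CSV files has been downloaded, loading it for the exploration step is straightforward; a minimal sketch follows, where the filename is an assumption rather than the confirmed name of the map-level file on the stats page.

```python
# Sketch of loading a downloaded map-level CSV; the filename is an
# assumption, not the confirmed name of the file on the stats page.
import pandas as pd

maps = pd.read_csv("match_map_stats.csv")

# First look: how many map results there are and which columns are available.
print(maps.shape)
print(maps.columns.tolist())
print(maps.head())
```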
[ ] Need to check that we can use this data: Their website states: "DOWNLOAD THE DATA Fans are welcome to play with the data. We are also providing a list of approved player/hero statistics aggregated at the map level, as well as basic match/map/score statistics across attack/defense splits, for the league to date. These files and dashboards will be updated regularly throughout the 2021 season."
[ ] Need to check the openness and license of the IBM Watson data
[ ] Want to pull in power rankings of the top 100. Specifically looking at the stats of their particular hero usage and the effect of a potential banning of that character.
Teams also have an overall power ranking which is available on the main Overwatch league website. We will likely want to scrape this data.
Two years ago content moved from Twitch to YouTube; perhaps this change of platform explains the loss of stats.
Additional context
PoC Work Packages
Brackets indicate stretch goal functionality
PoC Stretch Goals
Ethical guideline
Ideally, a Turing Data Story has these properties and follows the Five Safes framework.