alan-turing-institute / TuringDataStories

TuringDataStories: An open community creating “Data Stories”: A mix of open data, code, narrative 💬, visuals 📊📈 and knowledge 🧠 to help understand the world around us.

[Turing Data Story] Esports analysis: Overwatch #163

Open helendduncan opened 3 years ago

helendduncan commented 3 years ago

Story description

Please provide a high level description of the Turing Data Story.

This is an idea suggested by @teapowell at the SeptembRSE session. Google Colab:

Overwatch is a 6v6 team game played across different maps with different objectives.

The idea is that we can first have a look at the map data, which contains stats about maps and team wins/losses, and predict (based on historic data) the result of any match between two (perhaps mid-tier) teams. Potentially a Bayesian analysis?

Which datasets will you be using in this Turing Data Story?

Initially we thought there would be two sources of data: one from a fan-based API and one from the official Overwatch page. The fan-based content seems to be abandoned, and Blizzard do not have an Overwatch API, so @teapowell suggests a web scraper will be needed to pull the data.

Getting the data: there is a stats page that contains downloadable CSV files for some selected datasets between 2018 and 2021. We are using the map data first.
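As a point of reference, a minimal sketch of loading one of those downloadable CSV files with pandas (the filename is a placeholder, not necessarily the real export name):

```python
import pandas as pd

# Placeholder filename: substitute whichever map-stats CSV was downloaded
# from the Overwatch League stats page (2018-2021).
map_stats = pd.read_csv("match_map_stats.csv")

# Quick look at what we have to work with before choosing features.
print(map_stats.shape)
print(map_stats.columns.tolist())
print(map_stats.head())
```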

Teams also have an overall power ranking which is available on the main Overwatch league website. We will likely want to scrape this data.

Two years ago content moved from Twitch to YouTube; perhaps this platform change explains the loss of stats.

Additional context

PoC Work Packages

Brackets indicate stretch goal functionality

PoC Stretch Goals

Ethical guideline

Ideally a Turing Data Story has these properties and follows the Five Safes framework.

ChristinaLast commented 2 years ago

@triangle-man's suggestion is https://en.wikipedia.org/wiki/TrueSkill (Python implementation here) to rate multiple game players.
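For context, a minimal sketch of rating two teams after a single match with the trueskill package on PyPI (which may or may not be the implementation linked above; the team names are made up):

```python
from trueskill import Rating, rate

# Each team starts at the default TrueSkill rating (mu=25, sigma=25/3).
teams = {"TeamA": Rating(), "TeamB": Rating()}

# Rate one match: lower rank is better, so ranks=[0, 1] means TeamA won.
(updated_a,), (updated_b,) = rate(
    [(teams["TeamA"],), (teams["TeamB"],)], ranks=[0, 1]
)
teams["TeamA"], teams["TeamB"] = updated_a, updated_b

print(teams["TeamA"])  # winner: mu goes up, sigma shrinks
print(teams["TeamB"])  # loser: mu goes down, sigma shrinks
```

For a full season you would fold each match through rate in date order, much like the rolling Elo ratings discussed further down the thread.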

kevinxufs commented 2 years ago

A few other things to mention

In the future we might look at 'player level' analysis. Possibly relevant is Eric's story on baseball player replacement value: https://alan-turing-institute.github.io/TuringDataStories-fastpages/baseball/bayesian%20modeling/sql/2021/07/21/Baseball-Replacement-Level.html

We also discussed today who we might get as a reviewer. In general we want to have one 'data science specialist' (I believe @jack89roberts has volunteered!) and one 'subject matter specialist'. For the subject matter specialist, we've discussed going on Twitter or Reddit to find an Overwatch analyst of some kind.

teapowell commented 2 years ago

TODO: Add me as a contributor

kevinxufs commented 2 years ago

So currently we're experiencing some difficulty. We have (rolling) Elo ratings, which by themselves can be used to get an accuracy of around 63%, and we also have rolling wins/losses over the season.

In theory, combining Elo ratings and rolling wins/losses should lead to higher overall accuracy, but most of our models right now are peaking at 61%, which is actually lower than Elo ratings by themselves.
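To make "combining" concrete, a rough sketch of the kind of model we mean, run on synthetic stand-in data (the column names and data here are illustrative, not the real dataset):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in: one row per match, with the pre-match Elo difference
# and rolling win-rate difference between the two teams (illustrative only).
rng = np.random.default_rng(0)
n = 800
matches = pd.DataFrame({
    "elo_diff": rng.normal(0, 100, n),
    "winrate_diff": rng.normal(0, 0.2, n),
})
# Toy outcome loosely driven by both features, just so the sketch runs.
p_win = 1 / (1 + np.exp(-(matches["elo_diff"] / 200 + matches["winrate_diff"])))
matches["team_a_won"] = (rng.uniform(size=n) < p_win).astype(int)

# Chronological 80/20 split rather than a shuffled one, since the features
# are rolling quantities.
split = int(0.8 * n)
train, test = matches.iloc[:split], matches.iloc[split:]

features = ["elo_diff", "winrate_diff"]
model = LogisticRegression().fit(train[features], train["team_a_won"])
print("accuracy:", accuracy_score(test["team_a_won"], model.predict(test[features])))
```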

It's possible that we've made a technical error somewhere in the code, or that something else has gone wrong. In the case that we can't fix this, we are left with a bit of an unsatisfactory conclusion. There are three clear ways to progress from there:

Some technical suggestions from @mhauru

Things to try to improve results with Elo + map rates:

- Use resampling to check statistical robustness.
- Try different models; pay attention to whether the data should be normalised.
- Try a different train/validation split. Maybe the end of the season is different from the bulk?
- Think about the parameters for the Elo stuff: what should k be? Are we sure that the starting value (1500) and the normalisation coefficient (400) are arbitrary and independent? (See the sketch of the standard Elo update below.)
- Maybe try Elo ratings at match level and see how it differs.
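For reference, the standard Elo update that those parameters plug into, as a minimal sketch (k=32 is just a common default here, not necessarily the value the story uses):

```python
def expected_score(rating_a: float, rating_b: float, norm: float = 400) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / norm))

def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32):
    """Update both ratings after one match; score_a is 1, 0.5 or 0 for team A."""
    exp_a = expected_score(rating_a, rating_b)
    rating_a += k * (score_a - exp_a)
    rating_b += k * ((1 - score_a) - (1 - exp_a))
    return rating_a, rating_b

# Example: both teams start at 1500 and team A wins one match.
print(update_elo(1500, 1500, 1))  # -> (1516.0, 1484.0)
```

Here 1500 only fixes where the scale starts, the 400 sets how a rating gap maps to an expected score, and k sets how fast ratings move relative to that scale, so k and the 400 in particular are not independent choices.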

kevinxufs commented 2 years ago

We went through the code and realised that our 'dumb predictor' is being assessed differently from the other predictors, essentially cheating in some way.

Currently the dumb predictor goes through all 800 matches (the entire dataset) and, for every triple (team A, team B, map), computes the overall matchup result. If, for example, team A has the better matchup record on the given map, it then says something along the lines of 'for any arbitrary match between A and B on this map, we think A will win'.

The problem is that we then evaluate it by running it through all the data again. I.e. the dumb predictor iterates through all the matches trying to predict who will win, using knowledge of overall matchup results that already includes the very matches it is predicting! No wonder it gets such a high accuracy.

Instead, we need to come up with a different measure. The logical one is to use our training/validation split, i.e. all models should be provided information about 80% of the data and should then predict the remaining 20%.

For the historical win/loss predictor, this means that the model gets information on matchup results up until e.g. April, and needs to use this to predict games in e.g. June.
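A rough sketch of what this looks like for the matchup win-rate predictor, assuming a dataframe of matches already sorted by date (all column names here are placeholders):

```python
import pandas as pd

def winrate_predictor_accuracy(matches: pd.DataFrame, train_frac: float = 0.8) -> float:
    """Evaluate the 'historical matchup win rate' predictor without leakage.

    Assumes placeholder columns team_a, team_b, map_name, team_a_won (1/0),
    with rows already in chronological order.
    """
    split = int(train_frac * len(matches))
    train, test = matches.iloc[:split], matches.iloc[split:]

    # Head-to-head win rates computed from the training period only.
    winrates = (
        train.groupby(["team_a", "team_b", "map_name"])["team_a_won"].mean().to_dict()
    )

    # Predict a team A win if its historical win rate on that matchup is > 0.5;
    # for matchups never seen in training, arbitrarily predict a team B win.
    preds = [
        int(winrates.get((a, b, m), 0.0) > 0.5)
        for a, b, m in zip(test["team_a"], test["team_b"], test["map_name"])
    ]
    return (pd.Series(preds, index=test.index) == test["team_a_won"]).mean()
```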

For the Elo calculation, this means that the model gets Elo ratings computed up until e.g. April and then uses these to predict games in e.g. June.

This gets a bit messy when we're training a linear function. The simple way is to train the weights of the linear function on data up until April, and then use April's features to predict June. However, what would also make sense (but is a different metric) would be to train the function up until April, but allow it to use June data at prediction time. The intuition here is that we're creating a machine in April that will take in different inputs. We can't change the machine, but on the day of the June game we run the machine on data up until that day.
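A compressed, runnable sketch of the difference between those two metrics, with a stand-in feature function (everything here is hypothetical scaffolding, not the story's actual code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def features_up_to(cutoff_label: str, n_matches: int = 100) -> np.ndarray:
    """Stand-in for 'Elo + win-rate features computed only from data before cutoff'.

    In the real story this would recompute the rolling quantities up to the
    given date; here it just returns toy numbers so the sketch runs.
    """
    return rng.normal(size=(n_matches, 2))

y_train = rng.integers(0, 2, 100)  # toy labels for the pre-April matches
y_june = rng.integers(0, 2, 100)   # toy labels for the June matches

# In both schemes the "machine" (the fitted weights) is frozen in April.
model = LogisticRegression().fit(features_up_to("2021-04"), y_train)

# Scheme 1: predict June games using features that also stop in April.
acc_frozen_features = model.score(features_up_to("2021-04"), y_june)

# Scheme 2: same frozen weights, but the features fed in on match day are
# recomputed with everything known up to that day (ratings keep updating).
acc_fresh_features = model.score(features_up_to("match day"), y_june)

print(acc_frozen_features, acc_fresh_features)
```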

kevinxufs commented 2 years ago

looking at going for the final draft now. Here's the current story structure:

Storyline:

1. Introduction: Overwatch, how seasons work, how games work. Set the problem statement here.

2. Let's just explore the data to see what's going on. What are the relevant pieces of data (columns)?

3. Dumb predictors (optional):
   a) the pure guessing strategy
   b) the win rate strategy

4. Elo model: what is Elo, and what is its success rate?
   - Have we just lucked into this? Resampling and variance (semi-short).
   - Varying the k in Elo.

5. We haven't used the fact that maps are different yet. Does map win rate make any difference? Do teams have any specialty? A useful predictor, but not very useful if we already know Elo.

6. End, closing remarks, next steps.