eddwebster / football_analytics

📊⚽ A collection of football analytics projects, data, and analysis by Edd Webster (@eddwebster), including a curated list of publicly available resources published by the football analytics community.
https://www.eddwebster.com

Edd Webster Football Analytics

Edd Webster Analytics

A space for football analytics projects by Edd Webster, including a curated list of publicly available resources published by the football analytics community

![Visitors](https://visitor-badge.glitch.me/badge?page_id=eddwebster.football_analytics) ![GitHub Stars](https://img.shields.io/github/stars/eddwebster/football_analytics?style=plastic) ![GitHub Last Commit](https://img.shields.io/github/last-commit/eddwebster/football_analytics?style=plastic) ![GitHub Commit Activity](https://img.shields.io/github/commit-activity/m/eddwebster/football_analytics.svg) ![GitHub Repository Size](https://img.shields.io/github/repo-size/eddwebster/football_analytics?style=plastic) [![Licence](https://img.shields.io/badge/license-MIT-brightgreen.svg)](https://raw.githubusercontent.com/eddwebster/football_analytics/master/LICENSE)

-----------------------------------------------------

👋 About This Repository and Author

The README of this repository is a guide to learning materials, data sources, libraries, papers, blogs, etc., created by all those who have contributed to the open-source football analytics community. This GitHub repository and resources list is always a work in progress, with new resources added semi-regularly. If you feel I have missed any resources, please feel free to create a pull request or send me a message via the links below and I'll get back to you as quickly as I can!

If you like the repo, please feel free to give it a ⭐ (top right). Cheers!

For more information about this repository and the author, see the following:

CV Badge Personal Website Badge Email Badge LinkedIn Badge Twitter Badge Linktree Badge GitHub Badge Tableau Badge

-----------------------------------------------------

๐Ÿ“ Table of Contents

Table of Contents
  1. 👋 About This Repository and Author
  2. 📝 Table of Contents
  3. 🚀 Getting Started
  4. 🌵 Repository Structure
  5. 📚 Source Code and Notebooks
  6. 📈 Data Visualisation and Tableau
  7. 📑 Resources
  8. 🗣️ Citations
  9. 🤝 Contributing
  10. ⭐ Star Tracker
  11. 👏 Acknowledgements

-----------------------------------------------------

🚀 Getting Started

✅ Dependencies

The code in this repository is written in a mix of Python and R. Before you begin, ensure that you have the following prerequisites installed:

  1. Python (ideally 3.6.1+ installed)
  2. R (ideally 4.0.4+ installed)
  3. The following Python and R libraries...
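A quick way to confirm the prerequisites above before opening any notebooks is to run a small check script. The sketch below is not part of this repository and uses only the Python standard library. The 3.6.1 minimum comes from the list above. The package list is an assumption based on the general data science libraries in the Python section below, where scikit-learn is imported as `sklearn`. It does not check R or any R packages.

```python
import importlib.util
import sys

# Minimum Python version suggested above ("ideally 3.6.1+")
MIN_PYTHON = (3, 6, 1)

# Assumed import names for the general data science libraries listed below;
# note that scikit-learn is imported as "sklearn"
CORE_PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn", "scipy"]


def python_version_ok(minimum=MIN_PYTHON):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:3] >= minimum


def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    missing = missing_packages(CORE_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All core packages found.")
```

Run it from the same environment (virtualenv or conda) you plan to use for the notebooks. Any names it reports can then be installed with `pip` or `conda`.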

๐Ÿ Python

General Python data science libraries:
* [`NumPy`](https://numpy.org/doc/stable/contents.html) for multidimensional array computing;
* [`pandas`](http://pandas.pydata.org/) for data analysis and manipulation;
* [`matplotlib`](https://matplotlib.org/contents.html?v=20200411155018) and [`Seaborn`](https://seaborn.pydata.org/) for data visualisation; and
* [`scikit-learn`](https://scikit-learn.org/stable/index.html) and [`SciPy`](https://www.scipy.org/) for machine learning.

Football analytics Python libraries:
* [`kloppy`](https://github.com/PySport/kloppy) - a package for standardising tracking and event data by [Koen Vossen](https://twitter.com/mr_le_fox) and [Jan Van Haaren](https://twitter.com/JanVanHaaren). See the YouTube tutorial [[link](https://www.youtube.com/watch?v=JQbxpzNvGO8)]
* [`floodlight`](https://github.com/floodlight-sports/floodlight) by [floodlight-sports](https://github.com/floodlight-sports) - a package for streamlined analysis of sports data. It is designed with a clear focus on scientific computing and built upon popular libraries such as NumPy and pandas. See the following documentation [[link](https://floodlight.readthedocs.io/en/latest/index.html)]
* [`matplotsoccer`](https://github.com/TomDecroos/matplotsoccer) - a Python library for visualising soccer event data by [Tom Decroos](https://twitter.com/TomDecroos)
* [`mplsoccer`](https://github.com/andrewRowlinson/mplsoccer) - a Python library for plotting football pitches in matplotlib by [Andrew Rowlinson](https://twitter.com/numberstorm)
* [`PySport`](https://opensource.pysport.org/) including [`PySport Soccer`](https://opensource.pysport.org/?sports=Soccer) - a collection of open-source sports packages, including many of those mentioned in this section, by [Koen Vossen](https://twitter.com/mr_le_fox)
* [`ScraperFC`](https://github.com/oseymour/ScraperFC) by [Owen Seymour](https://twitter.com/owen_seymour) - a Python package to scrape data from FiveThirtyEight, [FBref](https://fbref.com/en/), [Understat](https://understat.com/), [Club Elo](http://clubelo.com/), [Capology](https://www.capology.com/) and [TransferMarkt](https://www.transfermarkt.us/). It previously scraped [Opta](https://www.statsperform.com/opta/) event data through the [WhoScored?](https://www.whoscored.com/) match centre (this functionality has since been removed, but older versions and GitHub repos still contain this code)
* [`statsbombapi`](https://github.com/Torvaney/statsbombapi) - a Python API wrapper and dataclasses for [StatsBomb](https://statsbomb.com/) data
* [`statsbombpy`](https://github.com/statsbomb/statsbombpy) - a Python library written by Francisco Goitia to access [StatsBomb](https://statsbomb.com/) data
* [`socceraction`](https://github.com/ML-KULeuven/socceraction) - a Python library for valuing the individual actions performed by soccer players. Includes an Expected Threat (xT) implementation by [Tom Decroos](https://twitter.com/TomDecroos) et al.
* [`soccer_xg`](https://github.com/ML-KULeuven/soccer_xg) by [ML KU Leuven](https://github.com/ML-KULeuven) - a Python package for training and analysing expected goals (xG) models in football
* [`soccerdata`](https://github.com/probberechts/soccerdata) - scrape soccer data from Club Elo, ESPN, FBref, FiveThirtyEight, Football-Data.co.uk, SoFIFA and WhoScored by [Pieter Robberechts](https://twitter.com/p_robberechts)
* [`tyrone_mings`](https://github.com/FCrSTATS/tyrone_mings) by [FCrSTATS](https://twitter.com/FC_rstats) - a Python [TransferMarkt](https://www.transfermarkt.co.uk/) webscraper
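Libraries such as `kloppy` standardise data partly because each provider uses a different coordinate system. The sketch below is a rough, hypothetical illustration of that problem. It does not reproduce kloppy's API or its exact transformation. It converts between two systems:

* an Opta-style system, assumed here to run from 0 to 100 on both axes with the origin at the bottom-left;
* a StatsBomb-style 120 x 80 pitch, assumed to have its origin at the top-left so that y increases downwards.

Pitch markings do not sit at exactly the same relative positions in both systems, so a linear rescale like this is only an approximation. Check each provider's documentation before relying on it.

```python
from typing import Tuple

# Assumed conventions (verify against provider documentation):
# - Opta-style: x and y from 0 to 100, origin bottom-left, y increases upwards
# - StatsBomb-style: 120 x 80, origin top-left, y increases downwards
OPTA_MAX = 100.0
SB_LENGTH, SB_WIDTH = 120.0, 80.0


def opta_to_statsbomb(x: float, y: float) -> Tuple[float, float]:
    """Linearly rescale an Opta-style (x, y) location to StatsBomb-style coordinates.

    This is only an approximation: it ignores differences in where pitch
    markings (e.g. the penalty box) fall in each system.
    """
    sb_x = x / OPTA_MAX * SB_LENGTH
    sb_y = (OPTA_MAX - y) / OPTA_MAX * SB_WIDTH  # flip the y-axis
    return sb_x, sb_y


def statsbomb_to_opta(x: float, y: float) -> Tuple[float, float]:
    """Inverse of opta_to_statsbomb under the same linear assumptions."""
    opta_x = x / SB_LENGTH * OPTA_MAX
    opta_y = OPTA_MAX - y / SB_WIDTH * OPTA_MAX  # flip the y-axis back
    return opta_x, opta_y


if __name__ == "__main__":
    # The centre of the opponent's goal line in Opta-style coordinates
    print(opta_to_statsbomb(100, 50))
```

A dedicated library is the safer choice for real work. This sketch only shows why a conversion step is needed before combining data from different providers on one pitch plot.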

ยฎ๏ธ R

General R data science libraries:
* `tidyverse`

Football analytics R libraries:
* [`ggsoccer`](https://github.com/Torvaney/ggsoccer) by [Ben Torvaney](https://twitter.com/Torvaney) - a soccer visualisation library in R
* [`ggshakeR`](https://github.com/abhiamishra/ggshakeR) by [Abhishek Mishra](https://twitter.com/MishraAbhiA) - an analysis and visualisation R package that works with publicly available soccer data. See the following documentation [[link](https://ggshaker.github.io/)]
* [`StatsBombR`](https://github.com/statsbomb/StatsBombR) - an R package to easily stream [StatsBomb](https://statsbomb.com/) data into R, either from the API using your login credentials or free of charge from the Open Data GitHub repository
* [`soccermatics`](https://github.com/JoGall/soccermatics) by [Joe Gallagher](https://twitter.com/joedgallagher) - an R package for the visualisation and analysis of soccer tracking and event data
* [`worldfootballR`](https://github.com/JaseZiv/worldfootballR) by [Jason Zivkovic](https://twitter.com/jaseziv) - an R package for extracting world football (soccer) data from [FBref](https://fbref.com/en/), [TransferMarkt](https://www.transfermarkt.com/), Understat and fotmob (see guide on how to use this package [[link](https://www.dontblamethedata.com/blog/extract-data-using-worldfootballr/)])

๐Ÿ” Return

![-----------------------------------------------------](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/rainbow.png)

🌵 Repository Structure

The contents of this GitHub repository are organised as follows:

📂 eddwebster/football_analytics/ ➡️ central repository of code and analysis by Edd Webster 📁⚽
│
├── 📂 dashboards/ ➡️ store of Tableau dashboards used for analysis 📊🔍
│
├── 📂 data/ ➡️ a selection of raw and processed data extracts by various providers 💾🔍
│   ├── 📂 capology
│   ├── 📂 davies
│   ├── 📂 elo
│   ├── 📂 fbref
│   ├── 📂 fifa
│   ├── 📂 guardian
│   ├── 📂 metrica-sports
│   ├── 📂 opta
│   ├── 📂 reference
│   ├── 📂 sb
│   ├── 📂 shots
│   ├── 📂 stats-perform
│   ├── 📂 stratabet
│   ├── 📂 tm
│   ├── 📂 touchline-analytics
│   ├── 📂 twenty-first-group
│   ├── 📂 understat
│   └── 📂 wyscout
│
├── 📂 docs/ ➡️ store of documentation for different vendors 📄📚
│   ├── 📂 centre-circle
│   ├── 📂 metrica-sports
│   ├── 📂 opta
│   ├── 📂 sb
│   ├── 📂 shots
│   ├── 📂 stratabet
│   └── 📂 wyscout
│
├── 📂 fonts/ ➡️ store of custom and externally acquired fonts used for data visualisation ✏️📄
│
├── 📄 .gitignore ➡️ ignore unnecessary files for version control with Git 🚫📤
│
├── 📂 img/ ➡️ store of images used for analysis including club badges, vendor logos and official media images 📷💾
│   ├── 📂 club_badges/ # badges for football clubs
│   ├── 📂 edd_webster/ # images related to Edd Webster
│   ├── 📂 fig/ # generated figures derived from analysis and reports in this repository
│   ├── 📂 gif/ # GIF images
│   ├── 📂 memes/ # memes
│   ├── 📂 pitches/ # images of football pitches and goals used mostly for Tableau visualisation
│   ├── 📂 players/ # images of football players
│   ├── 📂 vendors/ # logos for data vendors e.g. StatsBomb
│   ├── 📂 vizpiration/ # high-quality visualisations and analysis from renowned members of the football analytics community
│   └── 📂 websites-blogs/ # logos for data analysis websites and blogs e.g. Club Elo
│
├── 📂 scripts/ ➡️ store of libraries, Python and open-source code 📙🛠
│
├── 📂 notebooks/ ➡️ Jupyter notebooks for exploration and visualisation
│   ├── 📂 1_data_scraping/ # notebooks with code to acquire data via webscraping
│   │   ├── Capology Player Salary Web Scraping.ipynb
│   │   ├── FBref Player Stats Web Scraping.ipynb
│   │   └── TransferMarkt Player Bio and Status Web Scraping.ipynb
│   │
│   ├── 📂 2_data_parsing/ # notebooks with code to acquire data via APIs
│   │   ├── Elo Team Ratings Data Parsing.ipynb
│   │   ├── StatsBomb Data Parsing.ipynb
│   │   └── Wyscout Data Parsing.ipynb
│   │
│   ├── 📂 3_data_engineering/ # notebooks with code to engineer raw, unprocessed data into processed data
│   │   ├── Capology Player Salary Data Engineering.ipynb
│   │   ├── Centre Circle Opta CPL Data Engineering.ipynb
│   │   ├── FBref Player Stats Data Engineering.ipynb
│   │   ├── Opta #mcfcanalytics PL 2011-2012.ipynb
│   │   ├── StatsBomb Data Engineering.ipynb
│   │   ├── The Guardian Player Recorded Transfer Fees Data Engineering.ipynb
│   │   ├── TransferMarkt Historical Market Value Data Engineering.ipynb
│   │   ├── TransferMarkt Player Bio and Status Data Engineering.ipynb
│   │   ├── TransferMarkt Player Recorded Transfer Fees Data Engineering.ipynb
│   │   ├── Understat Data Engineering.ipynb
│   │   └── Wyscout Data Engineering.ipynb
│   │
│   ├── 📂 4_data_unification/ # notebooks with code to unify disparate datasets
│   │   └── Unification of Aggregated Seasonal Football Datasets.ipynb
│   │
│   └── 📂 5_data_analysis_and_projects # notebooks with code for example projects and analysis
│       ├── 📂 player_similarity_and_clustering
│       │   └── PCA and K-Means Clustering of 'Piqué-like' Defenders.ipynb
│       │
│       ├── 📂 tracking_data
│       │   ├── 📂 metrica_sports
│       │   │   └── Metrica Tracking Data EDA.ipynb
│       │   └── 📂 signality
│       │       ├── Signality Tracking Data Engineering.ipynb
│       │       └── Signality Tracking Data EDA.ipynb
│       │
│       └── 📂 xg_modeling
│           ├── 📂 shots_dataset
│           │   ├── Logistic Regression Expected Goals Model.ipynb
│           │   └── XGBoost Expected Goals Model.ipynb
│           └── 📂 opta_dataset
│               └── raining of an Expected Goals Model Using Opta Event Data.ipynb
│
├── 📄 README.md ➡️ project description and setup guide for better structure and collaboration 📖🤝
│
├── 📂 research/ ➡️ central repository of research and publicly available resources in football analytics 📙⚽
│   ├── 📂 documents/ # documents
│   ├── 📂 papers/ # published academic papers and literature
│   └── 📂 slides/ # PowerPoint slides for published research
│
└── 📂 video/ ➡️ store of videos used or generated for analysis 🎥💾

๐Ÿ” Return

![-----------------------------------------------------](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/rainbow.png)

📚 Source Code and Notebooks

The code in this repository is mostly written in Jupyter notebooks or Python scripts, organised in the following workflow:
1. [Webscraping](https://github.com/eddwebster/football_analytics/tree/master/notebooks/1_data_scraping)
2. [Data Parsing](https://github.com/eddwebster/football_analytics/tree/master/notebooks/2_data_parsing)
3. [Data Engineering](https://github.com/eddwebster/football_analytics/tree/master/notebooks/3_data_engineering)
4. [Data Unification](https://github.com/eddwebster/football_analytics/tree/master/notebooks/4_data_unification)
5. [Data Analysis](https://github.com/eddwebster/football_analytics/tree/master/notebooks/5_data_analysis_and_projects) - projects include working with [Tracking data](https://github.com/eddwebster/football_analytics/tree/master/notebooks/5_data_analysis_and_projects/tracking_data), constructing [VAEP models](https://github.com/eddwebster/football_analytics/tree/master/notebooks/5_data_analysis_and_projects/vaep) (as introduced by SciSports), building [xG models](https://github.com/eddwebster/football_analytics/tree/master/notebooks/5_data_analysis_and_projects/xg_modeling) using [Logistic Regression](https://nbviewer.jupyter.org/github/eddwebster/football_analytics/blob/master/notebooks/5_data_analysis_and_projects/xg_modeling/shots_dataset/chance_quality_modelling/1%29%20Logistic%20Regression%20Expected%20Goals%20Model.ipynb), Random Forests and Gradient Boosted Decision Tree algorithms such as [XGBoost](https://nbviewer.jupyter.org/github/eddwebster/football_analytics/blob/master/notebooks/5_data_analysis_and_projects/xg_modeling/shots_dataset/chance_quality_modelling/2%29%20XGBoost%20Expected%20Goals%20Model.ipynb), and analysing [player similarity](https://github.com/eddwebster/football_analytics/tree/master/notebooks/5_data_analysis_and_projects/player_similarity_and_clustering) using PCA and K-Means clustering.
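The xG notebooks linked above fit models to real shot data. The sketch below is a self-contained illustration of the logistic regression approach only, not the notebooks' code. It derives two commonly used shot features, distance to goal and the angle between the two goalposts as seen from the shot location. It then fits a logistic regression by batch gradient descent on a handful of made-up shots.

It assumes a StatsBomb-style 120 x 80 pitch with the goal centred at (120, 40) and 8 units wide. In practice you would train something like scikit-learn's `LogisticRegression` on thousands of real shots.

```python
import math

# Assumed StatsBomb-style pitch: attacking the goal at x = 120, centred on
# y = 40 and 8 units wide (posts at y = 36 and y = 44)
GOAL_X, GOAL_Y, HALF_GOAL_WIDTH = 120.0, 40.0, 4.0


def shot_features(x, y):
    """Return [bias, angle (radians), distance / 10] for a shot taken at (x, y)."""
    dx = GOAL_X - x
    distance = math.hypot(dx, GOAL_Y - y)
    # Angle subtended by the two goalposts as seen from the shot location
    angle = abs(math.atan2(GOAL_Y + HALF_GOAL_WIDTH - y, dx)
                - math.atan2(GOAL_Y - HALF_GOAL_WIDTH - y, dx))
    return [1.0, angle, distance / 10.0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, labels, lr=0.1, epochs=5000):
    """Fit logistic regression weights by batch gradient descent on mean log-loss."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, labels):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += error * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict_xg(w, x, y):
    """Predicted probability that a shot from (x, y) is scored."""
    return sigmoid(sum(wj * fj for wj, fj in zip(w, shot_features(x, y))))


# Toy, made-up shots as (x, y, goal?): illustrative only, not real data
TOY_SHOTS = [
    (108, 40, 1), (110, 38, 1), (114, 40, 1), (102, 50, 1),
    (105, 45, 0), (100, 40, 0), (95, 30, 0), (90, 40, 0),
    (112, 35, 0), (99, 20, 0),
]

X = [shot_features(sx, sy) for sx, sy, _ in TOY_SHOTS]
labels = [goal for _, _, goal in TOY_SHOTS]
weights = fit_logistic(X, labels)

if __name__ == "__main__":
    print("xG from the penalty spot:", round(predict_xg(weights, 108, 40), 3))
    print("xG from 30 yards out:", round(predict_xg(weights, 90, 40), 3))
```

Distance is divided by 10 so that both features have a similar scale, which keeps plain gradient descent stable at a fixed learning rate. Tree-based models such as XGBoost can use the same features without any scaling.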

๐Ÿ” Return

![-----------------------------------------------------](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/rainbow.png)

📊 Data Visualisation and Tableau Dashboards

For Tableau dashboards produced using the data engineered in the notebooks in this repository, please see my Tableau Public profile: [public.tableau.com/profile/edd.webster](https://public.tableau.com/profile/edd.webster). Example Tableau dashboards:
* [2018 FIFA Men's World Cup](https://public.tableau.com/app/profile/edd.webster/viz/EddWebster-WorldCup2018AnalysisandDashboard/WC2018PlayerDashboard)
* [FA WSL](https://public.tableau.com/app/profile/edd.webster/viz/EddWebsterFAWSLAnalysisandDashboard/WSLxGAnalysisDashboard)
* [‘Big 5’ European leagues](https://public.tableau.com/app/profile/edd.webster/viz/EddWebsterBig5EuropeanLeagueAnalysisandDashboards/Big5WaffleChart)
* [EFL](https://public.tableau.com/app/profile/edd.webster/viz/EddWebsterEFLAnalysisandDashboards/EFLFullBackRadarDashboard)
* [StrataBet Chance creation](https://public.tableau.com/app/profile/edd.webster/viz/EddWebsterStrataBetChanceAnalysisandDashboards/StrataBetChanceShotMapDashboard)
* [Opta #mcfcanalytics](https://public.tableau.com/app/profile/edd.webster/viz/EddWebsterOptaMCFCAnalyticsPL1112AnalysisandDashboards/OptaPlayerDemographicsDashboard) (see [#mcfcanalytics](https://twitter.com/search?q=%23mcfcanalytics)).

๐Ÿ” Return

![-----------------------------------------------------](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/rainbow.png)

:bookmark_tabs: Resources

:bookmark: Other Football Analytics Resources Guides

Credit to the following resources, which were all used to plug gaps in this resources guide after it was published:
* [`analytics-handbook`](https://github.com/devinpleuler/analytics-handbook) GitHub repo by [Devin Pleuler](https://twitter.com/devinpleuler) - a GitHub repo for getting started in soccer analytics
* [`awesome-football`](https://github.com/openfootball/awesome-football) by [football.db](https://github.com/openfootball) ([Gerald Bauer](https://github.com/geraldb)) - a collection of awesome football datasets
* [`awesome-football-analytics`](https://github.com/diegopastor/awesome-football-analytics) by [Diego Pastor](https://twitter.com/dxvgx)
* [`awesome-soccer-analytics`](https://github.com/matiasmascioto/awesome-soccer-analytics) by [Matias Mascioto](https://twitter.com/matiasmascioto)
* [`guideR`](https://docs.google.com/spreadsheets/d/16Xvhl7fCKEs1JTr-VXPZDmctO2gq4TcmuNmAhoHQQs0/edit#gid=627465558) by [Dom Samangy](https://twitter.com/dsamangy) - a Google spreadsheet with 200+ R resources, 100+ Python tutorials, 30+ packages, 25+ accounts to follow, 10 cheatsheets, and several free books & blogs. GitHub repo [[link](https://github.com/DomSamangy/Sports_Analytics_Guide)]
* [Jan Van Haaren](https://twitter.com/janvanhaaren)'s Soccer Analytics Reviews:
  + [2020](https://janvanhaaren.be/posts/soccer-analytics-review-2020/)
  + [2021](https://janvanhaaren.be/posts/soccer-analytics-review-2021/)
  + [2022](https://janvanhaaren.be/posts/soccer-analytics-review-2022/)
  + [2023](https://janvanhaaren.be/posts/soccer-analytics-review-2023/)
* [`soccer-analytics-resources`](https://github.com/JanVanHaaren/soccer-analytics-resources) GitHub repo by [Jan Van Haaren](https://twitter.com/janvanhaaren)

๐Ÿ” Return

![-----------------------------------------------------](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/dark.png)

:runner: Getting Started with Football Analytics

Good resources for those new to the use of data in football:
* Articles and blog posts:
  + [Getting into Sports Analytics](https://medium.com/@GregorydSam/getting-into-sports-analytics-ddf0e90c4cce) and [Getting into Sports Analytics 2.0](https://medium.com/@GregorydSam/getting-into-sports-analytics-2-0-129dfb87f5be) by [Sam Gregory](https://twitter.com/GregorydSam)
  + [What do you need to learn to work in football analytics?](https://barcainnovationhub.com/what-do-you-need-to-learn-to-work-in-football-analytics/) by [David Sumpter](https://twitter.com/Soccermatics) for [Barça Innovation Hub](https://barcainnovationhub.com/)
  + [Getting Into Scouting](https://griffinftbl.substack.com/p/getting-into-scouting) by [Luke Griffin](https://twitter.com/GriffinFtbl)
  + [You Want to be a Performance Analyst?](https://thevideoanalyst.com/want-performance-analyst/) by [Rob Carroll](https://twitter.com/thevideoanalyst)
  + [An Introduction to Soccer Analytics](https://spacespacespaceletter.com/an-introduction-to-soccer-analytics/) by [John Muller](https://twitter.com/johnspacemuller)
  + [Introduction to Analytics in...Soccer](http://sportsanalytics.sa.utoronto.ca/2015/02/20/introduction-to-analytics-in-soccer/) by [Valentin Stolbunov](https://twitter.com/vstolbunov)
  + [Sports Analytics Advice](https://linktr.ee/sportsanalyticsadvice) by [Jan Van Haaren](https://twitter.com/JanVanHaaren)
  + [Some of the useful resources in Football Analytics](https://footytistics.com/)
  + [Soccer Analytics 101](https://web.archive.org/web/20201101011408/https://www.mlssoccer.com/soccer-analytics-guide/2020/soccer-analytics-101) by [Kevin Minkus](https://twitter.com/kevinminkus) (using Web Archive)
  + A Career in Football Analytics blog posts by [Benoit Pimpaud](https://twitter.com/Ben8t). Check out his Substack newsletter [From An Engineer Sight](https://fromanengineersight.substack.com/). See also the accompanying Twitter thread by [Jan Van Haaren](https://twitter.com/JanVanHaaren) that discusses these posts [[link](https://twitter.com/JanVanHaaren/status/1511003282868781063)]
    - [Part 1 - A Career in Football Analytics, The What](https://medium.pimpaudben.fr/part-1-a-career-in-football-analytics-the-what-91c888b3dcd2)
    - [Part 2 - A Career in Football Analytics, The How](https://medium.pimpaudben.fr/part-2-a-career-in-football-analytics-the-how-ae8b5eca38ce)
    - [Part 3 - A Career in Football Analytics, The Reality](https://medium.pimpaudben.fr/part-3-a-career-in-football-analytics-the-reality-ccd0812ef3bf)
  + [Football Reference 101 - Finding your way through a gold mine](https://ninad06.medium.com/football-reference-101-finding-your-way-through-a-gold-mine-40fdb29b30a2) by [Ninad Barbadikar](https://twitter.com/NinadB_06)
  + [Mikhail Zhilkin: How to hire your first data scientist](https://trainingground.guru/articles/mikhail-zhilkin-how-to-hire-your-first-data-scientist) by [Training Ground Guru](https://trainingground.guru/)
  + [Gerard Moore on the "challenging but extremely rewarding" life of a professional football analyst](http://www.twenty3.sport/gerard-moore-interview-football-analyst/) for [Twenty3](https://www.twenty3.sport)
  + [How to get started in data and the football industry](https://henshawanalysis.medium.com/how-to-get-started-in-data-and-the-football-industry-50d974e84bef) by [Liam Henshaw](https://twitter.com/HenshawAnalysis)
  + [How to get into football analysis](https://medium.com/@jkregista.6/how-to-get-into-football-analysis-cada6cf1ce76) by [La Notice](https://twitter.com/la_notice_)
  + [Getting Started with Football Analytics](https://oddalerts.com/insights/getting-started-football-analytics) by [OddAlerts](https://oddalerts.com/)
  + [Want to Learn Football Analytics?](https://medium.com/@irfanalghani11/want-to-learn-football-analytics-24cae325a30a) by [Irfan Alghani Khalid](https://www.linkedin.com/in/alghaniirfan/)
  + [How to get a job in Sports Analysis...](https://www.linkedin.com/pulse/how-get-job-sports-analysis-chris-gill/?trackingId=fSdnL9E2A9MbptVN6yZ4kg%3D%3D) by [Chris Gill](https://twitter.com/chrisgill_UK)
  + [7 Easy Steps to Get Started in Football Data & Analytics](https://jobsinfootball.com/blog/7-steps-to-get-started-in-football-data-analytics/) by [Jobs in Football](https://twitter.com/jobsinfootball)
  + [11 tips to get started in the Football industry](https://jobsinfootball.com/blog/11-tips-to-get-started-in-the-football-industry/) by [Jobs in Football](https://twitter.com/jobsinfootball)
  + [A Friendly Introduction to FPL Analytics](https://alpscode.com/blog/intro-to-fpl-analytics/) by [Sertalp B. Çay](https://twitter.com/sertalpbilal)
* GitHub repositories:
  + [`analytics-handbook`](https://github.com/devinpleuler/analytics-handbook) by [Devin Pleuler](https://twitter.com/devinpleuler)
  + [`awesome-football-analytics`](https://github.com/diegopastor/awesome-football-analytics) by [Diego Pastor](https://twitter.com/dxvgx)
  + [`awesome-soccer-analytics`](https://github.com/matiasmascioto/awesome-soccer-analytics) by [Matias Mascioto](https://twitter.com/matiasmascioto)
  + [`soccer-analytics-resources`](https://github.com/JanVanHaaren/soccer-analytics-resources) by [Jan Van Haaren](https://twitter.com/janvanhaaren)
* Twitter threads:
  + [Measurables](https://twitter.com/MeasurablesPod) ([Brendan Kent](https://twitter.com/brendankent))'s Sports Analytics 101 unrolled Twitter thread [[link](https://threadreaderapp.com/thread/1407719595696398338.html)]:
    - [Sports Analytics 101](https://brendankent.com/sports-analytics-101/)
    - [Languages and Tools to Learn for Sports Analytics](https://brendankent.com/2020/12/16/languages-and-tools-to-learn-for-sports-analytics/)
    - [Coding for Sports Analytics: Resources to Get Started](https://brendankent.com/2020/09/15/coding-for-sports-analytics-resources-to-get-started/)
    - [Sports Analytics Reading List](https://brendankent.com/2021/06/15/sports-analytics-reading-list/)
    - [Free Sports Data Sources](https://brendankent.com/2021/03/09/free-sports-data-sources/)
    - [Where to Watch: Sports Analytics Conference Video Archives](https://brendankent.com/2020/09/17/where-to-watch-sports-analytics-conference-video-archives/)
    - [How to Start a Sports Analytics Club](https://brendankent.com/2020/09/28/how-to-start-a-sports-analytics-club/)
  + [Will Spearman's Twitter thread](https://twitter.com/the_spearman/status/1260713785138073604)
  + [Jan Van Haaren](https://twitter.com/JanVanHaaren)'s [Twitter thread](https://twitter.com/JanVanHaaren/status/1436336286223196201) on free, open-source software libraries for computing and visualising advanced soccer analytics metrics
  + [Measurables](https://twitter.com/MeasurablesPod) ([Brendan Kent](https://twitter.com/brendankent))'s Twitter thread of resources for learning to code in the context of sports analytics [[link](https://twitter.com/MeasurablesPod/status/1217499777245622278)]
  + [Sancho Quinn](https://twitter.com/SanchoQuinn)'s unrolled Twitter thread for learning more about video/performance analysis [[link](https://threadreaderapp.com/thread/1434543901067595784.html)]
  + [Ninad Barbadikar](https://twitter.com/NinadB_06)'s 'big football analytics' Twitter thread for getting started with football analytics [[link](https://twitter.com/NinadB_06/status/1409817891126452226)]
  + [McKay Johns](https://twitter.com/mckayjohns)'s Twitter threads for the best resources in football analytics [[link](https://twitter.com/mckayjohns/status/1369147457536335878)] and [[link](https://twitter.com/mckayjohns/status/1382405468585295873)]
  + [Joe Gallagher](https://twitter.com/joedgallagher)'s Twitter thread for the best resources to get started [[link](https://twitter.com/joedgallagher/status/1399461951386828805)]
  + [Sam Goldberg](https://twitter.com/SamGoldberg1882)'s Twitter thread for "lessons American Soccer Analysis wish we knew prior to working in sports analytics." [[link](https://twitter.com/SamGoldberg1882/status/1417111138865664003)]
  + [Floris Goes-Smit](https://twitter.com/MeasurablesPod)'s Tweets:
    - [Becoming a Data Scientist in Football](https://threadreaderapp.com/thread/1508453394536620040.html)
    - [Floris' personal journey of becoming a Data Scientist in the football industry](https://threadreaderapp.com/thread/1508453394536620040.html)
    - [Preparing for a technical interview for a Data Science position](https://threadreaderapp.com/thread/1508453394536620040.html)
  + [Matthew Barlowe](https://twitter.com/matthew_barlowe)'s Twitter thread for "how to get into the sports analytics industry" [[link](https://twitter.com/matthew_barlowe/status/1420598697486913540)]
  + [Aaron Moniz](https://twitter.com/amonizfootball)'s Tweet and responses [[link](https://twitter.com/amonizfootball/status/1480244012770639875)]
* LinkedIn Posts:
  + [WHERE TO LEARN FOOTBALL ANALYTICS?](https://www.linkedin.com/posts/alghaniirfan_footballanalytics-datascience-machinelearning-activity-6922000959384555521-gyqS) by [Irfan Alghani Khalid](https://www.linkedin.com/in/alghaniirfan/)
  + The following LinkedIn posts by [Hadi Sotudeh](https://twitter.com/sarehang):
    - [How to start in football analytics](https://www.linkedin.com/posts/hadisotudeh_football-github-datasets-activity-6960611122760531969-naeV/)
    - ["Soccer Analytics" course summaries (2022)](https://www.linkedin.com/posts/hadisotudeh_eth-uefa-euro-activity-6940579849644212225-lTMf/)
    - ["Soccer Analytics" course summaries (2023)](https://www.linkedin.com/posts/hadisotudeh_students-projects-matchanalysis-activity-7074738585316257793-I8wB/)
    - [How to get a #job in football analytics](https://www.linkedin.com/posts/hadisotudeh_football-job-linkedin-activity-6963152537088602113-qz0R/)
    - [Other questions about job opportunities](https://www.linkedin.com/posts/hadisotudeh_job-footballanalytics-workpermit-activity-6965676165449535488-u4O3/)
* Videos:
  + [Friends of Tracking](https://www.youtube.com/channel/UCUBFJYcag8j2rm_9HkrrA7w) videos:
    - [How to become a football data scientist](https://www.youtube.com/watch?v=9J8CwOtjOiw) with Pascal Bauer, [Javier Fernández](https://twitter.com/JaviOnData), [Sudarshan 'Suds' Gopaladesikan](https://twitter.com/suds_g), [Fran Peralta](https://twitter.com/PeraltaFran23), and [David Sumpter](https://twitter.com/Soccermatics)
    - [Tools for getting started in football analytics](https://www.youtube.com/watch?v=moFkcpsIKz4) talk for [Friends of Tracking](https://www.youtube.com/channel/UCUBFJYcag8j2rm_9HkrrA7w) with [David Sumpter](https://twitter.com/Soccermatics), [Laurie Shaw](https://twitter.com/EightyFivePoint), [Pascal Bauer](https://twitter.com/pascal_bauer), [Sudarshan 'Suds' Gopaladesikan](https://twitter.com/suds_g) and [Fran Peralta](https://twitter.com/PeraltaFran23)
    - [What do data analysts and data scientists do at a football club?](https://www.youtube.com/watch?v=GLcGf-8oqO4) talk for [Friends of Tracking](https://www.youtube.com/channel/UCUBFJYcag8j2rm_9HkrrA7w) with [David Sumpter](https://twitter.com/Soccermatics), [Ashwin Raman](https://twitter.com/AshwinRaman_), [Hannah Roberts](https://twitter.com/riptideltd), [Sam Gregory](https://twitter.com/GregorydSam), and [Rob Suddaby](https://twitter.com/robsuddaby)
  + [HANIC Panel "How to get into Sports Analytics & Media + Analytics"](https://www.youtube.com/watch?v=oUVISEJaEMM) with Alison Lukan, Sarah Bailey, Harman Dayal, [Asmae Toumi](https://twitter.com/asmae_toumi), and Mike Johnson
  + [Careers in Sports Analytics](https://www.youtube.com/watch?v=0Y46KjeVsD0)
  + [Chris Gill](https://twitter.com/chrisgill_UK)'s [Sports Analysis YouTube Channel](https://www.youtube.com/channel/UCb60z8UyQJOFnXcDLmME-0A/videos), including videos on [Writing the perfect CV](https://www.youtube.com/watch?v=_UrZUhwkAfg), [How to get a job in sports analysis](https://www.youtube.com/watch?v=3GohwDmM0aY) and [LinkedIn tips](https://www.youtube.com/watch?v=TeLpkPG7Oxo), amongst other videos added regularly
* Glossaries:
  + [The Athletic's football analytics glossary: explaining xG, PPDA, field tilt and how to use them](https://theathletic.com/2730755/2021/07/28/the-athletics-football-analytics-glossary-explaining-xg-ppda-field-tilt-and-how-to-use-them/) by [Mark Carey](https://twitter.com/MarkCarey93) and [Tom Worville](https://twitter.com/Worville) (requires subscription)
  + [Stat Glossary](https://thefutebolist.wordpress.com/stat-glossary/) by [Ashwin Raman](https://twitter.com/AshwinRaman_)
  + [Football Analytics Glossary](https://footballstatsglossary.home.blog/) by [Ashwin Raman](https://twitter.com/AshwinRaman_) and [Mark Thompson](https://twitter.com/EveryTeam_Mark)
  + [Expected goals, expected assists, pressures, carries, high turnovers and more | Advanced stats explained](https://www.skysports.com/football/news/11095/12829539/expected-goals-expected-assists-pressures-carries-high-turnovers-and-more-advanced-stats-explained) by [Sky Sports Football](https://www.skysports.com/football/)
* Podcasts:
  + [Fanalytics](https://open.spotify.com/show/3G3LWoSWZdHW4Gg6igjIHU?si=9v83huJIR-GUxAnKRyXLRA) podcast with Mike Lewis
    - [Getting Your Foot in the Door](https://soundcloud.com/fanalytics/sports-analytics-getting-your-foot-in-the-door) with Sean Steffen
  + [What is sports analytics?](https://open.spotify.com/episode/3gIkGxJOmKkFRHoGAqRimB?si=6pPOVLfgTjuynfho6b4SPA&dl_branch=1) episode of the [Measurables](https://open.spotify.com/show/1B2KCrfMM6sDfNICsyVDlW?si=YAU9RS7sTGSyITF6OhgW9A&dl_branch=1) podcast by [Measurables](https://twitter.com/MeasurablesPod) ([Brendan Kent](https://twitter.com/brendankent))

๐Ÿ” Return

![-----------------------------------------------------](https://raw.githubusercontent.com/andreasbm/readme/master/assets/lines/dark.png)

💾 Data

:information_source: Data Sources

Publicly available data sources and datasets relating to football, including Tracking data, Event data, aggregated player performance data, detailed match statistics, injury records, transfer values and more. Data sources that have been used in the code and analysis in this repository can be found in the [`data`](https://github.com/eddwebster/football_analytics/tree/master/data) subfolder of this repository, or in Google Drive due to GitHub's 100 MB file size limit [[link](https://drive.google.com/drive/folders/1r2Rf3CPsKnxyxtmDRIHQ2eoW5WwCzBa0?usp=sharing)]. However, all the code in this repository should enable you to scrape, parse, and engineer the datasets into the form used for the analysis and visualisations featured. To learn more about the different types of data available, such as Event and Tracking data, see the "Where can I get data?" section of [Devin Pleuler](https://twitter.com/devinpleuler)'s [`analytics-handbook`](https://github.com/devinpleuler/analytics-handbook) [[link](https://github.com/devinpleuler/analytics-handbook#where-can-i-get-data)]. For a quick primer on the free football data resources available, see the following Twitter thread by [James Nalton](https://twitter.com/JDNalton) [[link](https://twitter.com/JDNalton/status/1508011410747445250)].
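Providers typically deliver Event data (described below) as one record per event, and StatsBomb's open data is distributed as JSON files containing lists of events. The sketch below assumes field names in the style of that format (`type.name`, `location`, `player.name`, `shot.statsbomb_xg`, `shot.outcome.name`). Verify them against StatsBomb's own specification before relying on them. The inline sample is hand-written with made-up values, not real match data. The code counts events by type and flattens shot events into simple rows that can feed a `pandas` DataFrame or an xG model.

```python
import json
from collections import Counter

# Hand-written sample shaped like a StatsBomb-style events file (a JSON list).
# Field names are assumptions based on that format; the values are made up.
SAMPLE_EVENTS_JSON = """
[
  {"type": {"name": "Pass"}, "location": [60.0, 40.0]},
  {"type": {"name": "Shot"}, "location": [108.0, 40.0],
   "player": {"name": "Player A"},
   "shot": {"statsbomb_xg": 0.76, "outcome": {"name": "Goal"}}},
  {"type": {"name": "Shot"}, "location": [95.0, 30.0],
   "player": {"name": "Player B"},
   "shot": {"statsbomb_xg": 0.04, "outcome": {"name": "Saved"}}}
]
"""


def shots_from_events(events):
    """Flatten shot events into dicts of player, location, xG and goal flag."""
    shots = []
    for event in events:
        if event.get("type", {}).get("name") != "Shot":
            continue  # keep only shots
        shot = event.get("shot", {})
        x, y = event["location"][:2]
        shots.append({
            "player": event.get("player", {}).get("name"),
            "x": x,
            "y": y,
            "xg": shot.get("statsbomb_xg"),
            "goal": shot.get("outcome", {}).get("name") == "Goal",
        })
    return shots


events = json.loads(SAMPLE_EVENTS_JSON)
event_counts = Counter(event["type"]["name"] for event in events)
shots = shots_from_events(events)

if __name__ == "__main__":
    print(event_counts)
    for row in shots:
        print(row)
```

For a real match, you would read an events file with `json.load` instead of the inline string, or fetch the events with `statsbombpy`.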
Event data
Event Data is labelled data for each on-the-ball event that takes place during a game. The data is manually collected from television footage. To learn more about the data collection, see the following video [[link](https://www.youtube.com/watch?v=GyN-qpVfOWA&ab_channel=Numberphile)]. A single match typically contains around 2,000 to 3,000 individual events (rows), depending on the provider. The main providers of this data are StatsBomb, Stats Perform (formerly Opta), and Wyscout.

| Name | Comments | Source / method(s) to get the data |
| ----- | -------- | ----------------------------- |
| [StatsBomb Open Data](https://statsbomb.com/what-we-do/hub/free-data/) |