alanrkessler / savantscraper

Using Python to add Baseball Savant data to a SQLite database
GNU General Public License v3.0
52 stars 16 forks source link

Baseball Savant Data Scraping

Alan Kessler

Background

Baseball Savant is a website that contains MLB Statcast data. This data includes data about each pitch such as speed, spin, and location. Most importantly, this source has information about the batted ball such as exit velocity and launch angle.

The website offers robust querying options. However, for predictive modeling/machine learning purposes, it is useful to have an observation for each pitch. The website currently does not offer this functionality. The goal of this project is to provide as complete a database of pitch events as possible. There are other projects that focus on specific queries or the functions can be easily modified to select different subsets of the data. This script allows for simple loading of a portable database or the creation of individual files that could be loaded to a storage object like S3 on AWS.


Resources


Details

This code pulls data from the Baseball Savant statcast search tool. It is the equivalent of downloading the detail-level delimited files from the website. The player of interest found in the "player_name" data element refers to the pitcher.

The script loops through teams, seasons, and home/road to ensure that the queries abide by the search performance limitations. The results can be saved to individual CSV files or loaded into an SQLite database file.

The functions require Python 3.6 or greater.


Changes for the 2019 Offseason


Disclaimer

Scraping this data is time consuming. Scraping data from the web may not be welcomed by its owner.