clausherther / nfl-dbt

Apache License 2.0

NFL Play-by-play dbt models

This repo contains dbt models to transform NFL Play-by-Play (pbp) data sourced from https://github.com/nflverse/nflverse-data into analytical models.

Update Frequency

The nflverse-data repo is updated fairly regularly, but since it is a free, volunteer-maintained resource, we can't rely on play data being updated weekly. So this dataset and the analytical models are best used for teaching and model-building purposes, and less so for weekly decisions on sports bets and the like.

Models

XA (Transformed Aggregates)

These models are aggregates of one or more of the models above:

Notes

Data Load

The repo assumes that the raw scraped data has been loaded into BigQuery, with each raw file loaded as a single table in a dataset (BigQuery's equivalent of a database) called raw.
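The "one raw file, one table" convention can be illustrated with a small naming helper. This is a minimal sketch, not code from the repo: the function name `table_id_for_file`, the rule of stripping every file extension and replacing other characters with underscores, and the project ID `my-project` are all hypothetical assumptions.

```python
import re
from pathlib import Path

RAW_DATASET = "raw"  # dataset name used by the repo, per the text above


def table_id_for_file(path: str, project: str, dataset: str = RAW_DATASET) -> str:
    """Map one raw file to one fully qualified BigQuery table ID.

    Hypothetical naming rule: drop all extensions, then replace any
    character other than letters, digits and underscores with "_".
    """
    name = Path(path).name
    # "play_by_play_2023.csv.gz" -> "play_by_play_2023"
    stem = name.split(".", 1)[0]
    table = re.sub(r"[^A-Za-z0-9_]", "_", stem).lower()
    return f"{project}.{dataset}.{table}"
```

For instance, `table_id_for_file("data/play_by_play_2023.csv.gz", "my-project")` returns `"my-project.raw.play_by_play_2023"`, a `project.dataset.table` string of the form the google-cloud-bigquery client accepts as a load destination.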

The included Python script extract_load is intended to do the following:

The script uses the connection info defined in your local ~/.dbt/profiles.yml file and needs to be configured with the appropriate profile name and target to use:

E.g.:

dbt_profile_name = "nfl"  # top-level profile name in ~/.dbt/profiles.yml
dbt_target_name = "bq"  # target under that profile's outputs section
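The two settings above select one entry from profiles.yml, where dbt nests each profile's connection details under `outputs` and uses the profile's `target` key as the default. Below is a minimal sketch of that lookup, assuming the file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`, which is not part of the standard library). The helper name `resolve_target` and the sample project and dataset values are hypothetical.

```python
from typing import Any, Optional


def resolve_target(profiles: dict[str, Any], profile_name: str,
                   target_name: Optional[str] = None) -> dict[str, Any]:
    """Return the outputs entry for a profile/target pair from parsed profiles.yml."""
    if profile_name not in profiles:
        raise KeyError(f"profile {profile_name!r} not found in profiles.yml")
    profile = profiles[profile_name]
    # Fall back to the profile's default `target` when none is given, as dbt does.
    target = target_name or profile["target"]
    outputs = profile["outputs"]
    if target not in outputs:
        raise KeyError(f"target {target!r} not defined for profile {profile_name!r}")
    return outputs[target]


# Hypothetical parsed profiles.yml with a single BigQuery target.
profiles = {
    "nfl": {
        "target": "bq",
        "outputs": {
            "bq": {
                "type": "bigquery",
                "method": "oauth",
                "project": "my-gcp-project",
                "dataset": "nfl",
                "threads": 4,
            }
        },
    }
}

connection = resolve_target(profiles, "nfl", "bq")
```

Raising a `KeyError` with the profile or target name makes a misconfigured `dbt_profile_name` or `dbt_target_name` easy to spot.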

The load portion currently only works for BigQuery, but could probably be extended to work with Snowflake and Redshift (:OOF:) as well.
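One way to leave room for Snowflake or Redshift is to dispatch on the adapter `type` field of the selected profile target. This is a hedged sketch of that design, not the repo's extract_load: `load_file`, `load_bigquery` and the `LOADERS` registry are hypothetical names, and the BigQuery loader is a stub that only describes what it would do.

```python
from typing import Any, Callable, Dict

Loader = Callable[[str, Dict[str, Any]], str]


def load_bigquery(path: str, target: Dict[str, Any]) -> str:
    # Stub: a real implementation would call the BigQuery client here.
    return f"would load {path} into {target['project']}.raw"


# Registry keyed by the profile's adapter type; new warehouses register here.
LOADERS: Dict[str, Loader] = {"bigquery": load_bigquery}


def load_file(path: str, target: Dict[str, Any]) -> str:
    """Route a raw file to the loader matching the target's adapter type."""
    adapter = target.get("type")
    loader = LOADERS.get(adapter)
    if loader is None:
        raise NotImplementedError(f"no loader registered for adapter type {adapter!r}")
    return loader(path, target)
```

Supporting another warehouse would then mean writing one function and adding it to `LOADERS`, without touching the dispatch logic.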

Future Work

The following items would make great natural extensions and improvements to the repo: