6758-Project / hockey


Create Data Download script (Data Acquisition) #9

Closed JakeColor closed 2 years ago

JakeColor commented 2 years ago

Creates a simple data download script for NHL API data.

Ignored most of the "fancy" suggestions in the Milestone 1 instructions for now -- we can revisit later if required.

Testing

Running

With Python: python src/data/download_data.py

Full usage:

Download NHL API Data

optional arguments:
  -h, --help            show this help message and exit
  -d DATADIR [DATADIR ...], --datadir DATADIR [DATADIR ...]
                        Which directory to download games
  -s SEASONS [SEASONS ...], --seasons SEASONS [SEASONS ...]
                        Starting year of NHL seasons for which to download games
  -r, --regular-season-only
                        (boolean) if passed, download only regular season data
  -p, --postseason-only
                        (boolean) if passed, download only postseason data
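For reference, an argument parser that accepts the flags in the help text above could look roughly like the sketch below. This is not the code in `src/data/download_data.py`; the function name `build_parser` and both default values are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the flags listed in the help text.

    The defaults below are hypothetical; the real script may use others.
    """
    parser = argparse.ArgumentParser(description="Download NHL API Data")
    parser.add_argument(
        "-d", "--datadir",
        nargs="+",
        default=["./data/raw"],  # hypothetical default directory
        help="Which directory to download games",
    )
    parser.add_argument(
        "-s", "--seasons",
        nargs="+",
        type=int,
        default=[2016, 2017, 2018, 2019, 2020],  # hypothetical default seasons
        help="Starting year of NHL seasons for which to download games",
    )
    parser.add_argument(
        "-r", "--regular-season-only",
        action="store_true",
        help="(boolean) if passed, download only regular season data",
    )
    parser.add_argument(
        "-p", "--postseason-only",
        action="store_true",
        help="(boolean) if passed, download only postseason data",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this sketch, `python download_data.py -s 2017 2018 -r` would yield `seasons=[2017, 2018]` and `regular_season_only=True`.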

JakeColor commented 2 years ago

Issue: all postseason games return data, even if they were never played.

Example: https://statsapi.web.nhl.com/api/v1/game/2017030167/feed/live/

It is probably fine to keep these in the raw data; we will filter them out at the refinement/tidy data stage.
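One possible way to do that filtering later is to drop feeds that contain no recorded plays. The sketch below assumes the live feed JSON stores its events under `liveData.plays.allPlays`; that key path and the helper name `was_played` are assumptions, not something confirmed in this thread.

```python
from typing import Any, Dict


def was_played(feed: Dict[str, Any]) -> bool:
    """Return True if the feed contains at least one recorded play.

    Assumes plays live under feed["liveData"]["plays"]["allPlays"];
    missing keys are treated the same as an empty play list.
    """
    plays = feed.get("liveData", {}).get("plays", {}).get("allPlays", [])
    return len(plays) > 0


# Tiny hand-written payloads (not real API responses) to show the behavior.
played_feed = {"liveData": {"plays": {"allPlays": [{"about": {"eventIdx": 0}}]}}}
unplayed_feed = {"liveData": {"plays": {"allPlays": []}}}

kept = [f for f in (played_feed, unplayed_feed) if was_played(f)]
```

Keeping the raw download untouched and applying a check like this only at the tidy stage means the filter can be changed later without downloading anything again.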

TimkLee commented 2 years ago

I like how the icon for the API is MLB rather than NHL. Some questions regarding the code and hockey.

Should we restrict the season input to a certain range? 2005 appears to be the first year with data: https://statsapi.web.nhl.com/api/v1/game/2005010001/feed/live/ Otherwise the API returns a 404.

For the project we only need data from 2016-17 through 2020-21, so I am wondering whether we need to let users download data outside this range. On a similar note, do we need to include preseason and all-star games?
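If we do decide to restrict the range and game types, one option is to check the game ID before requesting it. The sketch below assumes the commonly described 10-digit layout: four digits for the season's starting year, two for the game type (01 preseason, 02 regular season, 03 postseason, 04 all-star), and four for the game number. That layout, the helper names, and the default scope are assumptions for illustration.

```python
from typing import NamedTuple, Tuple

# Assumed mapping of the two-digit game type code to a readable label.
GAME_TYPES = {
    "01": "preseason",
    "02": "regular",
    "03": "postseason",
    "04": "all-star",
}


class GameId(NamedTuple):
    season: int     # starting year of the season, e.g. 2017 for 2017-18
    game_type: str  # label from GAME_TYPES, or "unknown"
    number: int     # last four digits of the ID


def parse_game_id(game_id: str) -> GameId:
    """Split a 10-digit game ID into season, game type, and game number."""
    if len(game_id) != 10 or not game_id.isdigit():
        raise ValueError(f"expected a 10-digit game ID, got {game_id!r}")
    return GameId(
        season=int(game_id[:4]),
        game_type=GAME_TYPES.get(game_id[4:6], "unknown"),
        number=int(game_id[6:]),
    )


def in_project_scope(
    game_id: str,
    first_season: int = 2016,
    last_season: int = 2020,
    allowed_types: Tuple[str, ...] = ("regular", "postseason"),
) -> bool:
    """Return True if the game falls in the season range and allowed types."""
    parsed = parse_game_id(game_id)
    in_range = first_season <= parsed.season <= last_season
    return in_range and parsed.game_type in allowed_types
```

Under these assumptions, `in_project_scope("2017030167")` is true, while `in_project_scope("2005010001")` is false because the season is out of range and the type code is preseason.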

I am having some trouble running the code, but that's most likely my problem. I looked through the code using Jupyter and believe everything is fine. Thanks!