cpsievert / pitchRx

Tools for scraping MLB Gameday data and Visualizing PITCHf/x
http://cpsievert.github.io/pitchRx/

gids object contains stale data - is there an update method? #23

Closed ppicalino closed 9 years ago

ppicalino commented 9 years ago

It appears that my gids object contains some stale game data, which is causing issues with "scrape".

For example, the 4/15/2014 game between Tampa Bay and Baltimore was postponed to a doubleheader on 6/27/2014. The gid of the original game remains in gids (which may be the expected behavior):

> gids[grepl("2014_04_15_tbamlb_[a-z]{6}", gids)]
[1] "gid_2014_04_15_tbamlb_balmlb_1"

The makeup game on 6/27 is missing (only one of the two games in the doubleheader shows up in gids):

> gids[grepl("2014_06_27_tbamlb_[a-z]{6}", gids)]
[1] "gid_2014_06_27_tbamlb_balmlb_1"

(see http://gd2.mlb.com/components/game/mlb/year_2014/month_06/day_27/gid_2014_06_27_tbamlb_balmlb_2/ for the other game that should show up in this grep)

Is there a method for forcing an update of gids, pulled directly from the Gameday site?

cpsievert commented 9 years ago

You can always give scrape() gameday ids to scrape data from those games. So

dat <- scrape(game.ids = "gid_2014_06_27_tbamlb_balmlb_2")

would return data from http://gd2.mlb.com/components/game/mlb/year_2014/month_06/day_27/gid_2014_06_27_tbamlb_balmlb_2/inning/inning_all.xml (since the suffix argument is "inning/inning_all.xml" by default)
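
To make the id-to-URL mapping concrete, here is a rough base R sketch. Note that gid_to_url is a hypothetical helper, not a pitchRx function; it only assumes the directory layout visible in the URL above and a gid of the form gid_YYYY_MM_DD_away_home_N.

```r
# Hypothetical helper (not part of pitchRx): build the Gameday URL for one gid,
# assuming the year_/month_/day_ directory layout seen in the URL above.
gid_to_url <- function(gid, suffix = "inning/inning_all.xml",
                       root = "http://gd2.mlb.com/components/game/mlb") {
  # "gid_2014_06_27_tbamlb_balmlb_2" splits into
  # "gid", "2014", "06", "27", "tbamlb", "balmlb", "2"
  parts <- strsplit(gid, "_", fixed = TRUE)[[1]]
  paste0(root,
         "/year_", parts[2],
         "/month_", parts[3],
         "/day_", parts[4],
         "/", gid, "/", suffix)
}

gid_to_url("gid_2014_06_27_tbamlb_balmlb_2")
```

The helper handles a single gid at a time; wrap it in vapply() if you need URLs for several games.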

If you have a database, scrape will actually figure out the extra files needed to properly update your tables:

library(dplyr)
db <- src_sqlite("pitchfx.sqlite3")
scrape(game.ids = "gid_2014_06_27_tbamlb_balmlb_2", connect = db$con)

Just so others are aware, other games from this date are missing as well. See here for more details.

Anyways, I'd consider this a bug, so I will add these games to data(gids) so that future versions of pitchRx won't have this problem.

ppicalino commented 9 years ago

Thanks for updating that. FYI I think there are also some games missing from data(gids) in 2008-11 but I haven't been able to diagnose it fully yet.

cpsievert commented 9 years ago

No problem, thanks for reporting. Don't hesitate to let me know about anything else (preferably as a GitHub issue)!

jrbattles commented 8 years ago

Is there a way to update data(gids) myself? For example, I see that the game ids are only updated through 2015.

cpsievert commented 8 years ago

I use this script to update data(gids) -- https://github.com/cpsievert/pitchRx/blob/master/inst/scripts/gids.R
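
For anyone who only needs a few extra games, a local workaround (not the linked script) might be to append the known ids to your own copy of the vector and hand the result to scrape(game.ids = ...). A minimal base R sketch follows; add_gids is a hypothetical helper, the small vector stands in for the packaged data, and the regular expression assumes ids shaped like the ones in this thread.

```r
# Stand-in for the packaged vector; in practice: data(gids, package = "pitchRx")
gids <- c("gid_2014_04_15_tbamlb_balmlb_1",
          "gid_2014_06_27_tbamlb_balmlb_1")

# Hypothetical helper: merge extra ids into an existing vector, dropping
# duplicates and anything that does not look like a gid.
add_gids <- function(existing, extra) {
  # Assumed format: gid_YYYY_MM_DD_<6-char team>_<6-char team>_<game number>
  pattern <- "^gid_[0-9]{4}_[0-9]{2}_[0-9]{2}_[a-z0-9]{6}_[a-z0-9]{6}_[0-9]+$"
  ok <- grepl(pattern, extra)
  sort(unique(c(existing, extra[ok])))
}

gids <- add_gids(gids, c("gid_2014_06_27_tbamlb_balmlb_2", "not_a_gid"))

# Both games of the 6/27 doubleheader should now match
gids[grepl("2014_06_27_tbamlb_[a-z]{6}", gids)]
```

This only changes the vector in your current session; the copy shipped with the package is untouched.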