cpsievert / pitchRx

Tools for scraping MLB Gameday data and Visualizing PITCHf/x
http://cpsievert.github.io/pitchRx/

Fix 25: Scrape error with Minor League Data #49

Open keberwein opened 7 years ago

keberwein commented 7 years ago

This addresses issue #25, where minor league gids could not be scraped.

I fixed this by wrapping a request to each "...inning_all" URL in tryCatch to check whether it exists. If it does not, the individual inning files are downloaded and parsed instead.
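For readers who want the general shape of that check, here is a minimal sketch, not the code in this PR. The helper names `url_exists()` and `inning_urls()`, the `inning/inning_<n>.xml` layout, and the `n_innings` default are all assumptions made for illustration.

```r
# Hypothetical sketch; none of these helpers are part of pitchRx.

# Returns TRUE if the URL can be opened, FALSE if opening it raises an error.
url_exists <- function(u) {
  tryCatch({
    con <- suppressWarnings(url(u, open = "rb"))
    close(con)
    TRUE
  }, error = function(e) FALSE)
}

# Returns the single inning_all.xml URL when it exists,
# otherwise one URL per inning (assumed file layout).
inning_urls <- function(base_url, n_innings = 9, exists_fn = url_exists) {
  all_url <- paste0(base_url, "/inning/inning_all.xml")
  if (exists_fn(all_url)) {
    return(all_url)
  }
  paste0(base_url, "/inning/inning_", seq_len(n_innings), ".xml")
}

# Offline usage with a stubbed existence check (no network needed).
inning_urls("http://example.com/gid_x", n_innings = 7,
            exists_fn = function(u) FALSE)
```

Passing the existence check in as `exists_fn` keeps the fallback logic testable without hitting the network; a real implementation would also need to discover how many inning files a game actually has.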

I added a nonMLB argument to the function. The default is FALSE. Setting it to TRUE enables the fallback described above.

We could do the same thing without the additional argument, but I think running that extra tryCatch for every gid would be overkill and would hurt performance with a large number of gids (an entire season, for example).

The XML observations require a different parsing strategy for single innings, so I split all of the object parsing out into its own function called parseObs(). This function replaces lines 227-297 of scrape.R and is defined at the end of the file.

I have also updated the roxygen comments, manual, NAMESPACE, etc.

Tests

devtools::install_github("keberwein/pitchRx", force = TRUE)
library(pitchRx)

# Example from the documentation with run time.
start.time <- Sys.time()
data(nonMLBgids, package = "pitchRx")
aaa <- nonMLBgids[grepl("2014_06_02_[a-z]{3}aaa_[a-z]{3}aaa", nonMLBgids)]
dat <- scrape(game.ids = aaa)
end.time <- Sys.time()
end.time - start.time

# The first two gids have seven innings, the third has an inning_all.xml in the directory.
mixed_bag <- scrape(game.ids = c("gid_2010_06_01_lhvaaa_tolaaa_1",
                                 "gid_2010_06_02_albaaa_nasaaa_1",
                                 "gid_2014_06_02_srcaaa_freaaa_1"),
                    nonMLB = TRUE)

# Traditional scrape is unchanged and works the same as before.
# `con` is assumed to be an existing database connection.
start.time <- Sys.time()
scrape(start = "2016-07-23", end = "2016-07-24", connect = con)
end.time <- Sys.time()
end.time - start.time

keberwein commented 7 years ago

Looks like Travis found a documentation error. I can patch that up; just let me know what you think of the general method.