cpsievert / pitchRx

Tools for scraping MLB Gameday data and Visualizing PITCHf/x
http://cpsievert.github.io/pitchRx/

Scrape function error with Minor League Data #25

Open cidawkins opened 9 years ago

cidawkins commented 9 years ago

This has been driving me crazy for the last few days and I can't figure out why it keeps failing. Every time I run the scrape function, it produces this error:

If file names don't print right away, please be patient.
Error in function (type, msg, asError = TRUE) : Could not resolve host:

I tried following your post on nonMLBdata and only tweaked it for import into a MySQL database. I shut off my laptop and router firewall for a short period of time to test it and it still returned the same error.

This is my main code:

library(dplyr)
library(RMySQL)
library(pitchRx)
drv = dbDriver("MySQL")
con = dbConnect(drv, user = "myusername", password = "mypassword", dbname = "nonmlb", host = "localhost")
nonMLB08 <- nonMLBgids[grep("2008", nonMLBgids)]
scrape(start = "2008-01-01", end = "2009-01-01", game.ids = nonMLB08, connect = con)

traceback()
8: fun(structure(list(message = msg, call = sys.call()), class = c(typeName, "GenericCurlError", "error", "condition")))
7: function (type, msg, asError = TRUE)
   {
       if (!is.character(type)) {
           i = match(type, CURLcodeValues)
           typeName = if (is.na(i)) character() else names(CURLcodeValues)[i]
       }
       typeName = gsub("^CURLE_", "", typeName)
       fun = (if (asError) stop else warning)
       fun(structure(list(message = msg, call = sys.call()), class = c(typeName, "GenericCurlError", "error", "condition")))
   }(6L, "Could not resolve host: \016", TRUE)
6: .Call("R_curl_easy_perform", curl, .opts, isProtected, .encoding, PACKAGE = "RCurl")
5: curlPerform(curl = curl, .opts = opts, .encoding = .encoding)
4: getURL(urls, async = async)
3: urlsToDocs(urls, async = async, quiet = quiet)
2: XML2Obs(inning.filez, as.equiv = TRUE, url.map = FALSE, ...)
1: scrape(start = "2008-01-01", end = "2009-01-01", game.ids = nonMLB08, connect = con)

sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] pitchRx_1.6 ggplot2_1.0.0 dplyr_0.4.1 RMySQL_0.10.2 DBI_0.3.1

loaded via a namespace (and not attached):
[1] assertthat_0.1 bitops_1.0-6 colorspace_1.2-6 digest_0.6.8 grid_3.1.2 gtable_0.1.2
[7] hexbin_1.27.0 lattice_0.20-30 magrittr_1.5 MASS_7.3-39 Matrix_1.1-5 mgcv_1.8-5
[13] munsell_0.4.2 nlme_3.1-120 parallel_3.1.2 plyr_1.8.1 proto_0.3-10 Rcpp_0.11.5
[19] RCurl_1.95-4.5 reshape2_1.4.1 scales_0.2.4 stringr_0.6.2 tools_3.1.2 XML_3.98-1.1
[25] XML2R_0.0.6

I really need help with this.

cidawkins commented 9 years ago

Also, sometimes it lists a single character after "Could not resolve host:".

Off the top of my head, I've seen "5", "F", and nothing at all.
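
For anyone trying to narrow this down: the traceback shows the failure is raised as a condition whose class vector includes "GenericCurlError", so it can be caught by class instead of aborting the whole run. Below is a minimal sketch in base R. The names safe_fetch and fake_fetch are hypothetical, and the "COULDNT_RESOLVE_HOST" class name is an assumption based on curl error code 6 shown in the traceback. The condition is simulated here, so no network access or RCurl install is needed to try it.

```r
# Hypothetical wrapper: run `fetch` (any zero-argument function that downloads
# something) and return NULL with a message instead of stopping on a curl error.
safe_fetch <- function(fetch) {
  tryCatch(
    fetch(),
    GenericCurlError = function(e) {
      message("curl error: ", conditionMessage(e))
      NULL
    }
  )
}

# Simulated failure with the same class layout as in the traceback.
# "COULDNT_RESOLVE_HOST" is assumed from curl code 6; it is not taken from the log.
fake_fetch <- function() {
  stop(structure(
    list(message = "Could not resolve host: ", call = NULL),
    class = c("COULDNT_RESOLVE_HOST", "GenericCurlError", "error", "condition")
  ))
}

result <- safe_fetch(fake_fetch)  # handler runs, result is NULL
```

In a real session the wrapped call would presumably be something like safe_fetch(function() scrape(game.ids = nonMLB08)), run over small batches of game IDs to see which ones trigger the error.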

cpsievert commented 9 years ago

Specifying a start and end date isn't necessary if you're using the game.ids argument. Try removing those.

cidawkins commented 9 years ago

Tried it and got the same error.

cpsievert commented 9 years ago

Ah, I'm pretty sure this is happening because "inning/inning_all.xml" files don't exist for most (if not all) minor league games. The other file types should work though. For example,

x <- head(nonMLBgids)
files <- c("inning/inning_hit.xml", "miniscoreboard.xml", "players.xml")
dat <- scrape(game.ids = x, suffix = files)

I don't have time now, but hopefully in the next few months I'll make some modifications so that, for example, "inning_[0-9].xml" files are grabbed when "inning_all.xml" doesn't exist.
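
The fallback described above could look roughly like the sketch below. This is not pitchRx code: pick_inning_files, its arguments, and the example.com directory are all hypothetical, and the existence check is passed in as a function so the logic can be tried without network access. The only parts taken from this thread are the file names "inning/inning_all.xml" and "inning_[0-9].xml".

```r
# Hypothetical helper: choose which inning files to request for one game.
# `game_dir` is the game's base URL (assumed to be known already);
# `url_exists` is a caller-supplied predicate, function(url) -> TRUE/FALSE.
pick_inning_files <- function(game_dir, url_exists, innings = 1:9) {
  all_file <- paste0(game_dir, "/inning/inning_all.xml")
  if (isTRUE(url_exists(all_file))) {
    return(all_file)
  }
  # Fall back to the per-inning files and keep only those that are present.
  per_inning <- paste0(game_dir, "/inning/inning_", innings, ".xml")
  per_inning[vapply(per_inning, url_exists, logical(1), USE.NAMES = FALSE)]
}

# Stand-in predicate for illustration: pretend only innings 1 to 3 exist.
fake_exists <- function(url) grepl("inning_[1-3]\\.xml$", url)
picked <- pick_inning_files("http://example.com/gid_x", fake_exists)
```

With a live connection, RCurl::url.exists could probably serve as the predicate. The default of innings 1 to 9 mirrors the "inning_[0-9]" pattern; extra-inning games would need a wider range.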

cidawkins commented 9 years ago

I was wondering if you've had a chance to work on this error.

cpsievert commented 9 years ago

This likely won't get fixed (by me) anytime soon.