cpsievert / pitchRx

Tools for scraping MLB Gameday data and Visualizing PITCHf/x
http://cpsievert.github.io/pitchRx/

Error in function (type, msg, asError = TRUE) : <url> malformed #58

Closed kevinantonevich closed 6 years ago

kevinantonevich commented 6 years ago

When trying to scrape new game files I'm getting this error. I've restarted my session a few times and am running the latest version of R. What could be causing this? I've had some problems with inconsistencies between my login keychain items in the past, but I'm not sure if that could be it. Thanks!

jrbattles commented 6 years ago

Does your error occur only for the 2017 regular season? If so, I think your issue is related to issue #57.

I am given to understand that MLBAM modified their directory structure (trailing slash) when they moved to S3, but I think they only did this for 2017 regular season data. 2016 season and even 2017 pre-season data still work, right?

baseballbettor commented 6 years ago

Hi, I am not really an R programmer and don't know much about GitHub, but I had to get this fixed for my handicapping model. The problem did not seem to be the URL format change but miniscoreboard.xml: the scraper can no longer read it, so it cannot pick up new gids. The rest of the scraper works fine. I replaced the updateGids function with one that reads gids from epg.xml instead, and it is working for me:

My.updateGids <- function(last.date, end) {
  message("grabbing new game IDs")
  # One epg.xml URL per day between last.date and end
  scoreboards <- paste0(makeUrls(start = last.date, end = end, gids = ""), "/epg.xml")
  obs <- XML2Obs(scoreboards)
  # Keep only the <game> nodes under <epg>
  obs2 <- obs[grep("^epg//game$", names(obs))]
  gids <- collapse_obs(obs2)[, "gameday"]
  # Drop missing IDs and add the "gid_" prefix used by the rest of the scraper
  paste0("gid_", gids[!is.na(gids)])
}
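For anyone who wants to see the ID-extraction step on its own, here is a minimal sketch. It assumes each `<game>` node in epg.xml carries a `gameday` attribute, as the function above implies. The XML snippet is a hypothetical, trimmed-down stand-in for the real file, `extract_gids` is an illustrative name, and it uses the xml2 package rather than XML2R, so it is not the code pitchRx itself runs.

```r
library(xml2)

# Hypothetical stand-in for one day's epg.xml; the real file has many more
# attributes per <game>. The third game deliberately lacks "gameday".
epg_text <- '<epg>
  <game id="2017/04/02/nyamlb-tbamlb-1" gameday="2017_04_02_nyamlb_tbamlb_1"/>
  <game id="2017/04/02/sfnmlb-arimlb-1" gameday="2017_04_02_sfnmlb_arimlb_1"/>
  <game id="2017/04/02/chnmlb-slnmlb-1"/>
</epg>'

# Illustrative helper (not part of pitchRx): pull "gameday" from every
# <game> under <epg> and return IDs with the "gid_" prefix.
extract_gids <- function(xml_text) {
  doc <- read_xml(xml_text)
  games <- xml_find_all(doc, "/epg/game")
  gameday <- xml_attr(games, "gameday")  # NA where the attribute is missing
  gameday <- gameday[!is.na(gameday)]
  # paste0("gid_", character(0)) would return "gid_", so guard the empty case
  if (length(gameday) == 0) return(character(0))
  paste0("gid_", gameday)
}

gids <- extract_gids(epg_text)
print(gids)
```

The empty-input guard is a choice made for this sketch only. Without it, a day with no games would yield the single string "gid_" instead of an empty vector.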

cpsievert commented 6 years ago

Thanks @baseballbettor! I'm now doing this as of c8db9d2