Closed by vegas31 10 years ago
That is to be expected -- these values are missing (in the source files) unless runs are scored during the atbat.
I admit this is not the best data format. You probably want the running totals (without NAs).
Ahh, thanks for the clarification -- I was making a bad assumption about what those fields meant.
The commands you provided work for the most part. Looking at a subset of the data (WAS/SDN), the home values come out correct (here, 0 for the entire game), but then it starts adding 1s after a certain point. I am trying to work out why it changes over, but haven't found a pattern yet.
Thanks again for your help!
Here is a method to convert home_team_runs/away_team_runs to the equivalent numeric representation.
library(pitchRx)
june8 <- scrape(start = "2014-06-08", end = "2014-06-08")
atbats <- june8$atbat
library(dplyr)
# make sure records are ordered by num (within game)
atbats <- split(atbats, atbats$gameday_link) %>%
  lapply(function(x) x[order(x$num), ]) %>%
  bind_rows()
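# replace missing values with the next non-missing value;
# at-bats after the last recorded value keep that value (no runs scored after it)
f <- function(runs) {
  runs <- as.numeric(runs)
  idx <- which(!is.na(runs))
  if (length(idx) == 0) return(runs)  # nothing recorded: leave as NA
  filled <- rep(runs[idx], diff(c(0, idx)))
  # pad the tail so the result has the same length as the input
  c(filled, rep(runs[max(idx)], length(runs) - max(idx)))
}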
atbats$home_team_runs <- unlist(with(atbats, tapply(home_team_runs, INDEX = gameday_link, f)))
atbats$away_team_runs <- unlist(with(atbats, tapply(away_team_runs, INDEX = gameday_link, f)))
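If what you want is the running total at each at-bat, a forward fill (carry the last recorded value forward) may be closer than filling with the next value. Here is a minimal sketch of that alternative. It assumes the score is 0 before the first recorded value in a game and that each recorded value is the total after that at-bat; I have not checked either assumption against the source files. `fill_forward` and the `toy` data frame are hypothetical names and made-up rows, not part of pitchRx.

```r
# Carry the last recorded value forward; `start` is the assumed score
# before anything has been recorded in the game.
fill_forward <- function(runs, start = 0) {
  runs <- as.numeric(runs)
  seen <- !is.na(runs)
  # cumsum(seen) counts the recorded values up to and including each row,
  # so adding 1 indexes into c(start, recorded values)
  c(start, runs[seen])[cumsum(seen) + 1]
}

# Made-up stand-in for june8$atbat (two games, values invented)
toy <- data.frame(
  gameday_link = c("gid_a", "gid_a", "gid_a", "gid_a", "gid_b", "gid_b", "gid_b"),
  num = c(1, 2, 3, 4, 1, 2, 3),
  home_team_runs = c(NA, "1", NA, NA, NA, NA, "2"),
  stringsAsFactors = FALSE
)

# Rows must be ordered by num within each game before filling
toy <- toy[order(toy$gameday_link, toy$num), ]

# ave() applies the function per game and keeps the original row order
toy$home_running <- ave(as.numeric(toy$home_team_runs),
                        toy$gameday_link, FUN = fill_forward)
```

Because `ave()` returns values in the original row order, this avoids relying on `tapply()` and `unlist()` producing results in the same order as the rows.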
I am using pitchRx and scrape to look at which pitches a pitcher uses, given the score of the game. To do this, I am looking at the home_team_runs and away_team_runs columns in the GameDay data that pitchRx/scrape provides. However, I am seeing a lot of NAs in my data even though the values appear to be there when I search for 'home_team_runs' in the relevant XML file on gd2.mlb.com.
Here are my commands:
library(dplyr)
library(pitchRx)
june8 <- scrape(start = "2014-06-08", end = "2014-06-08")
I was mostly interested in the WAS/SDN game, which returned all NA for Jordan Zimmermann. Looking at other games on June 8, and at games on other days, gives me mostly the same results: there are only some sporadic entries (see the attached screenshot, which shows the result of View(june8$atbat)).
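A quick way to see how sparse these columns are is to count, per game, the at-bats that actually carry a value. This is only a sketch: `toy` is a small made-up data frame standing in for june8$atbat, and only the column names come from the real data.

```r
# Hypothetical stand-in for june8$atbat; the rows are invented for illustration.
toy <- data.frame(
  gameday_link = c("gid_a", "gid_a", "gid_a", "gid_b", "gid_b"),
  num = c(1, 2, 3, 1, 2),
  home_team_runs = c(NA, "1", NA, NA, NA),
  stringsAsFactors = FALSE
)

# TRUE where the column holds a value, FALSE where it is NA
recorded <- !is.na(toy$home_team_runs)

# number of at-bats with a recorded value, per game
recorded_per_game <- tapply(recorded, toy$gameday_link, sum)

# overall share of at-bats with a missing value
share_missing <- mean(!recorded)
```

With the real data, the same three lines would apply after replacing `toy` with june8$atbat.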
I am on OS X 10.9.3, using pitchRx version 1.5 in RStudio version 0.98.501.
Happy to pass along any other info you need if I have forgotten anything -- many thanks!
Stuart