beanumber / openWAR

An R package enabling the computation of openWAR using MLBAM data

getData — Error in bind_rows_(x, .id) : Column `balls` can't be converted from character to integer #111

Open ssp3nc3r opened 6 years ago

ssp3nc3r commented 6 years ago

I received this error when retrieving data for the 2012 regular season. If I change this line in getData(...) from

out <- dplyr::bind_rows(ds.list)

to

out <- do.call(rbind, lapply(ds.list, data.frame, stringsAsFactors=FALSE))

the code completes without an error. However, balls and strikes then come back as character, so they need to be converted to integer:

out <- transform(out, balls = as.integer(balls), strikes = as.integer(strikes))
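A minimal sketch of the type clash behind this error, using base R only and made-up toy data. Here `ds.list` is a hypothetical stand-in for the list built inside getData, and `coerce_counts` is an illustrative helper name, not part of openWAR. Coercing the count columns in every element before combining avoids the mismatch whichever binding function is used.

```r
# Toy stand-in for ds.list: one game parsed `balls`/`strikes` as integer,
# another as character (hypothetical data, not real MLBAM output).
ds.list <- list(
  data.frame(gameId = "g1", balls = 3L, strikes = 2L, stringsAsFactors = FALSE),
  data.frame(gameId = "g2", balls = "1", strikes = "0", stringsAsFactors = FALSE)
)

# Illustrative helper: coerce the count columns to integer in one data frame.
coerce_counts <- function(d) {
  d$balls <- as.integer(d$balls)
  d$strikes <- as.integer(d$strikes)
  d
}

# Coerce every element first, then combine the rows.
out <- do.call(rbind, lapply(ds.list, coerce_counts))
str(out)
```

With the columns made consistent up front, passing `lapply(ds.list, coerce_counts)` to dplyr::bind_rows should also work, since the character/integer conflict no longer exists.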

beanumber commented 6 years ago

Hmm...OK, thanks. Can you identify a particular day in 2012, or better yet, a particular game in which the problem arises? That would be helpful in debugging.

ssp3nc3r commented 6 years ago

No, sorry. I scraped the entire regular season at once. bind_rows operates on the entire list at once, so it may take a lot of trial and error to find the offending game or games. I re-ran the function up to the assignment to ds.list so that I could save the data, and from that worked out the code I shared above, which fixed the error.
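If ds.list has been saved, the offending games could probably be located without trial and error by checking the class of `balls` in each element. This is only a sketch on toy data; the element names (`gid_a`, etc.) are made up, and it assumes each element is a data frame or NULL.

```r
# Toy stand-in for a saved ds.list (hypothetical names and values).
ds.list <- list(
  gid_a = data.frame(balls = 3L, strikes = 2L),
  gid_b = data.frame(balls = "1", strikes = "0", stringsAsFactors = FALSE),
  gid_c = data.frame(balls = 0L, strikes = 1L)
)

# Record the class of `balls` in every element; a NULL element reports "NULL".
balls_class <- vapply(ds.list, function(d) class(d$balls)[1], character(1))

# Summarize the classes seen, then list the elements that are not integer.
table(balls_class)
offenders <- names(balls_class)[balls_class != "integer"]
offenders
```

If the real list is unnamed, `which(balls_class != "integer")` gives the positions instead, which can then be matched back to the games.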

znmeb commented 6 years ago

I'm getting an empty data frame from out <- dplyr::bind_rows(ds.list) now. I've forked the repo and I'm in the process of troubleshooting. gd.list looks OK but ds.list has nulls for all the data.

> library(openWAR)
> test <- getData("2018-03-24")

Retrieving data from 2018-03-24 ...

...found 16 games
[snip]
Error in grouped_df_impl(data, unname(vars), drop) : 
  Column `gameId` is unknown
In addition: Warning message:
In .Internal(get(x, envir, mode, inherits)) :

It's crashing trying to drop suspended games, but ds.list is messed up before it gets there and the out data frame is empty.

[Two screenshots attached: 2018-03-25 13-08-55 and 2018-03-25 13-09-24]

I think this line is wrong: ds.list <- lapply(gd.list, "[[", "ds") but I'm not sure what's supposed to be there.
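For reference, `lapply(gd.list, "[[", "ds")` pulls the element named `ds` out of each item of gd.list, and yields NULL wherever that element is missing or empty. A small sketch on toy data (the structure of the real gd.list items is an assumption here) shows the behavior and a way to count the NULLs:

```r
# Toy stand-in for gd.list: each item is a list that may or may not carry `ds`.
gd.list <- list(
  list(gameId = "g1", ds = data.frame(balls = 1L)),
  list(gameId = "g2", ds = NULL),
  list(gameId = "g3")
)

# Same extraction as in getData: take the `ds` element of every item.
ds.list <- lapply(gd.list, "[[", "ds")

# Flag the items where nothing was extracted.
is_empty <- vapply(ds.list, is.null, logical(1))
sum(is_empty)

# Keep only the items that actually hold data.
ds.ok <- ds.list[!is_empty]
length(ds.ok)
```

If every entry comes back NULL, the extraction line is doing what it should and the problem is more likely upstream, where the `ds` element gets populated.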

james-ingold commented 6 years ago

@znmeb I was getting the same issue. It looks like MLB Gameday now includes the day_xx portion of the URL in the game IDs. I made the following pull request, which fixes the issue: https://github.com/beanumber/openWAR/pull/114
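To illustrate the kind of cleanup this implies, here is a rough sketch that strips a leading `day_xx/` segment from game IDs. The ID strings below are invented for illustration, their exact format is an assumption, and this is not necessarily what the pull request does.

```r
# Hypothetical IDs: one carries a leading day_xx/ segment, one does not.
raw_ids <- c(
  "day_24/gid_2018_03_24_aaamlb_bbbmlb_1",
  "gid_2018_03_24_cccmlb_dddmlb_1"
)

# Drop a leading "day_<digits>/" if present; other IDs pass through unchanged.
clean_ids <- sub("^day_[0-9]+/", "", raw_ids)
clean_ids
```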