Closed aaronbaggett closed 10 years ago
Before I suggest anything, would you mind showing me what output you get from the following?
library(DBI)
gidz <- unique(dbGetQuery(pfx_db$con, "SELECT DISTINCT gameday_link FROM player")[,1])
head(gidz)
tail(gidz)
gidz2 <- unique(dbGetQuery(pfx_db$con, "SELECT DISTINCT gameday_link FROM game")[,1])
head(gidz2)
tail(gidz2)
Also, the action, atbat, pitch, po, and runner tables are missing because 'inning/inning_all.xml' is not included in files. You can easily add them to what you have so far by doing:
scrape(start = "2009-01-01", end = "2014-01-01", connect = pfx_db$con)
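To confirm which tables are still absent before or after re-running the scrape, one option is to compare the connection's table list against the table names mentioned in this thread. The following is only a sketch: `missing_tables` and `expected_tables` are hypothetical names introduced for illustration, and the demo uses a throwaway in-memory SQLite database (assuming the RSQLite package is installed) rather than the real pitchRx.sqlite3.

```r
library(DBI)

# Table names taken from this thread (not an authoritative list from pitchRx).
expected_tables <- c("action", "atbat", "pitch", "po", "runner",
                     "coach", "game", "media", "player", "umpire")

# Hypothetical helper: return the expected tables that the connection lacks.
missing_tables <- function(con, expected = expected_tables) {
  setdiff(expected, dbListTables(con))
}

# Demo on an in-memory database holding only two of the tables.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "player", data.frame(gameday_link = "gid_2014_07_20_cinmlb_nyamlb_1"))
dbWriteTable(con, "coach", data.frame(gameday_link = "gid_2014_07_20_cinmlb_nyamlb_1"))
still_missing <- missing_tables(con)
print(still_missing)
dbDisconnect(con)
```

Against the real database, the call would be `missing_tables(pfx_db$con)`.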
Thanks. Yeah, I realized I was missing inning_all.xml after I submitted earlier. I should say that I updated my pfx_db and started over with all games played on Sunday just to try and troubleshoot things. Here is what I ran in order to build the test db. There did not appear to be any errors from what I ran below.
pfx_db <- src_sqlite("pitchRx.sqlite3", create = TRUE)
pfx_db
files <- c("inning/inning_all.xml", "players.xml")
scrape(start = "2014-07-20", end = "2014-07-20", suffix = files, connect = pfx_db$con)
Here's my output from your recommendation:
gidz <- unique(dbGetQuery(pfx_db$con, "SELECT DISTINCT gameday_link FROM player")[,1])
head(gidz)
[1] "gid_2014_07_20_cinmlb_nyamlb_1" "gid_2014_07_20_texmlb_tormlb_1"
[3] "gid_2014_07_20_clemlb_detmlb_1" "gid_2014_07_20_sfnmlb_miamlb_1"
[5] "gid_2014_07_20_colmlb_pitmlb_1" "gid_2014_07_20_kcamlb_bosmlb_1"
tail(gidz)
[1] "gid_2014_07_20_tbamlb_minmlb_1" "gid_2014_07_20_seamlb_anamlb_1"
[3] "gid_2014_07_20_balmlb_oakmlb_1" "gid_2014_07_20_chnmlb_arimlb_1"
[5] "gid_2014_07_20_nynmlb_sdnmlb_1" "gid_2014_07_20_lanmlb_slnmlb_1"
gidz2 <- unique(dbGetQuery(pfx_db$con, "SELECT DISTINCT gameday_link FROM game")[,1])
Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: no such table: game)
Looks like gidz2 resulted in an error.
Thanks for your help!
I was hoping you'd run that code on the database that you had in your initial report (you shouldn't have to create more than one database). In your initial report, your database had coach, game, media, player, umpire tables, which indicates that the miniscoreboard.xml and players.xml files were correctly parsed and added to your database before the error occurred.
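For anyone repeating the gameday_link check on a database that may lack some tables, a guarded version avoids the "no such table" error and makes the comparison between tables explicit. This is a minimal sketch, not part of pitchRx: `links_in` is a hypothetical helper, the ids `gid_a` and `gid_b` are made-up placeholders, and the demo assumes the RSQLite package is installed.

```r
library(DBI)

# Hypothetical helper: distinct gameday_link values from a table,
# or an empty character vector when the table does not exist.
links_in <- function(con, table) {
  if (!dbExistsTable(con, table)) return(character(0))
  sql <- paste("SELECT DISTINCT gameday_link FROM", dbQuoteIdentifier(con, table))
  dbGetQuery(con, sql)[, 1]
}

# Demo on an in-memory database with made-up ids.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "player", data.frame(gameday_link = c("gid_a", "gid_b"),
                                       stringsAsFactors = FALSE))
dbWriteTable(con, "game", data.frame(gameday_link = "gid_a",
                                     stringsAsFactors = FALSE))

# Games that have player records but no row in the game table.
only_in_player <- setdiff(links_in(con, "player"), links_in(con, "game"))

# A table that was never written yields an empty result instead of an error.
no_such <- links_in(con, "pitch")
dbDisconnect(con)
```

With the real database, `pfx_db$con` would take the place of `con`.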
I was able to run your original code snippet successfully using pitchRx 1.5:
library(dplyr)
library(pitchRx)
pfx_db <- src_sqlite("pitchRx.sqlite3", create = TRUE)
files <- c("inning/inning_hit.xml", "miniscoreboard.xml", "players.xml")
scrape(start = "2009-01-01", end = "2014-01-01", suffix = files, connect = pfx_db$con)
It could be that your internet connection became unstable at some point. For this (and other) reasons, I usually suggest not scraping more than one year's worth of data at a time. In other words,
scrape(start = "2009-01-01", end = "2010-01-01", suffix = files, connect = pfx_db$con)
scrape(start = "2010-01-01", end = "2011-01-01", suffix = files, connect = pfx_db$con)
# and so on
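The year-by-year calls above can also be generated in a loop, so that a failure in one window (for example, a dropped connection) is reported without stopping the remaining windows. This is a rough sketch under stated assumptions: `scrape_window` and `windows` are hypothetical names, and the actual call is left commented out because it needs pitchRx, a network connection, and the `files` and `pfx_db` objects defined earlier.

```r
# Build consecutive one-year windows covering 2009-01-01 to 2014-01-01,
# matching the start/end pattern used in the calls above.
years <- 2009:2013
windows <- data.frame(
  start = sprintf("%d-01-01", years),
  end   = sprintf("%d-01-01", years + 1),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper: run one window and return TRUE on success, FALSE on error.
# `scrape_fun` is passed in so the wrapper does not depend on pitchRx being loaded.
scrape_window <- function(start, end, scrape_fun, ...) {
  tryCatch({
    scrape_fun(start = start, end = end, ...)
    TRUE
  }, error = function(e) {
    message("Failed for ", start, " to ", end, ": ", conditionMessage(e))
    FALSE
  })
}

# Usage (assumes library(pitchRx) is loaded and `files`, `pfx_db` exist):
# ok <- mapply(scrape_window, windows$start, windows$end,
#              MoreArgs = list(scrape_fun = scrape, suffix = files,
#                              connect = pfx_db$con))
# windows[!ok, ]  # windows that would need to be re-run
```

Returning a logical per window makes it straightforward to re-run only the windows that failed.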
Thanks, Carson. See you soon at JSM.
Hey Carson, I'm getting a couple of strange error messages when trying to scrape some data. Here's the code I'm using. My session info is also below.
R Code:
Error Message:
After a fresh session, I got the following message:
As you can see, it does appear to successfully copy some of the tables to my pfx_db. However, action, atbat, pitch, po, and runner appear to be missing for some reason. However, when I check the tbls, I get the following:
Session Info: