cpsievert / pitchRx

Tools for scraping MLB Gameday data and Visualizing PITCHf/x
http://cpsievert.github.io/pitchRx/

Scrape - column misformatting #34

Closed: taylorgrant closed this issue 9 years ago.

taylorgrant commented 9 years ago

This is similar to issue #26: when scraping into an empty database with pitchRx 1.8, the columns of the "atbat" table become misaligned once the scrape exceeds 200 games. The records for the first 200 games are correct, but the records appended after that are shifted by one or two columns.

The same thing happens if you scrape fewer than 200 games into an empty database and then use the "update" function.
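
The thread does not identify the cause. One way a shift of one or two columns can arise (an assumption here, not a diagnosis of pitchRx) is stacking a later batch of rows by position when that batch has extra, missing, or reordered fields relative to the existing table. The minimal base R sketch below contrasts positional stacking with matching columns by name; the helper align_to_table() and all column names and values are hypothetical and for illustration only.

```r
# Hypothetical helper: pad and reorder a new batch so its columns match an
# existing table's column names (matching by name, not by position).
align_to_table <- function(batch, table_cols) {
  # Add any columns the batch lacks, filled with NA
  for (col in setdiff(table_cols, names(batch))) {
    batch[[col]] <- NA
  }
  # Keep only the table's columns, in the table's order;
  # fields the table does not have are dropped
  batch[, table_cols, drop = FALSE]
}

# Illustrative data: the existing table, and a later batch whose fields
# arrive in a different order and include an extra field (event_num)
existing <- data.frame(num = 1L, event = "Single", date = "2015_09_29",
                       stringsAsFactors = FALSE)
later <- data.frame(event_num = 7L, date = "2015_09_29", num = 2L,
                    event = "Walk", stringsAsFactors = FALSE)

# Positional stacking: the batch's first three fields are simply relabelled
# with the table's headers, so values land under the wrong columns
positional <- rbind(existing,
                    setNames(later[seq_along(existing)], names(existing)))

# Name-based stacking: every value stays under its own header
aligned <- rbind(existing, align_to_table(later, names(existing)))
```

In positional, the second row's event column holds a date string, which is the kind of shift described above; in aligned, each value stays under its own header.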

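library(dplyr)
library(stringr)
library(pitchRx)
library(RSQLite)

# Create (or open) the SQLite database; db$con is the underlying DBI connection
db <- src_sqlite("pitchRx.sqlite", create = TRUE)
scrape(start = "2015-09-29", end = "2015-09-29", connect = db$con)
source("update.R")  # the reporter's update script (contents not shown)

atbats <- tbl(db, "atbat")
AB <- collect(atbats)

# Index of the first row whose date starts with a capital letter (a shifted row)
which(str_detect(AB$date, "^[A-Z]"))[1]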

The misalignment begins at record 888; the original scrape pulled 887 files.

The "pitch" table also has formatting issues; I'm not sure about the other tables.

dlependorf commented 9 years ago

I'm getting something similar on the pitch table as well.

library(dplyr)
library(pitchRx)

db <- src_sqlite("./column_header_test.sqlite3",create=TRUE)
scrape(start="2015-01-01",end="2015-12-31",connect=db$con)

pitch <- db %>% tbl("pitch") %>% collect

The columns line up with the headers at first, but in tail(pitch) they are completely out of sync. The problem appears to start at row 31570.

head(pitch)
tail(pitch)

problems <- pitch %>% slice(31560:31580)
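
Instead of paging through tail(pitch), the first out-of-sync row can be located by testing a column whose values have a known shape, much like the date check in the first post. A minimal base R sketch; first_bad_row() is a hypothetical helper, and the pattern assumes dates stored like 2015_09_29, so adjust both the column and the pattern to the data actually being checked.

```r
# Hypothetical helper: index of the first value that does NOT match the
# pattern expected for this column, or NA if every value matches.
# Note: NA values never match, so they are reported as bad too.
first_bad_row <- function(values, expected) {
  bad <- which(!grepl(expected, values))
  if (length(bad) == 0) NA_integer_ else bad[1]
}

# Illustrative values only: a date column whose fourth entry has been
# shifted in from a neighbouring text column
dates <- c("2015_09_29", "2015_09_29", "2015_09_30", "Strikeout", "Single")
first_bad_row(dates, "^[0-9]{4}_[0-9]{2}_[0-9]{2}$")  # returns 4
```

On the collected data, the same call would be made on a column of the pitch or AB data frame.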

cpsievert commented 9 years ago

Yikes! Thanks for bringing this back to my attention. I'm fairly certain 645b6a9 fixes this, but I haven't verified it yet. Could you reinstall with devtools::install_github("cpsievert/pitchRx") and let me know if you still have problems?

dlependorf commented 9 years ago

Nope, all good here! Ran the code I posted above, and the column headers all line up. Thanks for taking a stab at this one.