Closed nicholasjhorton closed 8 years ago
I think your problem is here. Can you debug to see what the value of csv is there?
At the line:
message(paste("Reading flight data from", csv))
the value of csv is:
"/Users/nicholashorton/dumps/airlines/load/"
Note that the value of "topush" is NULL coming out of "match_year_months()". Could the problem be there?
Nick
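As a minimal sketch of how a bare directory could end up in csv: if the file names are pasted onto the load directory and the vector of names is NULL, paste0() still returns the directory string. The names load_dir, topush and check_files below are placeholders for illustration, not the package's actual code, and the real cause may differ.

```r
# Hypothetical stand-ins; not taken from the airlines package.
load_dir <- "/Users/nicholashorton/dumps/airlines/load/"
topush <- NULL  # the value reported coming out of match_year_months()

# paste0() treats a zero-length argument as "", so the result is
# the directory itself rather than an empty vector.
csv <- paste0(load_dir, topush)
message(paste("Reading flight data from", csv))

# A guard like this would fail earlier with a clearer message.
check_files <- function(files) {
  if (length(files) == 0) {
    stop("No files matched the requested year and months")
  }
  files
}
```

Calling check_files(topush) before building the path would stop with the "No files matched" message instead of the later "Cannot read file" error.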
On Jan 7, 2016, at 11:08 AM, Ben Baumer notifications@github.com wrote:
I think your problem is here Can you debug to see what the value of csv is there?
Nicholas Horton Professor of Statistics Department of Mathematics and Statistics, Amherst College Box 2239, 31 Quadrangle Dr Amherst, MA 01002-5000 https://www.amherst.edu/people/facstaff/nhorton
Is the problem possibly here?
etl_load.etl_airlines <- function(obj, schema = FALSE, year = 2015, months = 1:12, ...) {
Shouldn't etl_load.etl_airlines default to "year = NULL"?
Nick
Is it that the schema argument is now in front of year? Does this work?
lapply(2013, etl_update, obj = airlines, schema = FALSE)
If so, should I change the order of the arguments?
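To illustrate why the argument order matters here, a small self-contained sketch follows. It assumes etl_update has the same argument order as the etl_load.etl_airlines signature quoted above; update_sketch is a hypothetical stand-in, not the package function.

```r
# Hypothetical function with the same argument order as the quoted signature.
update_sketch <- function(obj, schema = FALSE, year = 2015, months = 1:12, ...) {
  list(schema = schema, year = year)
}

# lapply() passes each element as the first unnamed argument. obj is
# matched by name, so 2013 fills the next unmatched formal: schema.
res_positional <- lapply(2013, update_sketch, obj = "airlines")[[1]]
# res_positional$schema is 2013 and res_positional$year stays 2015

# Naming schema leaves year as the next free slot.
res_named <- lapply(2013, update_sketch, obj = "airlines", schema = FALSE)[[1]]
# res_named$year is 2013

# Wrapping the call names year explicitly and does not rely on position.
res_explicit <- lapply(2013, function(y) update_sketch("airlines", year = y))[[1]]
```

The wrapped form keeps working regardless of where schema sits in the signature, at the cost of a slightly longer call.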
That now works: I'll defer to you about the best ordering of arguments.
Nick
On Jan 7, 2016, at 2:58 PM, Ben Baumer notifications@github.com wrote:
Is it that the schema argument is now in front of year? Does this work:
lapply(2013, etl_update, obj = airlines, schema = FALSE)
? If so, should I change the order of the arguments?
This is less than ideal, but I can't think of a way to fix it without either:
etl_update
etl_init
as a standalone function
I'm not inclined to do either at the moment, but I have added this fix to the vignette.
I ran the following:
devtools::install_github("beanumber/airlines")
library(airlines)
library(RMySQL)
# must have pre-existing database "airlines"
db <- src_mysql(host = "localhost", user = "r-user", password = "mypass", dbname = "airlines")
airlines <- etl("airlines", db, dir = "~/dumps/airlines")
airlines %>% etl_create(year = 1987, months = 10)
lapply(2013, etl_update, obj = airlines)
flights <- tbl(airlines, "flights")
flights %>% group_by(dest) %>% summarise(N = n())
Unfortunately, this yielded an error during the lapply call:
Reading flight data from /Users/nicholashorton/dumps/airlines/load/
Error: Cannot read file /Users/nicholashorton/dumps/airlines/load
Here's what I have in raw and load:
./load: 1987-10.csv 2013-11.csv 2013-3.csv 2013-6.csv 2013-9.csv 2013-1.csv 2013-12.csv 2013-4.csv 2013-7.csv 2013-10.csv 2013-2.csv 2013-5.csv 2013-8.csv
./raw: 1987-10.zip 2013-11.zip 2013-3.zip 2013-6.zip 2013-9.zip 2013-1.zip 2013-12.zip 2013-4.zip 2013-7.zip airports.dat 2013-10.zip 2013-2.zip 2013-5.zip 2013-8.zip carriers.csv
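For reference, here is a rough sketch of how a year and months could be matched against file names like the ones in load. match_sketch is a hypothetical helper written for this comment, not the package's match_year_months(), and the matching rule is an assumption based only on the file names listed above.

```r
# File names as listed in ./load above.
files <- c("1987-10.csv", paste0("2013-", 1:12, ".csv"))

# Hypothetical matcher: build the expected names and keep those present.
match_sketch <- function(files, year, months = 1:12) {
  wanted <- paste0(year, "-", months, ".csv")
  files[files %in% wanted]
}

hits_2013 <- match_sketch(files, 2013)      # all twelve 2013 files
hits_1987 <- match_sketch(files, 1987, 10)  # "1987-10.csv"
hits_2015 <- match_sketch(files, 2015)      # nothing: no 2015 files on disk
```

Under this rule, a year with no files on disk gives an empty result, which would be consistent with the load step only seeing the directory.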
Can you please let me know if you are able to replicate this on one of your secondary machines?