beanumber / airlines

An R package providing access to medium airline flight delay data

additional issue following intro vignette #33

Closed nicholasjhorton closed 8 years ago

nicholasjhorton commented 8 years ago

I ran the following:

devtools::install_github("beanumber/airlines")

library(airlines)
library(RMySQL)

# must have pre-existing database "airlines"
db <- src_mysql(host = "localhost", user = "r-user", password = "mypass", dbname = "airlines")
airlines <- etl("airlines", db, dir = "~/dumps/airlines")
airlines %>% etl_create(year = 1987, months = 10)
lapply(2013, etl_update, obj = airlines)

flights <- tbl(airlines, "flights")
flights %>% group_by(dest) %>% summarise(N=n())

Unfortunately, this yielded an error during the lapply call:

Reading flight data from /Users/nicholashorton/dumps/airlines/load/
Error: Cannot read file /Users/nicholashorton/dumps/airlines/load

Here's what I have in raw and load:

./load:
1987-10.csv 2013-11.csv 2013-3.csv 2013-6.csv 2013-9.csv 2013-1.csv 2013-12.csv 2013-4.csv 2013-7.csv 2013-10.csv 2013-2.csv 2013-5.csv 2013-8.csv

./raw:
1987-10.zip 2013-11.zip 2013-3.zip 2013-6.zip 2013-9.zip 2013-1.zip 2013-12.zip 2013-4.zip 2013-7.zip airports.dat 2013-10.zip 2013-2.zip 2013-5.zip 2013-8.zip carriers.csv

Can you please let me know if you are able to replicate this on one of your secondary machines?

beanumber commented 8 years ago

I can't even get there. I spent all morning on this

beanumber commented 8 years ago

I think your problem is here. Can you debug to see what the value of csv is there?

nicholasjhorton commented 8 years ago

At the line:

message(paste("Reading flight data from", csv))

the value of csv is:

"/Users/nicholashorton/dumps/airlines/load/"

Note that the value of "topush" is null coming out of "match_year_months()".

Could this be a problem there?
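One way a NULL "topush" could produce exactly the directory path seen in the message is sketched below. This is only an illustration: it assumes the file path is assembled with paste0(), which the thread does not confirm, and the variable names load_dir and csv are stand-ins rather than the package's actual code.

```r
# Hypothetical sketch: how a NULL file list can collapse to a bare directory path.
# Assumption: the path is built with paste0(); the real package code may differ.
load_dir <- "/Users/nicholashorton/dumps/airlines/load/"
topush <- NULL  # what match_year_months() reportedly returned

# paste0() treats a zero-length argument as "", so only the directory remains.
csv <- paste0(load_dir, topush)
message(paste("Reading flight data from", csv))

# A defensive check (also hypothetical) would fail earlier with a clearer message.
check_files <- function(files) {
  if (length(files) == 0) {
    stop("No CSV files matched the requested year and months")
  }
  invisible(files)
}
```

If the package builds the path differently (for example with file.path(), which returns a zero-length result when given a zero-length argument), the symptom would differ, so this is only one possible explanation.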

Nick


Nicholas Horton Professor of Statistics Department of Mathematics and Statistics, Amherst College Box 2239, 31 Quadrangle Dr Amherst, MA 01002-5000 https://www.amherst.edu/people/facstaff/nhorton

nicholasjhorton commented 8 years ago

Is the problem possibly here?

etl_load.etl_airlines <- function(obj, schema = FALSE, year = 2015, months = 1:12, ...) {

Shouldn't etl_load.etl_airlines default to "year = NULL"?

Nick


beanumber commented 8 years ago

Is it that the schema argument is now in front of year? Does this work?

lapply(2013, etl_update, obj = airlines, schema = FALSE)

If so, should I change the order of the arguments?
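The argument-matching behavior behind this question can be reproduced with two stand-in functions. The sketch below is not the package's code: load_sketch() and update_sketch() are hypothetical, and it assumes that etl_update() forwards its extra arguments through ... to a loader with the signature quoted earlier in the thread.

```r
# Hypothetical stand-ins for etl_load.etl_airlines() and etl_update().
# Assumption: the update step forwards its extra arguments via `...`.
load_sketch <- function(obj, schema = FALSE, year = 2015, months = 1:12, ...) {
  list(schema = schema, year = year)
}
update_sketch <- function(obj, ...) {
  load_sketch(obj, ...)
}

# lapply() passes 2013 as an unnamed argument. Because obj is matched by name,
# 2013 falls into `...` and then fills the first free formal, which is schema.
positional <- lapply(2013, update_sketch, obj = "airlines")[[1]]

# Naming schema takes it out of the running, so 2013 now lands on year.
with_schema <- lapply(2013, update_sketch, obj = "airlines", schema = FALSE)[[1]]

# Naming year explicitly does not depend on the order of the formals at all.
by_name <- lapply(2013, function(y) update_sketch("airlines", year = y))[[1]]

str(positional)
str(with_schema)
str(by_name)
```

In the first call year silently keeps its default of 2015, which would be consistent with no matching files being found for the requested load. Passing year by name, as in the last call, avoids relying on the position of schema.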

nicholasjhorton commented 8 years ago

That now works: I'll defer to you about the best ordering of arguments.

Nick


beanumber commented 8 years ago

This is less than ideal, but I can't think of a way to fix it without either:

I'm not inclined to do either at the moment. But I have added this fix to the vignette.