beanumber / airlines

An R package providing access to medium airline flight delay data

2015 missing data? #37

Closed beanumber closed 8 years ago

beanumber commented 8 years ago
airlines %<>%
  etl_update(year = 2013) %>%
  etl_update(year = 2014) %>%
  etl_update(year = 2015)

2015 only has data up to July. When the process tried to download the .zip file for August from the TranStats website, it failed and terminated immediately. This left a bunch of .csv files in the right place (my Airlines/load folder) that were never loaded into the database.

beanumber commented 8 years ago

Can you post:

str(airlines)

and the contents of the load and raw directories?

beanumber commented 8 years ago

Never mind. The data does not appear to be there after July 2015.

Perhaps the etl_extract() function could become smart enough to notice this error.
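
One way such a check might look, as a minimal sketch rather than the package's actual code: wrap each monthly download in tryCatch() so that a missing month produces a warning and is skipped instead of aborting the whole run. The helper name download_months() and its downloader argument are hypothetical, introduced only for illustration.

```r
# Hypothetical helper (not part of the airlines package): try each monthly
# URL in turn and skip the ones that fail instead of stopping.
# `downloader` defaults to utils::download.file but can be swapped out.
download_months <- function(urls, dest_dir,
                            downloader = utils::download.file) {
  ok <- vapply(urls, function(u) {
    dest <- file.path(dest_dir, basename(u))
    tryCatch({
      status <- downloader(u, destfile = dest, mode = "wb", quiet = TRUE)
      # download.file() returns 0 on success
      isTRUE(status == 0)
    }, error = function(e) {
      warning("Skipping ", u, ": ", conditionMessage(e), call. = FALSE)
      FALSE
    })
  }, logical(1), USE.NAMES = FALSE)
  # Return only the URLs that were downloaded successfully
  urls[ok]
}
```

With something like this, the months that do exist would still be downloaded, and the caller could decide whether to proceed with the load step for the files that arrived.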

nicholasjhorton commented 8 years ago

The BTS website indicates that data are available through the end of October 2015. Figuring out how etl_extract() can handle this edge case going forward seems important. I'd be happy to assist.

beanumber commented 8 years ago

The 2016 data is also here, but not at the same URL as the data before August 2015.

beanumber commented 8 years ago

Found it. It changes to this:

http://tsdata.bts.gov/PREZIP/On_Time_On_Time_Performance_2015_8.zip

It appears to be backwards-compatible, so I'll just change the URL.
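
For illustration, the per-month URL could be assembled from the pattern in the link above. This is only a sketch: the function name bts_url() is hypothetical, and the assumption that every month follows the same file naming (unpadded month number) rests on that single example URL.

```r
# Hypothetical URL builder based on the single example URL in this thread.
# Assumes the month is not zero-padded, as in "..._2015_8.zip".
bts_url <- function(year, month,
                    base = "http://tsdata.bts.gov/PREZIP") {
  sprintf("%s/On_Time_On_Time_Performance_%d_%d.zip",
          base, as.integer(year), as.integer(month))
}

bts_url(2015, 8)
# "http://tsdata.bts.gov/PREZIP/On_Time_On_Time_Performance_2015_8.zip"
```

Because sprintf() is vectorized, bts_url(2015, 1:12) would return all twelve candidate URLs for a year.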