beanumber / airlines

An R package providing access to medium airline flight delay data

Unable to Load Airline Data using RSQLite #63

Open · dhjessicajeong opened this issue 4 years ago

dhjessicajeong commented 4 years ago

I'm trying to load the airlines data but can't load the full data set. I suspect the link in the `etl_extract.R` file is broken.

Below I've shown what the resulting airlines database looks like after loading it with the airlines and etl packages. It successfully retrieves the flight carriers, but none of the information about the flights themselves (e.g., delays, departure and arrival times, cancellation status).

```r
reprex::reprex({
  suppressPackageStartupMessages(library(mosaic))
  suppressPackageStartupMessages(library(airlines))
  suppressPackageStartupMessages(library(RSQLite))

  # Create the etl object and load one month of flight data
  airlines <- etl("airlines")
  airlines %>%
    etl_create(years = 2017, months = 12) %>%
    etl_cleanup()

  # Inspect which tables actually made it into the database
  class(airlines)
  summary(airlines)
  src_tbls(airlines)
})
```

In the airlines vignette (linked here), running the `etl_create()` function produces a database that includes the `planes`, `airports`, `carriers`, and `flights` tables.

However, the reprex above creates a database with only one table, `carriers`. Looking into this more closely, I noticed that the link used in `etl_extract.R` does not work. Do you have any suggestions for loading the data or troubleshooting this error?

Thanks in advance for any guidance!

nicholasjhorton commented 4 years ago

When I run the reprex, I get the same output, along with several warnings and errors:

```
> airlines <- etl("airlines")
No database was specified so I created one for you at:
/var/folders/1x/rkngly0d3lzczzmqdt5zvccm0000gn/T//RtmpmWb7FT/filec9945119ecce.sqlite3
> airlines %>%
+   etl_create(years = 2017, months = 12) %>%
+   etl_cleanup()
Could not find schema initialization script
Parsed with column specification:
cols(
  Code = col_character(),
  Description = col_character()
)
Parsed with column specification:
cols(
  X1 = col_double(),
  X2 = col_character(),
  X3 = col_character(),
  X4 = col_character(),
  X5 = col_character(),
  X6 = col_character(),
  X7 = col_double(),
  X8 = col_double(),
  X9 = col_double(),
  X10 = col_double(),
  X11 = col_character(),
  X12 = col_character(),
  X13 = col_character(),
  X14 = col_character()
)
Warning: 353 parsing failures.
 row col expected actual file
6982 X10 a double \N '/private/var/folders/1x/rkngly0d3lzczzmqdt5zvccm0000gn/T/RtmpmWb7FT/raw/airports.dat'
6983 X10 a double \N '/private/var/folders/1x/rkngly0d3lzczzmqdt5zvccm0000gn/T/RtmpmWb7FT/raw/airports.dat'
6984 X10 a double \N '/private/var/folders/1x/rkngly0d3lzczzmqdt5zvccm0000gn/T/RtmpmWb7FT/raw/airports.dat'
6985 X10 a double \N '/private/var/folders/1x/rkngly0d3lzczzmqdt5zvccm0000gn/T/RtmpmWb7FT/raw/airports.dat'
6986 X10 a double \N '/private/var/folders/1x/rkngly0d3lzczzmqdt5zvccm0000gn/T/RtmpmWb7FT/raw/airports.dat'
.... ... ........ ...... ......................................................................................
See problems(...) for more details.

Error: Columns 13, 14 cannot have NA as name

> src_tbls(airlines)
[1] "carriers"
```

```
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] RSQLite_2.2.0 airlines_0.2.2.9015 etl_0.3.8.9000 mosaic_1.5.0.9001 Matrix_1.2-18
 [6] mosaicData_0.17.0 ggformula_0.9.3 ggstance_0.3.3 ggplot2_3.2.1 lattice_0.20-38
[11] dplyr_0.8.4

loaded via a namespace (and not attached):
 [1] nlme_3.1-144 fs_1.3.1 usethis_1.5.1 bit64_0.9-7 lubridate_1.7.4
 [6] devtools_2.2.1 httr_1.4.1 rprojroot_1.3-2 tools_3.6.0 backports_1.1.5
[11] R6_2.4.1 DBI_1.1.0 lazyeval_0.2.2 colorspace_1.4-1 withr_2.1.2
[16] tidyselect_1.0.0 gridExtra_2.3 prettyunits_1.1.1 processx_3.4.1 leaflet_2.0.3
[21] bit_1.1-15.1 curl_4.3 compiler_3.6.0 cli_2.0.1 rvest_0.3.5
[26] xml2_1.2.2 desc_1.2.0 ggdendro_0.1-20 mosaicCore_0.6.0 scales_1.1.0
[31] readr_1.3.1 callr_3.4.2 stringr_1.4.0 digest_0.6.23 pkgconfig_2.0.3
[36] htmltools_0.4.0 sessioninfo_1.1.1 dbplyr_1.4.2 fastmap_1.0.1 htmlwidgets_1.5.1
[41] rlang_0.4.4 rstudioapi_0.11 shiny_1.4.0 farver_2.0.3 generics_0.0.2
[46] crosstalk_1.0.0 magrittr_1.5 Rcpp_1.0.3 munsell_0.5.0 fansi_0.4.1
[51] lifecycle_0.1.0 stringi_1.4.5 yaml_2.2.1 MASS_7.3-51.5 pkgbuild_1.0.6
[56] blob_1.2.1 grid_3.6.0 promises_1.1.0 ggrepel_0.8.1 crayon_1.3.4
[61] splines_3.6.0 hms_0.5.3 ps_1.3.0 pillar_1.4.3 pkgload_1.0.2
[66] glue_1.3.1.9000 downloader_0.4 remotes_2.1.0 vctrs_0.2.2 tweenr_1.0.1
[71] httpuv_1.5.2 testthat_2.3.1 gtable_0.3.0 purrr_0.3.3 polyclip_1.10-0
[76] tidyr_1.0.2 assertthat_0.2.1 ggforce_0.3.1 mime_0.9 xtable_1.8-4
[81] broom_0.5.4 later_1.0.0 tibble_2.1.3 memoise_1.1.0 ellipsis_0.3.0
```
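The 353 parsing failures in the log above all report the value `\N` in column `X10` of `airports.dat`, which appears to be the OpenFlights airports file; OpenFlights uses `\N` to mark missing values, so readr cannot parse those entries as doubles by default. Below is a minimal sketch, assuming only that readr is installed, of reading data in that layout with `\N` treated as `NA`. It uses a made-up two-row sample (the `sample_lines` contents are hypothetical, not the real file) and does not address the `Columns 13, 14 cannot have NA as name` error or the broken download link.

```r
library(readr)

# Hypothetical two-row sample mimicking the airports.dat layout:
# comma-separated, no header row, and "\N" marking a missing value.
sample_lines <- c(
  '1,"Goroka Airport",10',
  '2,"Example Airport",\\N'
)
tmp <- tempfile(fileext = ".dat")
writeLines(sample_lines, tmp)

airports_raw <- read_csv(
  tmp,
  col_names = FALSE,        # no header row, so columns become X1, X2, ...
  na = c("", "NA", "\\N"),  # also treat the literal string \N as missing
  col_types = cols()        # guess column types without printing the spec
)

airports_raw
```

With `\N` listed in `na`, the third column is guessed as numeric and the second row's value becomes `NA` instead of triggering a parsing failure.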

mrouhana22 commented 3 years ago

Are there any updates on this? I am also unable to load the full data because the link has expired.

nicholasjhorton commented 3 years ago

@beanumber, do you know if there is an alternative source for the flight data? Or is the package effectively no longer usable?

nicholasjhorton commented 2 years ago

Is this the same link that needs updating (as for nycflights13)?

https://github.com/tidyverse/nycflights13/pull/50