beanumber / airlines

An R package providing access to medium airline flight delay data
21 stars 36 forks source link

Airlines database load seems to not work. #60

Open schoend opened 5 years ago

schoend commented 5 years ago

mysql load of database seems not to work; partial output from attempt ends with:

6916 X10 a double \N '/home/schoend/dumps/airlines/raw/airports.dat' .... ... ........ ...... ...............................................

See problems(...) for more details.

Error: Columns 13, 14 cannot have NA as name

Do not know where to go next. More detail, attached in .docx file shows that init.mysql file creates database with four tables; carriers is populated but not the other three.

Dale Schoenefeld

DatabaseLoadProblemWithAirlines.docx

beanumber commented 5 years ago
library(airlines)
#> Loading required package: etl
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
db <- src_mysql_cnf(dbname = "airlines")
ontime <- etl("airlines", db = db, dir = "~/dumps/airlines")
etl_init(ontime)
#> Running SQL script at /home/bbaumer/R/x86_64-pc-linux-gnu-library/3.4/airlines/sql/init.mysql
#> Parsed with column specification:
#> cols(
#>   Code = col_character(),
#>   Description = col_character()
#> )
#> Parsed with column specification:
#> cols(
#>   X1 = col_double(),
#>   X2 = col_character(),
#>   X3 = col_character(),
#>   X4 = col_character(),
#>   X5 = col_character(),
#>   X6 = col_character(),
#>   X7 = col_double(),
#>   X8 = col_double(),
#>   X9 = col_double(),
#>   X10 = col_double(),
#>   X11 = col_character(),
#>   X12 = col_character()
#> )

Created on 2019-03-14 by the reprex package (v0.2.1)

Please make sure that you are using the most recent version of airlines. Your call to etl_init() seems to expect 14 variables in airports.dat, but mine only expects 12.

beanumber commented 5 years ago

@schoend you might also want to check /home/schoend/dumps/airlines/raw/airports.dat to make sure it has the right number of rows and columns. Here is what I get:

bbaumer@mini-meerkat:~$ wc ~/dumps/airlines/raw/airports.dat 
  8107  22136 850313 /home/bbaumer/dumps/airlines/raw/airports.dat
bbaumer@mini-meerkat:~$ head -n 1 ~/dumps/airlines/raw/airports.dat 
1,"Goroka","Goroka","Papua New Guinea","GKA","AYGA",-6.081689,145.391881,5282,10,"U","Pacific/Port_Moresby"
Deleetdk commented 4 years ago

@beanumber I am guessing you have an old file cached. The new file that is automatically downloaded has 14 columns, not 12. It also results in a bunch of parsing errors if one loads it with read_csv() manually.

airports.dat.zip

> airports_tmp = read_csv("~/dumps/airlines/raw/airports.dat")

── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
cols(
  `1` = col_double(),
  `Goroka Airport` = col_character(),
  Goroka = col_character(),
  `Papua New Guinea` = col_character(),
  GKA = col_character(),
  AYGA = col_character(),
  `-6.081689834590001` = col_double(),
  `145.391998291` = col_double(),
  `5282` = col_double(),
  `10` = col_double(),
  U = col_character(),
  `Pacific/Port_Moresby` = col_character(),
  airport = col_character(),
  OurAirports = col_character()
)

Warning: 353 parsing failures.
 row col expected actual                                file
6981  10 a double    \N '~/dumps/airlines/raw/airports.dat'
6982  10 a double    \N '~/dumps/airlines/raw/airports.dat'
6983  10 a double    \N '~/dumps/airlines/raw/airports.dat'
6984  10 a double    \N '~/dumps/airlines/raw/airports.dat'
6985  10 a double    \N '~/dumps/airlines/raw/airports.dat'
.... ... ........ ...... ...................................
See problems(...) for more details.
> dim(airports_tmp)
[1] 7697   14