beanumber / airlines

An R package providing access to medium airline flight delay data

curl error on Windows #35

Closed: beanumber closed this issue 8 years ago.

beanumber commented 8 years ago

Running the command to initialize the database causes an error:

airlines %>%
  etl_create(year = 1987, months = 10)
## Loading SQL script at C:/Users/Jonathan/Documents/R/win-library/3.2/airlines/sql/init.mysql
## Warning: 1608 parsing failures.
## row col  expected    actual
##   1  -- 2 columns 1 columns
##   3  -- 2 columns 1 columns
##   5  -- 2 columns 1 columns
##   7  -- 2 columns 1 columns
##   9  -- 2 columns 1 columns
## ... ... ......... .........
## See problems(...) for more details.
## Warning: running command 'curl "https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat" -o "C:\Users\Jonathan\Desktop\Airlines/raw/airports.dat"' had status 127
## Warning in download.file(src, lcl, method = "curl"): download had nonzero exit status
## Error: 'C:\Users\Jonathan\Desktop\Airlines/raw/airports.dat' does not exist.

The initialization creates all 6 tables in the airlines database, and each table has all of its columns (as seen in the Field column of the output of "DESCRIBE tableName"). The carriers table is populated with 1607 rows (as seen in the output of "SELECT COUNT(*) FROM carriers"), and the other tables all have 0 rows. I'm not entirely sure how or where the initialization process gets stopped.

beanumber commented 8 years ago

The error occurs here (https://github.com/beanumber/airlines/blob/master/R/etl_load.R#L146). The description of what happens makes sense: the init script ran fine, which created all of the table definitions, but the error occurred while populating the airports table, which leaves the planes and weather tables empty.
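One way to make this kind of failure easier to diagnose is to check the result of the download immediately, instead of letting a later step report that the file does not exist. The sketch below is hypothetical: fetch_or_stop() is not part of etl or airlines, and it assumes src and lcl are a URL and a local path, as in the warning message above. It uses only base R's utils::download.file().

```r
# Hypothetical helper, not part of etl/airlines: download src to lcl and
# stop with an informative message if the transfer did not succeed.
fetch_or_stop <- function(src, lcl, method = "auto") {
  status <- tryCatch(
    # download.file() returns 0 on success; external methods such as "curl"
    # warn and return a nonzero code instead of raising an error
    suppressWarnings(
      utils::download.file(src, lcl, method = method, quiet = TRUE)
    ),
    error = function(e) 1L  # treat any error as a failed download
  )
  if (status != 0 || !file.exists(lcl)) {
    stop("Could not download ", src, " to ", lcl,
         " using method = \"", method, "\"", call. = FALSE)
  }
  invisible(lcl)
}
```

With a check like this, a missing curl executable would surface as a download failure that names the URL and method, rather than as a "does not exist" error further down.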

Do you have RCurl installed? Or, how does Windows handle the method argument to download.file()?

jche commented 8 years ago

I did not have RCurl installed (or cURL, for that matter). I installed them both and tried again, but the same error occurred.
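One possible explanation, which is an assumption and not confirmed in this thread, is that the newly installed curl executable is not on the PATH that R sees; a "status 127" warning like the one above typically means the command could not be run at all. RCurl is an R package that binds to libcurl, and as far as I know installing it does not put a curl executable on the PATH. A quick check from R, using base Sys.which():

```r
# Sys.which() returns the full path of an executable found on the PATH that
# R uses, or "" if none is found. download.file(method = "curl") shells out
# to that executable, so an empty result means the "curl" method cannot work.
curl_path <- Sys.which("curl")
has_curl <- nzchar(curl_path)

if (has_curl) {
  message("curl found at: ", curl_path)
} else {
  message("no curl executable on the PATH seen by R")
}
```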

A quick Google search brought me here. The comments there suggested simply removing method = "curl", which on a Windows machine falls back to the default method = "internal" (according to the download.file documentation).

Honestly, I have absolutely no idea what the different methods mean, but when I removed method = "curl" from the download.file() call in etl_load, the error disappeared. I still got the same warning about the 1608 parsing failures, but as far as I can tell all of the flights/weather/etc. data loaded into my airlines database.

I'd be happy to try out other ideas that you might have as well to see how they work on Windows.

beanumber commented 8 years ago

See also #30. The documentation also says:

Note that https:// URLs are not supported by the internal method.

and

curl (http://curl.haxx.se/) is installed on OS X and commonly on Unix-alikes. Windows binaries are available at that URL.

But even if that works, this seems silly. Windows users can't use method="curl" without installing an external dependency, but Mac users can't access HTTPS URLs with method="internal"?
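A sketch of one way around this asymmetry, assuming an R build that reports libcurl support through capabilities("libcurl") (available from R 3.2.0): prefer the built-in "libcurl" method, which handles https:// URLs without an external program, and fall back only when it is unavailable. pick_download_method() is a hypothetical helper, not part of download.file() or etl.

```r
# Hypothetical helper: choose a download.file() method that should handle
# https:// URLs on the current machine.
pick_download_method <- function() {
  if (isTRUE(unname(capabilities("libcurl")))) {
    "libcurl"   # built into R, no external program needed
  } else if (nzchar(Sys.which("curl"))) {
    "curl"      # external curl executable found on the PATH
  } else {
    "auto"      # let download.file() decide
  }
}

method <- pick_download_method()
```

The chosen value could then be passed along as download.file(src, lcl, method = method), so that neither Windows nor Mac users depend on a platform-specific default.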

@nicholasjhorton @jche Can you both execute:

> options("download.file.method")
$download.file.method
[1] "libcurl"
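To compare setups side by side, the relevant settings can be gathered in one call. Note that options("download.file.method") returns a list (as above), while getOption() returns the value itself. This sketch uses only base R; the helper name download_diagnostics() is hypothetical, and reporting an unset option as "auto" is an assumption about download.file()'s fallback.

```r
# Hypothetical helper: gather the settings that influence download.file()
download_diagnostics <- function() {
  list(
    r_version     = R.version.string,
    os_type       = .Platform$OS.type,  # "windows" or "unix"
    # value of the option, reported as "auto" when it is unset
    method_option = getOption("download.file.method", default = "auto"),
    libcurl       = isTRUE(unname(capabilities("libcurl")))
  )
}

str(download_diagnostics())
```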

In any case, the solution may be to use RCurl instead.

nicholasjhorton commented 8 years ago

Here's what I get:

> options("download.file.method")
$download.file.method
[1] "curl"

beanumber commented 8 years ago

I think this is fixed in the newest version of etl. @jche do you still get this error?

jche commented 8 years ago

@beanumber etl_create() works now. I still get the same parsing errors, but all the data gets imported and written to the database.