business-science / tidyquant

Bringing financial analysis to the tidyverse
https://business-science.github.io/tidyquant/

tq_exchange: NASDAQ, AMEX, NYSE return NA; In download.file(url, destfile = tmp, quiet = TRUE): InternetOpenUrl failed: 'The operation timed out' #177

Closed: RQuestion closed this issue 4 years ago

RQuestion commented 4 years ago

Hey, I'm unable to get tq_exchange to work anymore; I get the following error. I updated my libraries and still see this issue.

AMEX <- tq_exchange("AMEX")
Getting data...

Warning messages:
1: In download.file(url, destfile = tmp, quiet = TRUE) :
  InternetOpenUrl failed: 'The operation timed out'
2: In value[[3L]](cond) : Error at amex during call to tq_exchange.

mdancho84 commented 4 years ago

Looks like NASDAQ (the company) may have changed its list API. Give it a day or so and let's see if it comes back. If not, may need to search for a new data source.

mdancho84 commented 4 years ago

It should download. I was able to get this URL to download. Not sure where it's getting hung up. https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NASDAQ&render=download

RQuestion commented 4 years ago

Yeah - it seems like everything should work. The problem has been intermittent, starting over the weekend. There will be times throughout the day where it will pull data, and other times it returns NA.
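Since the failures are intermittent, one stopgap is to retry the call a few times before giving up. A minimal sketch, not part of tidyquant: the helper name `retry_fetch()` and its `tries`/`wait` arguments are hypothetical, and it assumes (as the console output later in this thread shows) that `tq_exchange()` signals failure by returning `NA` with warnings rather than by throwing an error.

```r
# Hypothetical helper (not part of tidyquant): call `fetch` up to `tries`
# times until it returns a data frame, pausing `wait` seconds between tries.
# Assumes failure shows up as a non-data-frame result (e.g. NA) or an error.
retry_fetch <- function(fetch, tries = 3, wait = 5) {
  for (i in seq_len(tries)) {
    # Treat an error like an NA result; hide the per-attempt download warnings
    result <- tryCatch(suppressWarnings(fetch()), error = function(e) NA)
    if (is.data.frame(result)) {
      return(result)
    }
    if (i < tries) Sys.sleep(wait)
  }
  warning("All ", tries, " attempts failed; returning NA.")
  NA
}

# Example (requires network access and a reachable data source):
# amex <- retry_fetch(function() tidyquant::tq_exchange("AMEX"), tries = 5, wait = 10)
```

A tibble passes `is.data.frame()`, so a successful `tq_exchange()` result is returned unchanged. Retrying only helps with transient failures; it does nothing for a persistent block such as a 403 response.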

mdancho84 commented 4 years ago

I checked this morning, and it seems like it's working. I have a feeling NASDAQ is playing around with their website. Hopefully the problem is fixed on their end soon.

sargac commented 4 years ago

Hi, I also tried several times today and yesterday, and it didn't work. The direct link you provided worked, though (thanks!). Hopefully they'll restore access.

tidyquant::tq_exchange("AMEX")
Getting data...

Warning messages:
1: In download.file(url, destfile = tmp, quiet = TRUE) :
  InternetOpenUrl failed: 'The operation timed out'
2: In value[[3L]](cond) : Error at amex during call to tq_exchange.

tidyquant::tq_exchange("NYSE")
Getting data...

Warning messages:
1: In download.file(url, destfile = tmp, quiet = TRUE) :
  InternetOpenUrl failed: 'The operation timed out'
2: In value[[3L]](cond) : Error at nyse during call to tq_exchange.

chriscardillo commented 4 years ago

Hi! I'm having the same issue, with a slightly different error message:

Running tq_exchange("NASDAQ")

Warning messages:
1: In download.file(url, destfile = tmp, quiet = TRUE) :
  cannot open URL 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nasdaq&render=download': HTTP status was '403 Forbidden'
2: In value[[3L]](cond) : Error at nasdaq during call to tq_exchange.

elohbeck commented 4 years ago

It seems like it isn't just NASDAQ. I'm seeing the same issue with AMEX and NYSE as well. I started seeing the problem on the morning of July 22nd.


mdancho84 commented 4 years ago

@elohbeck These all come from the same data source.

https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NASDAQ&render=download
https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download
https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NYSE&render=download
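Because the three URLs differ only in the `exchange` query parameter, they can be generated from the exchange name. A minimal sketch; `nasdaq_list_url()` is a hypothetical helper (not a tidyquant function) that only reproduces the URL pattern listed above.

```r
# Hypothetical helper: build the old.nasdaq.com company-list download URL
# for one of the three exchanges served by the same endpoint.
nasdaq_list_url <- function(exchange) {
  # Normalise case, then reject anything other than the three exchanges
  exchange <- match.arg(toupper(exchange), c("NASDAQ", "AMEX", "NYSE"))
  paste0(
    "https://old.nasdaq.com/screening/companies-by-name.aspx",
    "?letter=0&exchange=", exchange, "&render=download"
  )
}

# nasdaq_list_url("nyse") gives the NYSE URL shown above
```

Note that `match.arg()` also accepts unambiguous partial names such as `"NY"`.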

mdancho84 commented 4 years ago

The issue seems to be happening with download.file().

download.file("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NASDAQ&render=download", "nasdaq.csv")
#> Warning in download.file("https://old.nasdaq.com/screening/companies-by-
#> name.aspx?letter=0&exchange=NASDAQ&render=download", : InternetOpenUrl failed:
#> 'The operation timed out'
#> Error in download.file("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NASDAQ&render=download", : cannot open URL 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NASDAQ&render=download'

Created on 2020-07-25 by the reprex package (v0.3.0)

elohbeck commented 4 years ago

@mdancho84 ahh, I see. Thanks for clarifying. When I download the file, the file name is companylist.csv. Is the "nasdaq.csv" argument the new name for the file?

mdancho84 commented 4 years ago

@elohbeck The name of the file "nasdaq.csv" is what the file would be named if download.file() worked properly. When you download the file directly from the website, the default name is companylist.csv.

mdancho84 commented 4 years ago

From my research, it looks like the "Old Nasdaq" website has changed to a JavaScript-based site. This is to prevent web scraping, which is what tidyquant does.

Further changes to the site appear to be forthcoming with a New NASDAQ site.

For the time being, just go to the old site and download directly. https://old.nasdaq.com/screening/company-list.aspx

I don't see a solution to this unless someone knows how to download from JavaScript websites.

elohbeck commented 4 years ago

@mdancho84 thanks again.

Would the browseURL() method work in place of download.file()?

https://stackoverflow.com/questions/24856884/download-file-with-r-given-a-javascript-statement/54808772#54808772

chriscardillo commented 4 years ago

@mdancho84 - FWIW this works with readr:

read_csv("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NYSE&render=download")

The column names would just need cleaning, maybe with janitor::clean_names().
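janitor::clean_names() converts every name to one case style (snake_case by default). If the goal is instead to match the column names that tq_exchange() returns (see the tibbles printed in this thread), a small lookup table is enough. A base-R sketch; `clean_exchange_names()` is a hypothetical helper, and the raw column names (Symbol, Name, LastSale, ...) are taken from the CSV header printed later in this thread.

```r
# Hypothetical helper: rename the raw company-list CSV columns to the names
# tq_exchange() returns, dropping columns not in the lookup
# (e.g. "Summary Quote" and the unnamed trailing column).
clean_exchange_names <- function(df) {
  lookup <- c(
    Symbol    = "symbol",
    Name      = "company",
    LastSale  = "last.sale.price",
    MarketCap = "market.cap",
    IPOyear   = "ipo.year",
    Sector    = "sector",
    industry  = "industry"
  )
  keep <- intersect(names(lookup), names(df))  # raw names actually present
  df <- df[keep]
  names(df) <- unname(lookup[keep])
  df
}
```

This only renames and drops columns. Converting LastSale to numeric and the "n/a" IPO years to NA is left out.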

mdancho84 commented 4 years ago

@chriscardillo I couldn't get this to work:

read_csv("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NYSE&render=download")

I believe it uses curl which is what download.file() uses.

chriscardillo commented 4 years ago

@mdancho84 PR issued 👍

mdancho84 commented 4 years ago

Hey @chriscardillo I have to double-check. The read_csv approach wasn’t working for me. Will take another look tonight.

elohbeck commented 4 years ago

Hey @mdancho84, any luck getting this to work?

chriscardillo commented 4 years ago

Just saw this email - friendly reminder this PR is open :)

https://github.com/business-science/tidyquant/pull/178

mdancho84 commented 4 years ago

@chriscardillo the read_csv() approach only works if download.file() works. So I tried today, and tq_exchange("NASDAQ") works as-is; no changes needed.

Will this change in the future? Probably. It's totally dependent on NASDAQ and its website.


> tidyquant::tq_exchange("NASDAQ")
Getting data...

# A tibble: 3,653 x 7
   symbol company                        last.sale.price market.cap ipo.year sector         industry                            
   <chr>  <chr>                                    <dbl> <chr>         <dbl> <chr>          <chr>                               
 1 TXG    10x Genomics, Inc.                       99.0  $9.74B         2019 Capital Goods  Biotechnology: Laboratory Analytica…
 2 YI     111, Inc.                                 6.31 $519.69M       2018 Health Care    Medical/Nursing Services            
 3 PIH    1347 Property Insurance Holdi…            4.48 $27.19M        2014 Finance        Property-Casualty Insurers          
 4 PIHPP  1347 Property Insurance Holdi…           24    NA               NA Finance        Property-Casualty Insurers          
 5 TURN   180 Degree Capital Corp.                  1.73 $53.84M          NA Finance        Finance/Investors Services          
 6 FLWS   1-800-FLOWERS.COM, Inc.                  29.5  $1.9B          1999 Consumer Serv… Other Specialty Stores              
 7 BCOW   1895 Bancorp of Wisconsin, In…            9.3  $46.13M        2019 Finance        Banks                               
 8 ONEM   1Life Healthcare, Inc.                   30.0  $3.79B         2020 Health Care    Medical/Nursing Services            
 9 FCCY   1st Constitution Bancorp (NJ)            12.2  $124.98M         NA Finance        Savings Institutions                
10 SRCE   1st Source Corporation                   33.0  $841.88M         NA Finance        Major Banks                         
# … with 3,643 more rows

chriscardillo commented 4 years ago

readr::read_csv() does not use download.file() at all (see the readr source code).

Additionally, I am using tidyquant 1.0.1 on Mac OS 10.15 and still receive the same error:

> tq_exchange("NASDAQ")
Getting data...

[1] NA
Warning messages:
1: In download.file(url, destfile = tmp, quiet = TRUE) :
  cannot open URL 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nasdaq&render=download': HTTP status was '403 Forbidden'
2: In value[[3L]](cond) : Error at nasdaq during call to tq_exchange.

When I try the version of tq_exchange() that uses read_csv(), I have no issues:

> tq_exchange("NASDAQ")
Getting data...

# A tibble: 3,653 x 7
   symbol company                       last.sale.price market.cap ipo.year sector       industry           
   <chr>  <chr>                                   <dbl> <chr>         <dbl> <chr>        <chr>              
 1 AACG   ATA Creativity Global                    1.36 $43.32M          NA Consumer Se… Other Consumer Ser…
 2 AACQU  Artius Acquisition Inc.                 10.0  NA             2020 Finance      Business Services  
 3 AAL    American Airlines Group, Inc.           11.1  $5.63B           NA Transportat… Air Freight/Delive…
 4 AAME   Atlantic American Corporation            1.88 $38.42M          NA Finance      Life Insurance     
 5 AAOI   Applied Optoelectronics, Inc.           16.1  $326.65M       2013 Technology   Semiconductors     
 6 AAON   AAON, Inc.                              60.3  $3.14B           NA Capital Goo… Industrial Machine…
 7 AAPL   Apple Inc.                             436.   $1863.11B      1980 Technology   Computer Manufactu…
 8 AAWW   Atlas Air Worldwide Holdings            53.3  $1.39B           NA Transportat… Transportation Ser…
 9 AAXJ   iShares MSCI All Country Asi…           75.4  NA               NA NA           NA                 
10 AAXN   Axon Enterprise, Inc.                   88.5  $5.6B            NA Capital Goo… Ordnance And Acces…
# … with 3,643 more rows

Does the read_csv implementation of tq_exchange work for you?

mdancho84 commented 4 years ago

@chriscardillo I'm back on this issue. I did a quick check, and the read_csv() approach is still not working for me.

> read_csv("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NYSE&render=download")
Error in open.connection(con, "rb") : Send failure: Connection was reset

I tested on my Windows machine and also on an instance of RStudio Cloud (Linux). The RStudio Cloud did not return results after several minutes.

So I don't think this is a viable solution even though it seems to work for you.

mdancho84 commented 4 years ago

I also see this on Nasdaq's website: [image]

chriscardillo commented 4 years ago

Alright. I will close the PR.

mdancho84 commented 4 years ago

If we can integrate a JavaScript scraper like PhantomJS, we can probably download the files. The problem is that once this "old" site goes away, I don't believe the site will give away the company-list files.

Alternatively, I can easily provide the NASDAQ 100 via the Invesco PowerShares QQQ ETF, which tracks that index.

mdancho84 commented 4 years ago

I think we may have something... An email from a friend indicates that there might be an issue with the user agent being used.

Hi Matt,

It looks like the user agent set in R version 4.0.2 (2020-06-22) -- "Taking Off Again" is causing a problem in download.file().

The default for R version 4.0.2 (2020-06-22) -- "Taking Off Again" is "R (4.0.2 x86_64-apple-darwin17.0 x86_64 darwin17.0)".

Setting it to "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0" will make tq_exchange() work.

-Me

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

-CMDs
> getOption("HTTPUserAgent")
[1] "R (4.0.2 x86_64-apple-darwin17.0 x86_64 darwin17.0)"

> options(HTTPUserAgent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0")
> getOption("HTTPUserAgent")
[1] "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0"
> tq_exchange("AMEX")
Getting data...

# A tibble: 293 x 7
   symbol company                                            last.sale.price market.cap ipo.year sector                industry                       
   <chr>  <chr>                                                        <dbl> <chr>         <dbl> <chr>                 <chr>                          
 1 GOED   1847 Goedeker Inc.                                            7.56 $52.71M        2020 Consumer Services     Home Furnishings               
 2 XXII   22nd Century Group, Inc                                       0.68 $94.42M          NA Consumer Non-Durables Farming/Seeds/Milling          
 3 FAX    Aberdeen Asia-Pacific Income Fund Inc                         4.15 $1.03B         1986 <NA>                  <NA>                           
 4 IAF    Aberdeen Australia Equity Fund Inc                            4.74 $107.8M          NA <NA>                  <NA>                           
 5 AEF    Aberdeen Emerging Markets Equity Income Fund, Inc.            6.71 $340.54M         NA <NA>                  <NA>                           
 6 FCO    Aberdeen Global Income Fund, Inc.                             7.29 $63.6M         1992 <NA>                  <NA>                           
 7 ACU    Acme United Corporation.                                     22.8  $75.9M         1988 Capital Goods         Industrial Machinery/Components
 8 ATNM   Actinium Pharmaceuticals, Inc.                                9.91 $4.49M           NA Health Care           Major Pharmaceuticals          
 9 AE     Adams Resources & Energy, Inc.                               22.2  $94.18M          NA Energy                Oil Refining/Marketing         
10 ACY    AeroCentury Corp.                                             3.05 $4.71M           NA Technology            Diversified Commercial Services
# … with 283 more rows
> 
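Setting `options(HTTPUserAgent = ...)` globally, as above, changes the user agent for every later download in the session. A minimal sketch of scoping the change to a single call instead; `with_user_agent()` and `firefox_ua` are hypothetical names, and the sketch relies on the behaviour reported above (that download.file(), and so tq_exchange(), picks up the HTTPUserAgent option).

```r
# Hypothetical helper (not part of tidyquant): evaluate `expr` with a
# temporary HTTPUserAgent option, then restore the previous value, even if
# `expr` throws an error.
with_user_agent <- function(agent, expr) {
  old <- options(HTTPUserAgent = agent)
  on.exit(options(old), add = TRUE)
  expr  # lazily evaluated here, while the temporary user agent is set
}

firefox_ua <- paste(
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0)",
  "Gecko/20100101 Firefox/79.0"
)

# Example (requires network access and a reachable data source):
# amex <- with_user_agent(firefox_ua, tidyquant::tq_exchange("AMEX"))
```

Because R evaluates function arguments lazily, `expr` only runs when the body reaches it, after the option has been set.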
mdancho84 commented 4 years ago

Also, I was able to successfully get this to work in the terminal:

curl -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0" "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download"

mdancho84 commented 4 years ago

@chriscardillo This is looking promising:

library(curl)
library(readr)
#> 
#> Attaching package: 'readr'
#> The following object is masked from 'package:curl':
#> 
#>     parse_date

handle <- new_handle(verbose = TRUE) 
handle_setopt(handle, useragent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0")

con <- curl("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download", handle = handle)

read_csv(con)
#> Warning: Missing column names filled in: 'X9' [9]
#> Parsed with column specification:
#> cols(
#>   Symbol = col_character(),
#>   Name = col_character(),
#>   LastSale = col_character(),
#>   MarketCap = col_character(),
#>   IPOyear = col_character(),
#>   Sector = col_character(),
#>   industry = col_character(),
#>   `Summary Quote` = col_character(),
#>   X9 = col_logical()
#> )
#> # A tibble: 3,130 x 9
#>    Symbol Name  LastSale MarketCap IPOyear Sector industry `Summary Quote` X9   
#>    <chr>  <chr> <chr>    <chr>     <chr>   <chr>  <chr>    <chr>           <lgl>
#>  1 DDD    3D S~ 6.1      $738.92M  n/a     Techn~ Compute~ https://old.na~ NA   
#>  2 MMM    3M C~ 165.86   $95.54B   n/a     Healt~ Medical~ https://old.na~ NA   
#>  3 WBAI   500.~ 3.76     $161.68M  2013    Consu~ Service~ https://old.na~ NA   
#>  4 WUBA   58.c~ 55.84    $8.37B    2013    Techn~ Compute~ https://old.na~ NA   
#>  5 EGHT   8x8 ~ 16.2     $1.69B    n/a     Techn~ EDP Ser~ https://old.na~ NA   
#>  6 AHC    A.H.~ 1.69     $40.36M   n/a     Consu~ Newspap~ https://old.na~ NA   
#>  7 AOS    A.O ~ 49.06    $7.92B    n/a     Consu~ Consume~ https://old.na~ NA   
#>  8 ATEN   A10 ~ 8.58     $668.63M  2014    Techn~ Compute~ https://old.na~ NA   
#>  9 AIR    AAR ~ 19.54    $686.94M  n/a     Capit~ Aerospa~ https://old.na~ NA   
#> 10 AAN    Aaro~ 58.41    $3.92B    n/a     Techn~ Diversi~ https://old.na~ NA   
#> # ... with 3,120 more rows

Created on 2020-08-14 by the reprex package (v0.3.0)
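The reprex above can be wrapped in a small function so that any of the company-list URLs can be read with a browser-like user agent. A minimal sketch, assuming the curl and readr packages are installed; `read_csv_with_agent()` is a hypothetical name (not a tidyquant, curl, or readr function), and the `verbose = TRUE` debugging option from the reprex is dropped.

```r
# Sketch of the curl + readr approach above as a reusable function.
# Assumes curl and readr are installed; the function name and default
# user agent are illustrative choices, not a tidyquant API.
read_csv_with_agent <- function(
    url,
    agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0") {
  # Send a browser-like User-Agent header instead of R's default
  handle <- curl::new_handle(useragent = agent)
  readr::read_csv(curl::curl(url, handle = handle))
}

# Example (requires network access and a reachable endpoint):
# nyse <- read_csv_with_agent(
#   "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download"
# )
```

The result still has the raw CSV column names (Symbol, LastSale, ...) and the unnamed trailing column seen in the reprex output.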

mdancho84 commented 4 years ago

The dev version should fix this for now.

devtools::install_github("business-science/tidyquant")

This fix will be incorporated in the next CRAN release.