Closed RQuestion closed 4 years ago
Looks like NASDAQ (the company) may have changed its list API. Give it a day or so and let's see if it comes back. If not, may need to search for a new data source.
It should download. I was able to get this URL to download. Not sure where it's getting hung up. https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NASDAQ&render=download
Yeah - it seems like everything should work. The problem has been intermittent, starting over the weekend. There will be times throughout the day where it will pull data, and other times it returns NA.
I checked this morning, and it seems like it's working. I have a feeling NASDAQ is playing around with their website. Hopefully the problem is fixed on their end soon.
Hi, I also tried several times today and yesterday, and it didn't work. The direct link you provided worked though (thanks!). Hopefully they'll restore access.
tidyquant::tq_exchange("AMEX")
Getting data...
Warning messages:
1: In download.file(url, destfile = tmp, quiet = TRUE) :
InternetOpenUrl failed: 'The operation timed out'
2: In value[[3L]](cond) : Error at amex during call to tq_exchange.
tidyquant::tq_exchange("NYSE")
Getting data...
Warning messages:
1: In download.file(url, destfile = tmp, quiet = TRUE) :
InternetOpenUrl failed: 'The operation timed out'
2: In value[[3L]](cond) : Error at nyse during call to tq_exchange.
Hi! Having the same issue, slightly different error message:
Running tq_exchange("NASDAQ")
Warning messages:
1: In download.file(url, destfile = tmp, quiet = TRUE) :
cannot open URL 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nasdaq&render=download': HTTP status was '403 Forbidden'
2: In value[[3L]](cond) : Error at nasdaq during call to tq_exchange.
It seems like it isn't just NASDAQ. I'm seeing the same issue with AMEX and NYSE as well. I started seeing the problem on the morning of July 22nd.
@elohbeck These all come from the same data source.
https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NASDAQ&render=download https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NYSE&render=download
The issue seems to be happening with download.file().
download.file("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NASDAQ&render=download", "nasdaq.csv")
#> Warning in download.file("https://old.nasdaq.com/screening/companies-by-
#> name.aspx?letter=0&exchange=NASDAQ&render=download", : InternetOpenUrl failed:
#> 'The operation timed out'
#> Error in download.file("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NASDAQ&render=download", : cannot open URL 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NASDAQ&render=download'
Created on 2020-07-25 by the reprex package (v0.3.0)
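Since all three exchange lists come from the same host, one way to check whether the failure is general is to probe each URL and record whether download.file() succeeds. This is only a rough sketch: probe_url() and its downloader argument are hypothetical names (not part of tidyquant), and it assumes a failed download either raises an error or returns a non-zero status.

```r
# Hypothetical helper (not part of tidyquant): returns TRUE if the download
# succeeds, FALSE if the downloader errors or returns a non-zero status.
# `downloader` is injectable so the logic can be exercised without a network.
probe_url <- function(url, downloader = utils::download.file) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)
  tryCatch(
    {
      status <- suppressWarnings(downloader(url, destfile = tmp, quiet = TRUE))
      identical(as.integer(status), 0L)
    },
    error = function(e) FALSE
  )
}

# The three URLs listed above differ only in the exchange parameter.
base_url <- "https://old.nasdaq.com/screening/companies-by-name.aspx"
exchanges <- c("NASDAQ", "AMEX", "NYSE")
urls <- sprintf("%s?letter=0&exchange=%s&render=download", base_url, exchanges)

# Uncomment to probe the live site (requires network access):
# vapply(urls, probe_url, logical(1))
```

If every URL comes back FALSE at the same time, the problem is more likely on the server side than in any one exchange's list.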
@mdancho84 ahh, I see. Thanks for clarifying. When I download the file, the file name is companylist.csv. Is the "nasdaq.csv" parameter the rename of the file?
@elohbeck The name of the file "nasdaq.csv" is what the file would be named if download.file() worked properly. When you download the file directly from the website, the default name is companylist.csv.
From my research, it looks like the "Old Nasdaq" website has changed to a JavaScript-based website. This is to prevent web scraping, which is what tidyquant does.
Further changes to the site appear to be forthcoming with a New NASDAQ site.
For the time being, just go to the old site and download directly. https://old.nasdaq.com/screening/company-list.aspx
I don't see a solution to this unless someone knows how to download from JavaScript websites.
@mdancho84 thanks again.
Would the browseURL() method work in place of download.file()?
@mdancho84 - FWIW this works with readr:
read_csv("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NYSE&render=download")
There would just be a need to clean the column names, maybe with janitor::clean_names().
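For illustration, a minimal sketch of that cleaning step on a toy data frame. The header names are the raw ones the CSV download returns; the single row of values is made up for the example, and this is not necessarily how tidyquant itself renames columns.

```r
library(janitor)

# Toy data frame with raw, mixed-case header names (one contains a space).
# check.names = FALSE keeps "Summary Quote" from being altered by data.frame().
raw <- data.frame(
  Symbol = "DDD",
  MarketCap = "$738.92M",
  `Summary Quote` = NA_character_,
  check.names = FALSE
)

# clean_names() rewrites the headers into a consistent snake_case style.
cleaned <- clean_names(raw)
names(cleaned)
```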
@chriscardillo I couldn't get this to work:
read_csv("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NYSE&render=download")
I believe it uses curl, which is what download.file() uses.
@mdancho84 PR issued 👍
Hey @chriscardillo I have to double-check. The read_csv approach wasn’t working for me. Will take another look tonight.
Hey @mdancho84, any luck getting this to work?
Just saw this email - friendly reminder this PR is open :)
@chriscardillo the read_csv() approach only works if download.file() works. So I tried today and tq_exchange("NASDAQ") works as-is, no changes needed.
Will this change in the future? Probably - It's totally dependent on NASDAQ and its website.
> tidyquant::tq_exchange("NASDAQ")
Getting data...
# A tibble: 3,653 x 7
symbol company last.sale.price market.cap ipo.year sector industry
<chr> <chr> <dbl> <chr> <dbl> <chr> <chr>
1 TXG 10x Genomics, Inc. 99.0 $9.74B 2019 Capital Goods Biotechnology: Laboratory Analytica…
2 YI 111, Inc. 6.31 $519.69M 2018 Health Care Medical/Nursing Services
3 PIH 1347 Property Insurance Holdi… 4.48 $27.19M 2014 Finance Property-Casualty Insurers
4 PIHPP 1347 Property Insurance Holdi… 24 NA NA Finance Property-Casualty Insurers
5 TURN 180 Degree Capital Corp. 1.73 $53.84M NA Finance Finance/Investors Services
6 FLWS 1-800-FLOWERS.COM, Inc. 29.5 $1.9B 1999 Consumer Serv… Other Specialty Stores
7 BCOW 1895 Bancorp of Wisconsin, In… 9.3 $46.13M 2019 Finance Banks
8 ONEM 1Life Healthcare, Inc. 30.0 $3.79B 2020 Health Care Medical/Nursing Services
9 FCCY 1st Constitution Bancorp (NJ) 12.2 $124.98M NA Finance Savings Institutions
10 SRCE 1st Source Corporation 33.0 $841.88M NA Finance Major Banks
# … with 3,643 more rows
readr::read_csv() does not use download.file() at all (see the readr source code).
Additionally, I am using tidyquant 1.0.1 on Mac OS 10.15 and still receive the same error:
> tq_exchange("NASDAQ")
Getting data...
[1] NA
Warning messages:
1: In download.file(url, destfile = tmp, quiet = TRUE) :
cannot open URL 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nasdaq&render=download': HTTP status was '403 Forbidden'
2: In value[[3L]](cond) : Error at nasdaq during call to tq_exchange.
When I try with the tq_exchange function that utilizes read_csv, I have no issues.
> tq_exchange("NASDAQ")
Getting data...
# A tibble: 3,653 x 7
symbol company last.sale.price market.cap ipo.year sector industry
<chr> <chr> <dbl> <chr> <dbl> <chr> <chr>
1 AACG ATA Creativity Global 1.36 $43.32M NA Consumer Se… Other Consumer Ser…
2 AACQU Artius Acquisition Inc. 10.0 NA 2020 Finance Business Services
3 AAL American Airlines Group, Inc. 11.1 $5.63B NA Transportat… Air Freight/Delive…
4 AAME Atlantic American Corporation 1.88 $38.42M NA Finance Life Insurance
5 AAOI Applied Optoelectronics, Inc. 16.1 $326.65M 2013 Technology Semiconductors
6 AAON AAON, Inc. 60.3 $3.14B NA Capital Goo… Industrial Machine…
7 AAPL Apple Inc. 436. $1863.11B 1980 Technology Computer Manufactu…
8 AAWW Atlas Air Worldwide Holdings 53.3 $1.39B NA Transportat… Transportation Ser…
9 AAXJ iShares MSCI All Country Asi… 75.4 NA NA NA NA
10 AAXN Axon Enterprise, Inc. 88.5 $5.6B NA Capital Goo… Ordnance And Acces…
# … with 3,643 more rows
Does the read_csv implementation of tq_exchange work for you?
@chriscardillo I'm back on this issue. I did a quick check, and the read_csv() approach is still not working for me.
> read_csv("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=NYSE&render=download")
Error in open.connection(con, "rb") : Send failure: Connection was reset
I tested on my Windows machine and also on an instance of RStudio Cloud (Linux). The RStudio Cloud did not return results after several minutes.
So I don't think this is a viable solution even though it seems to work for you.
I also see this on Nasdaq's website:
Alright. I will close the PR.
If we can get a JavaScript scraper like PhantomJS integrated, we can probably download the files. The problem is that once this "old" site goes away, I don't believe the site will give away the Excel files.
Alternatively, I can easily provide the NASDAQ 100 - The Invesco Powershares QQQ ETF that follows this list.
I think we may have something... An email from a friend indicates that there might be an issue with the User Agent being used.
Hi Matt,
It looks like the user_agent set in R version 4.0.2 (2020-06-22) -- "Taking Off Again" is causing a problem in download.file().
Default for R version 4.0.2 (2020-06-22) -- "Taking Off Again" is "R (4.0.2 x86_64-apple-darwin17.0 x86_64 darwin17.0)"
Setting it to "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0" will make tq_exchange() work.
-Me
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6
-CMDs
> getOption("HTTPUserAgent")
[1] "R (4.0.2 x86_64-apple-darwin17.0 x86_64 darwin17.0)"
> options(HTTPUserAgent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0")
> getOption("HTTPUserAgent")
[1] "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0"
> tq_exchange("AMEX")
Getting data...
# A tibble: 293 x 7
symbol company last.sale.price market.cap ipo.year sector industry
<chr> <chr> <dbl> <chr> <dbl> <chr> <chr>
1 GOED 1847 Goedeker Inc. 7.56 $52.71M 2020 Consumer Services Home Furnishings
2 XXII 22nd Century Group, Inc 0.68 $94.42M NA Consumer Non-Durables Farming/Seeds/Milling
3 FAX Aberdeen Asia-Pacific Income Fund Inc 4.15 $1.03B 1986 <NA> <NA>
4 IAF Aberdeen Australia Equity Fund Inc 4.74 $107.8M NA <NA> <NA>
5 AEF Aberdeen Emerging Markets Equity Income Fund, Inc. 6.71 $340.54M NA <NA> <NA>
6 FCO Aberdeen Global Income Fund, Inc. 7.29 $63.6M 1992 <NA> <NA>
7 ACU Acme United Corporation. 22.8 $75.9M 1988 Capital Goods Industrial Machinery/Components
8 ATNM Actinium Pharmaceuticals, Inc. 9.91 $4.49M NA Health Care Major Pharmaceuticals
9 AE Adams Resources & Energy, Inc. 22.2 $94.18M NA Energy Oil Refining/Marketing
10 ACY AeroCentury Corp. 3.05 $4.71M NA Technology Diversified Commercial Services
# … with 283 more rows
>
Also, I was able to successfully get this to work in the terminal:
curl -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0" "https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download"
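Changing options(HTTPUserAgent = ...) as in the session above affects every later request in that R session. A possible way to limit the change to a single call is sketched below; with_user_agent() is a hypothetical helper name, not something provided by base R or tidyquant.

```r
# Hypothetical helper: evaluate `expr` with a temporary HTTPUserAgent option,
# then restore the previous value even if `expr` fails.
with_user_agent <- function(user_agent, expr) {
  old <- options(HTTPUserAgent = user_agent)
  on.exit(options(old), add = TRUE)
  # `expr` is a lazily evaluated argument, so it runs only here,
  # after the option has been changed.
  force(expr)
}

browser_ua <- "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0"

# Requires network access and the tidyquant package:
# amex <- with_user_agent(browser_ua, tidyquant::tq_exchange("AMEX"))
```

Whether the browser-like user agent keeps working depends entirely on how the NASDAQ site filters requests, so treat this as a workaround rather than a fix.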
@chriscardillo This is looking promising:
library(curl)
library(readr)
#>
#> Attaching package: 'readr'
#> The following object is masked from 'package:curl':
#>
#> parse_date
handle <- new_handle(verbose = TRUE)
handle_setopt(handle, useragent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0")
con <- curl("https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download", handle = handle)
read_csv(con)
#> Warning: Missing column names filled in: 'X9' [9]
#> Parsed with column specification:
#> cols(
#> Symbol = col_character(),
#> Name = col_character(),
#> LastSale = col_character(),
#> MarketCap = col_character(),
#> IPOyear = col_character(),
#> Sector = col_character(),
#> industry = col_character(),
#> `Summary Quote` = col_character(),
#> X9 = col_logical()
#> )
#> # A tibble: 3,130 x 9
#> Symbol Name LastSale MarketCap IPOyear Sector industry `Summary Quote` X9
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
#> 1 DDD 3D S~ 6.1 $738.92M n/a Techn~ Compute~ https://old.na~ NA
#> 2 MMM 3M C~ 165.86 $95.54B n/a Healt~ Medical~ https://old.na~ NA
#> 3 WBAI 500.~ 3.76 $161.68M 2013 Consu~ Service~ https://old.na~ NA
#> 4 WUBA 58.c~ 55.84 $8.37B 2013 Techn~ Compute~ https://old.na~ NA
#> 5 EGHT 8x8 ~ 16.2 $1.69B n/a Techn~ EDP Ser~ https://old.na~ NA
#> 6 AHC A.H.~ 1.69 $40.36M n/a Consu~ Newspap~ https://old.na~ NA
#> 7 AOS A.O ~ 49.06 $7.92B n/a Consu~ Consume~ https://old.na~ NA
#> 8 ATEN A10 ~ 8.58 $668.63M 2014 Techn~ Compute~ https://old.na~ NA
#> 9 AIR AAR ~ 19.54 $686.94M n/a Capit~ Aerospa~ https://old.na~ NA
#> 10 AAN Aaro~ 58.41 $3.92B n/a Techn~ Diversi~ https://old.na~ NA
#> # ... with 3,120 more rows
Created on 2020-08-14 by the reprex package (v0.3.0)
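The reprex above could be wrapped up roughly as follows so the same handle setup is reused for each exchange. This is a sketch under assumptions: exchange_url() and fetch_exchange() are hypothetical names, and it is not necessarily what the tidyquant fix looks like.

```r
library(curl)
library(readr)

# Hypothetical helper: build the download URL for one exchange,
# following the URL pattern used in this thread.
exchange_url <- function(exchange) {
  paste0(
    "https://old.nasdaq.com/screening/companies-by-name.aspx",
    "?letter=0&exchange=", tolower(exchange), "&render=download"
  )
}

# Hypothetical helper: fetch the CSV through a curl handle that sends a
# browser-like user agent, then parse it with readr.
fetch_exchange <- function(exchange, user_agent) {
  handle <- new_handle()
  handle_setopt(handle, useragent = user_agent)
  con <- curl(exchange_url(exchange), handle = handle)
  read_csv(con)
}

# Requires network access:
# ua <- "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0"
# nyse <- fetch_exchange("NYSE", ua)
```

Setting the user agent on the handle keeps the change local to this request instead of altering the session-wide HTTPUserAgent option.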
The dev version should fix this for now.
devtools::install_github("business-science/tidyquant")
This fix will be incorporated in the next CRAN release.
Hey, unable to get tq_exchange to work anymore, get the following error. I updated my libraries and still see this issue.
Warning messages:
1: In download.file(url, destfile = tmp, quiet = TRUE) :
InternetOpenUrl failed: 'The operation timed out'
2: In value[[3L]](cond) : Error at amex during call to tq_exchange.