bjosun closed this issue 11 months ago.
I think I am hitting the same problem. I run
>>> from yahoofinancials import YahooFinancials
>>> data = YahooFinancials('VOD.L').get_stock_price_data()
and get the error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mick/.local/lib/python3.10/site-packages/yahoofinancials/yf.py", line 99, in get_stock_price_data
return self.get_clean_data(self.get_stock_tech_data('price'), 'price')
File "/home/mick/.local/lib/python3.10/site-packages/yahoofinancials/etl.py", line 591, in get_stock_tech_data
return self.get_stock_data(tech_type=tech_type)
File "/home/mick/.local/lib/python3.10/site-packages/yahoofinancials/etl.py", line 561, in get_stock_data
dict_ent = self._create_dict_ent(self.ticker, statement_type, tech_type, report_name, hist_obj)
File "/home/mick/.local/lib/python3.10/site-packages/yahoofinancials/etl.py", line 517, in _create_dict_ent
re_data = self._get_historical_data(YAHOO_URL, r_map, tech_type, statement_type)
File "/home/mick/.local/lib/python3.10/site-packages/yahoofinancials/etl.py", line 244, in _get_historical_data
self._request_handler(url, config.get("response_field"))
File "/home/mick/.local/lib/python3.10/site-packages/yahoofinancials/etl.py", line 205, in _request_handler
raise ManagedException("Server replied with server error code, HTTP " + str(response.status_code) +
yahoofinancials.etl.ManagedException: Server replied with server error code, HTTP 404 code while opening the url: https://query1.finance.yahoo.com/v6/finance/quoteSummary/vod.l?modules=price&formatted=False&lang=en-US&region=US&corsDomain=finance.yahoo.com
@micksulley @bjosun Thank you for raising this issue. I’m completely slammed at work right now; however, I can dedicate time this weekend to solving this.
Just FYI, other Python libraries are having the same issue:
https://github.com/dpguthrie/yahooquery/issues/224 https://github.com/ranaroussi/yfinance/issues/1729
get_financial_stmts works as expected, but the other functions throw the 404 exception others have mentioned. I noticed that the Python requests package is also having issues with finance.yahoo.com. A get() request to https://finance.yahoo.com/quote/AAPL (or any other ticker) works fine, and I'm able to scrape quote data from it. But a get() request to https://finance.yahoo.com/quote/AAPL/financials, https://finance.yahoo.com/quote/AAPL/cash-flow, etc. throws the 404 error, even though I can copy/paste the URL string from the request object in the Python debug window into a browser and it opens fine.
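A minimal way to reproduce that comparison is to request both pages and record the status codes. This is a sketch: the probe helper and its injectable get parameter are hypothetical, added so the helper can be exercised without network access; by default it lazily imports requests.

```python
from typing import Callable, Dict, Iterable, Optional


def probe(urls: Iterable[str], get: Optional[Callable] = None, timeout: float = 30) -> Dict[str, int]:
    """Return the HTTP status code for each URL (hypothetical diagnostic helper)."""
    if get is None:
        import requests  # imported lazily so the helper can be tested offline
        get = requests.get
    statuses = {}
    for url in urls:
        response = get(url, timeout=timeout)
        statuses[url] = response.status_code
    return statuses
```

For example, probe(["https://finance.yahoo.com/quote/AAPL", "https://finance.yahoo.com/quote/AAPL/financials"]). Per the report above, the first URL returned 200 and the second 404 at the time.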
So here is what I learned from trying to figure this out over the weekend:
A. Get a crumb from: https://query1.finance.yahoo.com/v1/test/getcrumb
B. Append that crumb to the v10 (instead of v6) endpoint, like so: https://query2.finance.yahoo.com/v10/finance/quoteSummary/c?modules=summaryDetail&formatted=False&lang=en-US&region=US&corsDomain=finance.yahoo.com&crumb=<crumb>
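Building the step B URL by hand is easy to get wrong, because a crumb can contain characters such as "/" that must be escaped. A minimal sketch using the standard library; quote_summary_url is a hypothetical helper name, and the query parameters are copied from the example URL above.

```python
from urllib.parse import quote, urlencode


def quote_summary_url(ticker: str, modules: str, crumb: str) -> str:
    """Build a v10 quoteSummary URL with the crumb appended (hypothetical helper)."""
    params = {
        "modules": modules,
        "formatted": "False",
        "lang": "en-US",
        "region": "US",
        "corsDomain": "finance.yahoo.com",
        "crumb": crumb,  # urlencode escapes characters such as "/" in the crumb
    }
    base = "https://query2.finance.yahoo.com/v10/finance/quoteSummary/"
    # Escape the ticker too, e.g. index symbols like "^GSPC".
    return base + quote(ticker, safe="") + "?" + urlencode(params)
```

For example, quote_summary_url("VOD.L", "price", crumb) targets the same data as the failing v6 request in the traceback above.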
Where I got stuck was getting a crumb programmatically. Using the etl.py UrlOpener class, I'm able to get a 200 response, but the content doesn't contain the crumb. The same thing happens when opening the crumb URL in a private browser window: 200 response, but no crumb. It may be an issue of missing request headers and/or needing to get a cookie first. There's definitely a way to crack this if we try hard enough.
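One way to test the cookie theory is to reuse a single session, so that cookies set by a first request are sent along with the crumb request. This is a sketch under assumptions: the cookie-seeding URL (https://fc.yahoo.com here) is an assumption not confirmed in this thread, and fetch_crumb is a hypothetical helper that accepts any session-like object with a get() method (for example requests.Session()).

```python
from typing import Optional

COOKIE_URL = "https://fc.yahoo.com"  # assumption: a Yahoo page that sets cookies
CRUMB_URL = "https://query1.finance.yahoo.com/v1/test/getcrumb"


def fetch_crumb(session, headers: Optional[dict] = None, timeout: float = 30) -> Optional[str]:
    """Seed cookies on `session`, then request the crumb with the same session.

    Returns the crumb text, or None unless the reply is a non-empty 200.
    """
    # The first request matters only for the cookies it sets; its status is
    # ignored because this page may answer with an error code anyway.
    session.get(COOKIE_URL, headers=headers, timeout=timeout, allow_redirects=True)
    response = session.get(CRUMB_URL, headers=headers, timeout=timeout)
    text = response.text.strip()
    if response.status_code != 200 or not text:
        # An empty 200 body is the symptom described above.
        return None
    return text
```

Usage might look like fetch_crumb(requests.Session(), headers={"User-Agent": "Mozilla/5.0"}). A browser-like User-Agent is probably needed, but the hard-coded host header in the script below should not be reused here, since it would not match fc.yahoo.com.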
So for now:
I'm pretty slammed at my job right now, so I'd definitely appreciate some assistance on item 2. I probably won't be able to pick this back up until the end of the week due to my workload. I'll be regularly checking this thread and can merge in a PR / implement a posted fix if someone figures this out before then.
To make this easier, here is a small script I've been using to isolate getting the crumb. If anyone can get this returning the crumb, then I can take it from there and have a fix out quickly. My suspicion is that we'll need to somehow use a cookie.
import random
import time

import requests

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 "
    "Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 '
    'Safari/537.36',
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 "
    "Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 "
    "Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36",
]


class UrlOpener:
    # Browser-like headers sent with every request unless the caller overrides them.
    request_headers = {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "accept-encoding": "gzip, deflate, br",
        "Connection": "keep-alive",
        "accept-language": "en-US,en;q=0.5",
        "origin": "https://finance.yahoo.com",
        "referer": "https://finance.yahoo.com",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "none",
        "sec-fetch-user": "?1",
        "host": "query1.finance.yahoo.com",
        "Upgrade-Insecure-Requests": "1",
        "te": "trailers",
    }
    # One user agent is picked once, when the class is defined.
    user_agent = random.choice(USER_AGENTS)
    request_headers["User-Agent"] = user_agent

    def __init__(self, session=None):
        # Fall back to the module-level requests API when no session is given.
        self._session = session or requests

    def open(self, url, request_headers=None, params=None, proxy=None, timeout=30):
        response = self._session.get(
            url=url,
            params=params,
            proxies=proxy,
            timeout=timeout,
            headers=request_headers or self.request_headers,
        )
        return response


def get_crumb():
    urlopener = UrlOpener()
    url = "https://query1.finance.yahoo.com/v1/test/getcrumb"
    max_retry = 10
    for _ in range(max_retry):
        response = urlopener.open(url)
        if response.status_code != 200:
            # Release the connection and back off before retrying.
            response.close()
            time.sleep(random.randrange(1, 5))
        else:
            # A 200 with an empty body is the failure described above.
            print("response: ", response.text)
            res_content = response.text
            response.close()
            return res_content
    return None


if __name__ == '__main__':
    print(get_crumb())
Ok, I figured out a way to do this and released a fix in v1.17. I tested it using IPs from several different countries and it seems to be working fine now. Closing this issue.
Still not working here :( My IP is from Poland, but it doesn't work for me even when using US-based proxies. The getcrumb URL returns 0 bytes, yet when I open it in a "real" browser, it does return a crumb string.
I'm happy to do even extensive testing. Please, let me know if there's something I should try.
Updating the package version fixed the issue. Thank you very much @JECSand !
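Since the fix shipped in v1.17, a quick way to confirm which version is installed is importlib.metadata (Python 3.8+). A minimal sketch; has_crumb_fix is a hypothetical helper and assumes plain numeric major.minor[.patch] version strings (a suffix such as "rc1" would make int() fail).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def has_crumb_fix(installed: Optional[str] = None, minimum=(1, 17)) -> bool:
    """Return True if yahoofinancials is at least `minimum` (hypothetical helper)."""
    if installed is None:
        try:
            installed = version("yahoofinancials")
        except PackageNotFoundError:
            return False
    # Compare numeric components only, e.g. "1.17" -> (1, 17), "1.9" -> (1, 9).
    parts = tuple(int(p) for p in installed.split(".")[: len(minimum)])
    return parts >= minimum
```

If it returns False, pip install --upgrade yahoofinancials pulls in the newer release.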
I get this error:
lib/python3.8/site-packages/yahoofinancials/etl.py", line 205, in _request_handler
raise ManagedException("Server replied with server error code, HTTP " + str(response.status_code) +
yahoofinancials.etl.ManagedException: Server replied with server error code, HTTP 404 code while opening the url: https://query1.finance.yahoo.com/v6/finance/quoteSummary/husq-b.st?modules=defaultKeyStatistics&formatted=False&lang=en-US&region=US&corsDomain=finance.yahoo.com
Seems to be an issue with this:
elif tech_type != '' and statement_type != 'history':
    r_map = get_request_config(tech_type, REQUEST_MAP)
    try:
        re_data = self._get_historical_data(YAHOO_URL, r_map, tech_type, statement_type)
    except KeyError:
        re_data = None
    dict_ent = {up_ticker: re_data}
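That try/except only catches KeyError, so the ManagedException raised by _request_handler (see the traceback above) propagates to the caller. The self-contained sketch below illustrates this; ManagedException here is a local stand-in rather than the real yahoofinancials.etl.ManagedException, and fetch / fetch_or_none are hypothetical.

```python
class ManagedException(Exception):
    """Local stand-in for yahoofinancials.etl.ManagedException."""


def fetch():
    # Simulates _request_handler rejecting a 404 reply.
    raise ManagedException("Server replied with server error code, HTTP 404")


def fetch_or_none(catch=(KeyError,)):
    """Mirror the snippet's try/except, with a configurable tuple of caught types."""
    try:
        return fetch()
    except catch:
        return None


# Catching only KeyError, as in the snippet, lets the 404 escape.
try:
    fetch_or_none()
    escaped = False
except ManagedException:
    escaped = True

# Widening the except clause turns the failure into a None entry instead.
result = fetch_or_none(catch=(KeyError, ManagedException))
```

Swallowing the exception only masks the failure, though: the URL in that traceback still uses the v6 endpoint, which suggests an install older than the v1.17 fix.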