Closed jgriessler closed 11 months ago
When I connect from Spain, Yahoo forces me to accept cookies, and that is the problem. I think it is necessary to press the "ok" button with Selenium. The problem I have afterwards is that it gives me many connections when I go to https://query2.finance.yahoo.com/v1/test/getcrumb
@jgriessler This is absolutely fantastic, thank you. I just tested your solution locally and I can now access the problematic APIs from Sweden, and I suspect the same holds for the rest of the EU (GDPR-regulated) countries. Are you familiar with forking and making PRs on GitHub? I think that would be a nicer way for others to review and test the solution than manually pasting the code. In any case, this is great news: if the cause is regulation rather than a Yahoo-specific issue, the fix will probably continue to work, and keeping it running won't be so much of a game of whack-a-mole. Thanks.
It's working fine for me in the Netherlands without a VPN. Thanks @jgriessler for the patch. The instructions are a bit hard to follow, so here they are step by step:
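1) Open the file ...../lib/python3.10/site-packages/yahooquery/utils/__init__.py
2) In the header of the file, after the "# third party" comment, add the line below. The patched method also uses a `logger`; if the module does not already define one, add `import logging` and `logger = logging.getLogger(__name__)` as well.

    from bs4 import BeautifulSoup

3) Replace the method

    def setup_session(session: requests.Session):

by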
    def setup_session(session: requests.Session):
        url = "https://finance.yahoo.com"
        try:
            response = session.get(url, allow_redirects=True)
        except SSLError:
            # retry up to 5 times with a random header set and TLS verification off
            counter = 0
            while counter < 5:
                try:
                    session.headers = random.choice(HEADERS)
                    response = session.get(url, verify=False)
                    break
                except SSLError:
                    counter += 1

        if not isinstance(session, FuturesSession):
            # check for and handle the consent page
            # (str.find() returns -1, which is truthy, when there is no match,
            # so test for membership instead)
            if 'consent' in response.url:
                logger.debug(f'Redirected to consent page: "{response.url}"')

                soup = BeautifulSoup(response.content, 'html.parser')

                # the consent form carries these two values as named inputs
                params = {}
                for param in ['csrfToken', 'sessionId']:
                    try:
                        params[param] = soup.find('input', attrs={'name': param})['value']
                    except Exception as exc:
                        logger.critical(f'Failed to find or extract "{param}" from response. Exception={exc}')
                        return

                logger.debug(f'params: {params}')

                # submit the "agree" form so the session receives the consent cookies
                response = session.post(
                    'https://consent.yahoo.com/v2/collectConsent',
                    data={
                        'agree': ['agree', 'agree'],
                        'consentUUID': 'default',
                        'sessionId': params['sessionId'],
                        'csrfToken': params['csrfToken'],
                        'originalDoneUrl': url,
                        'namespace': 'yahoo'
                    })

            # just assume things are fine and session is setup now
            return session

        # FuturesSession: wait for the asynchronous request to finish
        _ = response.result()
        return session
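For anyone who wants to try the token-extraction step on its own, without network access, here is a minimal sketch. It swaps BeautifulSoup for the standard library's `html.parser` so it runs with no extra packages. The names `HiddenInputParser`, `extract_consent_params` and `SAMPLE_HTML` are made up for illustration, and the sample markup is an assumption about what the consent form roughly looks like, not a copy of Yahoo's page.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name -> value for every <input> tag that has both attributes."""

    def __init__(self):
        super().__init__()
        self.inputs = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        name, value = attr_map.get("name"), attr_map.get("value")
        if name is not None and value is not None:
            self.inputs[name] = value


def extract_consent_params(html, wanted=("csrfToken", "sessionId")):
    """Return the wanted form fields, or None if any of them is missing."""
    parser = HiddenInputParser()
    parser.feed(html)
    if any(name not in parser.inputs for name in wanted):
        return None
    return {name: parser.inputs[name] for name in wanted}


# Made-up markup for illustration; the real consent page may differ.
SAMPLE_HTML = """
<form method="post">
  <input type="hidden" name="csrfToken" value="abc123">
  <input type="hidden" name="sessionId" value="3_cc-session_example">
  <button name="agree" value="agree">OK</button>
</form>
"""

params = extract_consent_params(SAMPLE_HTML)
```

Returning `None` when a field is missing lets the caller decide whether to log and carry on or to abort, instead of raising from inside the parser.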
Thanks Griessler, Rudy!!!
@jgriessler Really appreciate the solution here! I'll work on putting this in and get it in the next release.
https://consent.yahoo.com/v2/collectConsent is dead now.
@ibart This is most likely because your browser is making a GET request. The URL that you're using, and the one used internally, accepts the POST method with a defined body.
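To make the GET/POST difference concrete, here is a rough sketch that only builds the request and never sends it. It uses the standard library's `urllib` rather than the `requests` session from the patch, the helper name `build_consent_request` is made up, and the form fields are simply the ones from the patch posted earlier in this thread. Whether Yahoo still accepts exactly this body is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

CONSENT_URL = "https://consent.yahoo.com/v2/collectConsent"


def build_consent_request(session_id, csrf_token, done_url="https://finance.yahoo.com"):
    """Build (but do not send) the form POST that the patch submits."""
    form = {
        "agree": ["agree", "agree"],  # the patch sends this field twice
        "consentUUID": "default",
        "sessionId": session_id,
        "csrfToken": csrf_token,
        "originalDoneUrl": done_url,
        "namespace": "yahoo",
    }
    # doseq=True expands the list into repeated agree=agree pairs
    body = urlencode(form, doseq=True).encode("ascii")
    return Request(CONSENT_URL, data=body, method="POST")


req = build_consent_request("example-session", "example-token")
print(req.get_method())  # POST

# Pasting the URL into a browser address bar is the equivalent of this:
browser_like = Request(CONSENT_URL)
print(browser_like.get_method())  # GET
```

The session id and token values here are placeholders; in the patch they come from the consent page itself.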
Thanks everyone for moving this forward (and of course Doug for getting the functionality in) while I was distracted with personal stuff. I haven't played with GitHub yet, so I would only mess things up trying to fork and work on a PR.
One other comment: I noticed that things are a little slower now when querying data. I assume it's because finance.yahoo.com is just huge, so loading the main site takes time. Going through the consent for every query is also quite some overhead if you run a series of history update queries. So I switched to "reusing" the yq.Ticker() instance, just modifying ticker.symbols. I still create a fresh instance at random, every 30-50 queries, to start fresh.
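In case it helps, the reuse pattern looks roughly like the sketch below. `ReusedClient` is a made-up helper, not part of yahooquery; it takes any factory callable, so it can be tried without network access. The 30-50 range is the one mentioned above, and the commented-out usage assumes `ticker.symbols` can be reassigned, as described in this comment.

```python
import random


class ReusedClient:
    """Hand out one client for a random number of queries, then rebuild it."""

    def __init__(self, factory, min_queries=30, max_queries=50, rng=None):
        self._factory = factory
        self._min = min_queries
        self._max = max_queries
        self._rng = rng or random.Random()
        self._client = None
        self._remaining = 0
        self.rebuilds = 0  # how many times the factory has been called

    def get(self):
        # Build a fresh client on first use or when the budget is used up
        if self._client is None or self._remaining <= 0:
            self._client = self._factory()
            self._remaining = self._rng.randint(self._min, self._max)
            self.rebuilds += 1
        self._remaining -= 1
        return self._client


# Hypothetical usage with yahooquery (needs network access, so left commented out):
# import yahooquery as yq
# pool = ReusedClient(lambda: yq.Ticker("AAPL"))
# for symbol in ["AAPL", "MSFT", "NVDA"]:
#     ticker = pool.get()
#     ticker.symbols = symbol
#     history = ticker.history(period="1mo")
```

This way the consent handshake is only paid when a new instance is created, not on every query.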
Is your feature request related to a problem? Please describe.
CRUMB failures occur when running yahooquery from Europe. Testing shows this is because, for queries from Europe, Yahoo redirects finance.yahoo.com to a page asking for consent to the usage of data: https://consent.yahoo.com/v2/collectConsent?sessionId=3_cc-session_6b0b0161-b473-4d30-bc6f-5cdd007600aa
Without acknowledging that page, the subsequent call to get the crumb via https://query2.finance.yahoo.com/v1/test/getcrumb fails.

Describe the solution you'd like
Implement a check to see if Yahoo redirects to the consent page. If yes, send an 'Agree' to that page to get the necessary cookies etc.
Sample code that works (but likely needs some tweaking):

    def setup_session(session: requests.Session):
        url = "https://finance.yahoo.com"
        try:
            response = session.get(url, allow_redirects=True)
        except SSLError:
            counter = 0
            while counter < 5:
                try:
                    session.headers = random.choice(HEADERS)
                    response = session.get(url, verify=False)
                    break
                except SSLError:
                    counter += 1
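The redirect check itself could be sketched as follows. This is only one possible way to do it: the function name `is_consent_redirect` is made up, and matching on the exact host `consent.yahoo.com` is an assumption based on the redirect URL quoted above. Comparing the host avoids two pitfalls of substring searches: `str.find` returns -1 (which is truthy) when nothing matches, and the word "consent" could also appear in a query string.

```python
from urllib.parse import urlparse


def is_consent_redirect(final_url):
    """True if the final URL after redirects is on the consent host."""
    host = urlparse(final_url).hostname or ""
    return host == "consent.yahoo.com"


redirected = is_consent_redirect(
    "https://consent.yahoo.com/v2/collectConsent"
    "?sessionId=3_cc-session_6b0b0161-b473-4d30-bc6f-5cdd007600aa"
)
not_redirected = is_consent_redirect("https://finance.yahoo.com/")
```

With a `requests` session, the value to pass in would be `response.url` after a `get` with `allow_redirects=True`.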
Describe alternatives you've considered
I'm not aware of any other solution to work around this for queries from Europe.
Additional context