dpguthrie / yahooquery

Python wrapper for an unofficial Yahoo Finance API
https://yahooquery.dpguthrie.com
MIT License
765 stars 135 forks

Yahoo Finance Premium instituting recaptcha #254

Open me1029134 opened 8 months ago

me1029134 commented 8 months ago

Describe the bug
I believe there is some kind of recaptcha problem. It does not affect every request, only about half of them. Below is my error.

DevTools listening on ws://127.0.0.1:63373/devtools/browser/661f6e71-8cf3-4067-bbfe-3966923a90ab
[1230/140638.380:ERROR:gl_utils.cc(412)] [.WebGL-00001D8400E82200]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
[1230/140640.413:ERROR:gl_utils.cc(412)] [.WebGL-00001D84002B3F00]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
Unable to login and/or retrieve the appropriate cookies. This is most likely due to Yahoo Finance instituting recaptcha, which this package does not support.

To Reproduce
Steps to reproduce the behavior:

  1. When pulling: query = yq.Ticker('ASGTF', username="UserEmail", password="PW")

I get: {'ASGTF': 'User is not logged in'}

This happens about half the time, whether I query different tickers or pull the same ticker multiple times.
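
Since the failure is intermittent, one stopgap is to retry when the result looks like an error message rather than data. The following is a minimal sketch, not part of yahooquery: `looks_like_error` and `fetch_with_retry` are hypothetical helpers, and the check assumes that a failed call comes back as a dict of plain strings such as {'ASGTF': 'User is not logged in'}, while a successful call returns something else (for example a DataFrame).

```python
import time
from typing import Any, Callable


def looks_like_error(result: Any) -> bool:
    """Treat a non-empty dict whose values are all strings as an error payload.

    Assumption: successful calls return a DataFrame or another non-dict
    object, while failures look like {'ASGTF': 'User is not logged in'}.
    """
    return (
        isinstance(result, dict)
        and len(result) > 0
        and all(isinstance(value, str) for value in result.values())
    )


def fetch_with_retry(fetch: Callable[[], Any], attempts: int = 3, delay: float = 2.0) -> Any:
    """Call fetch() up to `attempts` times, pausing between failed tries."""
    result = None
    for attempt in range(attempts):
        result = fetch()
        if not looks_like_error(result):
            return result
        # Only sleep if another attempt is still coming.
        if attempt < attempts - 1:
            time.sleep(delay)
    return result  # still an error payload after the last attempt
```

It could be called as fetch_with_retry(lambda: yq.Ticker('ASGTF', username="UserEmail", password="PW").p_all_financial_data(frequency='q')). This only works around the symptom; it does not address why the login fails in the first place.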

Expected behavior
I expect to get the p_all_financial_data output. I have verified that I can see this data when I am logged in.

Additional context
This comment from dpguthrie probably describes the problem and solution in a little more detail: https://github.com/dpguthrie/yahooquery/issues/251#issuecomment-1869823881
I thought I had it fixed, but I had not.

Tharindu-Abay commented 8 months ago

Is this what's happening in your code: when you log in to your Yahoo account through Selenium, you get a recaptcha and cannot continue?

thelaycon commented 8 months ago

Attach a screenshot.

samirgorai commented 8 months ago

Can you add some screenshots of both the error and the normal, expected result?

me1029134 commented 8 months ago

Sure, for example: If I run this example code:

import yahooquery as yq
password = 'PW'
userEmail = 'UserEmail'
symbol = 'AAPL'
while True:
  query = yq.Ticker(symbol, username=userEmail, password=password)
  p_all_financial_data_quarter = query.p_all_financial_data(frequency='q')
  print(p_all_financial_data_quarter)

I get this on the first run: (image) Working correctly.

And this on the second run: (image) Not working. It works about half the time, at random, with no apparent pattern.

And of course in Chrome, when I'm logged in, I see the correct data too: (image)

I've tried it on Windows 10 and 11, with Python 3.9 and 3.12.

Thanks in advance for your help!

samirgorai commented 8 months ago

Some questions:

1) Were there recent changes that introduced this error, or do previous versions of the library show it too?

2) For your example:

import yahooquery as yq
password = 'PW'
userEmail = 'UserEmail'
symbol='AAPL'
while (True):
  query = yq.Ticker(symbol, username= userEmail, password=password) #trying to login with user credentials
  p_all_financial_data_quarter = query.p_all_financial_data(frequency='q')
  print(p_all_financial_data_quarter)

the login is done in base.py

Yfinnce Base

I think this part of the code must be executed whenever a user logs in. Can you confirm whether I am correct?

3) How can I debug this locally, and where can I get a username and password?

samirgorai commented 8 months ago

Is it possible to get your email address so that I can message you directly?

samirgorai commented 8 months ago

Possible fix: can you look at #255?

samirgorai commented 8 months ago

Hello @dpguthrie @me1029134, I tested the code with my changes in #255:

import yahooquery as yq
password = 'XXXXXX'
userEmail = 'XXXXXX@yahoo.com'
symbol='AAPL'
while (True):
  query = yq.Ticker(symbol, username= userEmail, password=password)
  p_all_financial_data_quarter = query.p_all_financial_data(frequency='q')
  print(p_all_financial_data_quarter)

And the result was:

DevTools listening on ws://127.0.0.1:64734/devtools/browser/41a2456b-89ec-4df2-b6a5-d65774e7c308
[0102/091740.817:ERROR:command_buffer_proxy_impl.cc(127)] ContextResult::kTransientFailure: Failed to send GpuControl.CreateCommandBuffer.
[0102/091745.755:ERROR:gl_utils.cc(412)] [.WebGL-0000438400E7D400]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
[0102/091748.973:ERROR:gl_utils.cc(412)] [.WebGL-00004384002C0000]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels
{'AAPL': 'User is not subscribed to Premium or has invalid cookies'}

Can you check once on your setup with your ID?

me1029134 commented 8 months ago

Unfortunately, after adding this line (and commenting out the other):

self.driver.find_element(By.XPATH, "//input[@id='login-username']").send_keys(self.username)

I'm still getting the same problem: (image)

samirgorai commented 8 months ago

@me1029134 how can I get a build that includes my changes, please?

samirgorai commented 8 months ago

I am able to log in to login.yahoo.com using the following script:

""" file to test login """
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

LOGIN_URL = "https://login.yahoo.com"
username = "XXXXXXX@yahoo.com"
password = "XXXXXX"
# Raw string so that "\U" in the Windows path is not parsed as an escape sequence.
# Note: this path is not passed to webdriver.Firefox() below.
driver_path = r'C:\Users\samir\Web Scraping14-12-2023\geckodriver.exe'

while True:
    browser = webdriver.Firefox()
    browser.get(LOGIN_URL)
    print(browser.title)
    # Enter the username and submit the first form.
    browser.find_element(By.XPATH, "//input[@id='login-username']").send_keys(username)
    browser.find_element(By.XPATH, "//input[@id='login-signin']").click()
    # Wait for the password field to appear before typing into it.
    password_element = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, "login-passwd")))
    password_element.send_keys(password)
    browser.find_element(By.XPATH, "//button[@id='login-signin']").click()
    time.sleep(5)

I think the problem is in base.py (image): the condition "if instance.cookies:" is evaluating to false.

I can also see that it was modified in last commit.

samirgorai commented 8 months ago

@me1029134 @dpguthrie can you please check once more? I have made some changes and committed them.

Thank you

me1029134 commented 8 months ago

I believe the approach we want is to pull the cookies from a logged-in Chrome session and load them into the Selenium session, along the lines of these articles:
https://stackoverflow.com/questions/15058462/how-to-save-and-load-cookies-using-python-selenium-webdriver
https://medium.com/@ghulammustafapy/efficient-login-session-management-in-selenium-python-save-and-reuse-credentials-for-browser-7aa21b32df63
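
As a rough illustration of that idea, here is a minimal sketch that saves a browser session's cookies to a JSON file and loads them into another session. `save_cookies` and `load_cookies` are hypothetical helper names, not part of yahooquery; the only Selenium calls assumed are the WebDriver methods `get_cookies()` and `add_cookie()`, and the driver is passed in by the caller.

```python
import json
from pathlib import Path


def save_cookies(driver, path):
    """Write the browser session's cookies (a list of dicts) to a JSON file."""
    Path(path).write_text(json.dumps(driver.get_cookies()))


def load_cookies(driver, path):
    """Add previously saved cookies to a driver and return how many were added.

    Selenium only accepts cookies for the domain of the page that is
    currently loaded, so navigate to the site before calling this.
    """
    cookies = json.loads(Path(path).read_text())
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

With a real driver, the flow would be to log in once, call save_cookies(driver, "cookies.json"), and in a later session open the site and call load_cookies(driver, "cookies.json") before refreshing the page. Whether Yahoo accepts cookies reused this way is not established in this thread.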

me1029134 commented 8 months ago

I have a prototype fix that seems to work for me. I noticed that if I put a 20-second wait after the login and before any of the pulls, it no longer seems to get hung up, for some reason. I added that wait, and I also save the session's cookies to disk after a good login. It would be better if you could just pass in the cookies or session directly; that seems like the correct way to do it. Here is the fix that worked, for me at least:

    # Assumes os, pickle, and time are imported at the top of base.py.
    def login(self) -> None:
        if _has_selenium:
            session_instance='session_save_location/session_instance.pkl'
            if os.path.exists(session_instance):
                with open(session_instance, 'rb') as file:
                    self.session.cookies = pickle.load(file)
            else:
                instance = YahooFinanceHeadless(self.username, self.password)
                instance.login()
                time.sleep(20)
                if instance.cookies:
                    self.session.cookies = instance.cookies
                    with open(session_instance, 'wb') as file:
                        pickle.dump(self.session.cookies, file)
                    return
                else:
                    logger.warning(
                        "Unable to login and/or retrieve the appropriate cookies.  This is "
                        "most likely due to Yahoo Finance instituting recaptcha, which "
                        "this package does not support."
                    )
        else:
            logger.warning(
                "You do not have the required libraries to use this feature.  Install "
                "with the following: `pip install yahooquery[premium]`"
            )
        self.session = setup_session(self.session, self._setup_url)
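
A possible refinement of the fixed 20-second pause is to poll until the cookies show up and give up after a timeout. This is a minimal sketch under assumptions: `wait_for` is a hypothetical helper, and it is not established in this thread that the cookies become available after a delay; the thread only shows that a 20-second pause helped.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    get_value: Callable[[], Optional[T]],
    timeout: float = 20.0,
    interval: float = 0.5,
) -> Optional[T]:
    """Poll get_value() until it returns something truthy or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        value = get_value()
        if value:
            return value
        if time.monotonic() >= deadline:
            return None  # timed out without getting a usable value
        time.sleep(interval)
```

In the login method above, this could stand in for time.sleep(20), for example cookies = wait_for(lambda: instance.cookies), followed by the existing check on whether any cookies were returned.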

samirgorai commented 8 months ago

@dpguthrie Do you have a high-level design document or diagram that would help in understanding your library?

dpguthrie commented 8 months ago

@samirgorai Nope, sorry.