GeneralMills / pytrends

Pseudo API for Google Trends

Exception occurred: The request failed: Google returned a response with code 429. #561

Closed sundios closed 1 year ago

sundios commented 1 year ago

I'm getting the following error:

Exception occurred: The request failed: Google returned a response with code 429.

I think this could be because Google has a new trends website?

https://searchengineland.com/google-launches-new-google-trends-portal-394026

Is there a way to fix this issue? Thanks in advance

Jinoy-Varghese commented 1 year ago

I'm also facing the same issue since 08/03/2023, 5:00 PM.

da2vin commented 1 year ago

Same issue, can anyone tell me how to fix it?

vimal-quilt commented 1 year ago

Same issue. When can this be fixed?

da2vin commented 1 year ago

It seems they now use reCAPTCHA to prevent crawling of the data... it will be hard to resolve this problem.

Syndorik commented 1 year ago

So I guess there's no easy workaround this time?

Syndorik commented 1 year ago

OK, I might have found a workaround. It appears that the first connection to Google Trends now returns a 429, so if you set up the object with pytrend = TrendReq(retries=3), it should work.

I've tested it on my side and no more 429s.
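
A minimal sketch of this workaround (the keyword, timeframe, and backoff_factor value are illustrative, not from this thread):

from pytrends.request import TrendReq

# Retry a few times instead of failing on the first 429; backoff_factor adds a
# small, growing delay between attempts (illustrative value).
pytrends = TrendReq(hl='en-US', tz=360, retries=3, backoff_factor=0.1)

pytrends.build_payload(['python'], timeframe='today 12-m')
df = pytrends.interest_over_time()
print(df.head())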

netvids commented 1 year ago

I am still getting a 429 with retries=3:

raise RetryError(e, request=request)
requests.exceptions.RetryError: HTTPSConnectionPool(host='trends.google.com', port=443): Max retries exceeded with url: /trends/api/explore?hl=en-... (Caused by ResponseError('too many 429 error responses'))

nicktba commented 1 year ago

OK, I might have found a workaround. It appears that the first connection to Google Trends now returns a 429, so if you set up the object with pytrend = TrendReq(retries=3), it should work.

I've tested it on my side and no more 429s.

I'm getting:

HTTPSConnectionPool(host='trends.google.com', port=443): Max retries exceeded with url: [URL] (Caused by ResponseError('too many 429 error responses'))

nicktba commented 1 year ago

I've heard that implementing cURL impersonate works to overcome the captcha issue

Can someone try that and let us know?

Jinoy-Varghese commented 1 year ago

@Syndorik Brooo... you are insane. Really thankful to you, man. I have to submit a project tomorrow based on this, and now everything is working properly.

nicktba commented 1 year ago

Wait, this worked for you? I'm still getting 429.

Can you show your code?

Jinoy-Varghese commented 1 year ago

pt = TrendReq(retries=3)
pt.build_payload(terms)
df = pt.interest_over_time()

This works

maxwnewcomer commented 1 year ago

OK, I might have found a workaround. It appears that the first connection to Google Trends now returns a 429, so if you set up the object with pytrend = TrendReq(retries=3), it should work.

I've tested it on my side and no more 429s.

This didn't work for me.

What I did find working (unsurprisingly, yet still possibly useful for some) is replacing the headers in requests_args with a valid cookie from the browser.

For those who need it working "now" (like @Jinoy-Varghese) and for whom retries don't work: open the browser dev tools on the Trends page, go to the Network tab, click on a ?geo request, and copy the cookie into your construction of TrendReq.

Should look like:

p = TrendReq(requests_args={'headers': {'Cookie': 'NID COOKIE HERE'}})

Currently looking for a more consistent solution that isn't just spamming the Google servers with more retries.
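
A slightly fuller sketch of the browser-cookie approach described above (the cookie string, keyword, and timeframe are placeholders/illustrative, not from this thread):

from pytrends.request import TrendReq

# Cookie value copied from the browser's Network tab (placeholder, not a real cookie).
nid_cookie = 'NID=<value copied from a ?geo request in the Network tab>'

# Pass the cookie along with every request pytrends makes.
pytrends = TrendReq(hl='en-US', tz=360, requests_args={'headers': {'Cookie': nid_cookie}})
pytrends.build_payload(['python'], timeframe='today 3-m')  # illustrative keyword/timeframe
df = pytrends.interest_over_time()
print(df.head())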

nicktba commented 1 year ago

OK, I might have found a workaround. It appears that the first connection to Google Trends now returns a 429, so if you set up the object with pytrend = TrendReq(retries=3), it should work. I've tested it on my side and no more 429s.

This didn't work for me.

What I did find working (unsurprisingly, yet still possibly useful for some) is replacing the headers in requests_args with a valid cookie from the browser.

For those who need it working "now" like @Jinoy-Varghese and for whom retries don't work: open the browser dev tools on the Trends page, go to the Network tab, click on a ?geo request, and copy the cookie into your construction of TrendReq.

Should look like:

p = TrendReq(requests_args={'headers': {'Cookie': 'NID COOKIE HERE'}})

Isn't there a risk of running into issues doing this?

Borrowing cookies is not my preferred method.

maxwnewcomer commented 1 year ago

@nicktba Yeah, definitely not a long-term solution. I was just saying it in case someone needs a solution right now for a school project like Jinoy, especially since there isn't a fix implemented yet.

nicktba commented 1 year ago

@maxwnewcomer Can you try implementing cURL impersonate into your payload?

The guys over at SERPAPI have been using it to resolve their issues.

maxwnewcomer commented 1 year ago

@maxwnewcomer Can you try implementing cURL impersonate into your payload?

The guys over at SERPAPI have been using it to resolve their issues.

Will do, how did you hear about the SERPAPI process? Cool intel.

maxwnewcomer commented 1 year ago

Thanks to @nicktba, I have cURL impersonation working (no retries and no cookies needed). Some of the curl_cffi session methods are different from the normal requests module, so I will do some updating to the _get_data() method and hopefully push a fix soon.

New functionality will be the ability to impersonate:

chrome99
chrome100
chrome101
chrome104
chrome107
chrome110
chrome99_android
edge99
edge101
safari15_3
safari15_5

A side effect of this push will be a new required package, curl_cffi.
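
Roughly, the underlying technique looks like this as a standalone sketch using curl_cffi directly (not the actual pytrends patch; the URL and impersonation target are illustrative):

from curl_cffi import requests as curl_requests

# Impersonate a real browser's TLS/HTTP fingerprint so the request looks like
# it came from Chrome rather than a plain Python HTTP client.
resp = curl_requests.get(
    'https://trends.google.com/trends/explore?q=python&geo=US',  # illustrative URL
    impersonate='chrome110',
)
print(resp.status_code)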

nicktba commented 1 year ago

Thanks to @nicktba, I have cURL impersonation working (no retries and no cookies needed). Some of the curl_cffi session methods are different from the normal requests module, so I will do some updating to the _get_data() method and hopefully push a fix soon.

New functionality will be the ability to impersonate:

chrome99
chrome100
chrome101
chrome104
chrome107
chrome110
chrome99_android
edge99
edge101
safari15_3
safari15_5

A side effect of this push will be a new required package, curl_cffi.

Amazing! Thanks Max!

I'm going to send you an email, let's chat!

maxwnewcomer commented 1 year ago

I'm a little confused by the changes Google made. It seems like they wanted to make it harder to scrape their Trends data, but when you look at the responses from the browser and from the impersonate-enabled version of pytrends, the user type in the browser is USER_TYPE_LEGIT_USER while the working pytrends gets USER_TYPE_SCRAPER. This indicates that they know it's scraping but don't care? Yet it still breaks the pytrends scraping. Kind of odd.

nicktba commented 1 year ago

I'm a little confused by the changes Google made. It seems like they wanted to make it harder to scrape their Trends data, but when you look at the responses from the browser and from the impersonate-enabled version of pytrends, the user type in the browser is USER_TYPE_LEGIT_USER while the working pytrends gets USER_TYPE_SCRAPER. This indicates that they know it's scraping but don't care? Yet it still breaks the pytrends scraping. Kind of odd.

I think, for now at least, they are just trying to categorize those who are scrapers and those who are not.

It's likely that in the near future they will use this to forecast usage and introduce an API credit system, or block scraping altogether.

The user-type update launched earlier this year and disrupted a wide range of unofficial APIs, pytrends included.

maxwnewcomer commented 1 year ago

@nicktba Funny... I wonder how many paid/open-source SEO and SERP tools they broke and will break in the upcoming years.

maxwnewcomer commented 1 year ago

Also, an update on the fix... I could push a fix without retries working right now, but I would like to get that figured out first (along with tests). Worst case, if I don't hear from the curl_cffi community in a bit, I will just add an "impersonate" flag to the TrendReq constructor that flips functionality from the normal requests module to cURL Impersonate.

nicktba commented 1 year ago

Also, an update on the fix... I could push a fix without retries working right now, but I would like to get that figured out first (along with tests). Worst case, if I don't hear from the curl_cffi community in a bit, I will just add an "impersonate" flag to the TrendReq constructor that flips functionality from the normal requests module to cURL Impersonate.

Awesome! I'll patiently wait for that update.

Thanks for your effort.

maxwnewcomer commented 1 year ago

Just opened that PR; it should work for basic usage. No testing, retries, or confirmed proxy usage with that code yet. Still a WIP.

jesvinc commented 1 year ago

I believe I found a much simpler solution than @maxwnewcomer's. The request made in GetGoogleCookie is a GET, but it returns an empty 200 response. If you instead make a POST, the API correctly responds with a cookie. The fix is simply to change this line to requests.post.
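
As a standalone illustration of that idea outside of pytrends (this paraphrases the fix rather than reproducing the library's code; the URL and geo value are illustrative):

import requests

# A GET to the Trends front page comes back as an empty 200 with no cookie,
# but a POST returns the NID cookie that later API calls need.
resp = requests.post('https://trends.google.com/?geo=US', timeout=(2, 5))
nid = {name: value for name, value in resp.cookies.items() if name == 'NID'}
print(nid)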

maxwnewcomer commented 1 year ago

Hahahah, sick @jesvinc!! Funny how it can be that simple. I do, however, think the cURL impersonate functionality might be nice to have in the future. I can add that change to my PR, or you can create your own, up to you!

jesvinc commented 1 year ago

Haha yeah, I was shocked to find that out. Since it's so simple, I'm fine with you adding that change to your PR. I'll just add a code comment to it.

AbhishekThalanki commented 1 year ago

Tried @MartinNowak's patch but the issue still persists. I've got 50 keywords; it returns data for 1 of them and then throws this error:

requests.exceptions.RetryError: HTTPSConnectionPool(host='trends.google.com', port=443): Max retries exceeded with url: /trends/api/explore?hl=en-GB&tz=360&req=%7B%22comparisonItem%22%3A+%5B%7B%22keyword%22%3A+%22accessorize%22%2C+%22time%22%3A+%222018-07-01+2023-03-12%22%2C+%22geo%22%3A+%22US%22%7D%5D%2C+%22category%22%3A+0%2C+%22property%22%3A+%22%22%7D (Caused by ResponseError('too many 429 error responses'))

Has anyone else been able to resolve this successfully?

arthii17 commented 1 year ago

I have also been facing the issue for the last 4 days.

maxwnewcomer commented 1 year ago

@AbhishekThalanki and @arthii17, feel free to pull my fork from pull request #563. The impersonate feature seems to work for me.

danielfree commented 1 year ago

Facing the same issue; adding all local cookies to TrendReq(requests_args=...) seems to fix it temporarily.

gilbertovilarunc commented 1 year ago

Same issue here.

gdavoian commented 1 year ago

I believe I found a much simpler solution than @maxwnewcomer's. The request made in GetGoogleCookie is a GET, but it returns an empty 200 response. If you instead make a POST, the API correctly responds with a cookie. The fix is simply to change this line to requests.post.

I would like to confirm that the proposed solution has worked for me perfectly. Thank you @maxwnewcomer!

Please see my comment if you need a temporary workaround to make your code work until the fix has been added to the library.

karam-khanna commented 1 year ago

I have tried the above solutions but still seem to get 429s consistently.

totencrab commented 1 year ago

The solution posted by @ckosmic here works consistently for me.

In the pytrends request.py file, at lines 76 and 89, insert explore before /?geo, so that

f'{BASE_TRENDS_URL}?geo={self.hl[-2:]}',

becomes

f'{BASE_TRENDS_URL}explore/?geo={self.hl[-2:]}',

vikas-sp-97 commented 1 year ago

@totencrab Thanks for this bit of information, it was really helpful! 👍

aalyousfi commented 1 year ago

Any estimate on when the package will be updated with a fix? Thanks.

danielfree commented 1 year ago

For people who want a fix now: you can use a Selenium webdriver to visit the page once and extract the cookie, then pass it into TrendReq().

def get_cookie():
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    driver.get("https://trends.google.com/")
    time.sleep(5)
    cookie = driver.get_cookie("NID")["value"]
    driver.quit()
    return cookie

nid_cookie = f"NID={get_cookie()}"
pytrends = TrendReq(
    ...
    requests_args={"headers": {"Cookie": nid_cookie}}
)

srgh1367 commented 1 year ago

@totencrab Thanks.

Michae94 commented 1 year ago

I believe I found a much simpler solution than @maxwnewcomer's. The request made in GetGoogleCookie is a GET, but it returns an empty 200 response. If you instead make a POST, the API correctly responds with a cookie. The fix is simply to change this line to requests.post.

I just wanted to thank you for this, as it completely solved the issue I had.

tomfbush commented 1 year ago

@danielfree this worked straight away for me, thanks! 👍

Precede it with

from selenium import webdriver
import time

to save people about 2 seconds 😄

emlazzarin commented 1 year ago

I believe we resolved this with #570, which is now included in the v4.9.1 release. Thank you all for helping uncover the underlying issues!
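
If you install from PyPI and have not patched request.py locally, running pip install --upgrade pytrends should be enough to pick up the fix.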

qcgm1978 commented 1 year ago

I believe I found a much simpler solution than @maxwnewcomer's. The request made in GetGoogleCookie is a GET, but it returns an empty 200 response. If you instead make a POST, the API correctly responds with a cookie. The fix is simply to change this line to requests.post.

I would like to confirm that the proposed solution has worked for me perfectly. Thank you @maxwnewcomer!

Please see my comment if you need a temporary workaround to make your code work until the fix has been added to the library.

I encountered this problem again. The version of pytrends is 4.9.2, but changing 'get' to 'post' in the GetGoogleCookie function works.

alaaddinsarac commented 1 year ago

I believe I found a much simpler solution than @maxwnewcomer's. The request made in GetGoogleCookie is a GET, but it returns an empty 200 response. If you instead make a POST, the API correctly responds with a cookie. The fix is simply to change this line to requests.post.

Thanks for the solution. I was about to lose my mind over this.

camirojasguajardo commented 10 months ago

I believe I found a much simpler solution than @maxwnewcomer's. The request made in GetGoogleCookie is a GET, but it returns an empty 200 response. If you instead make a POST, the API correctly responds with a cookie. The fix is simply to change this line to requests.post.

Hi! Does anyone know if this solution still works? I'm using pytrends 4.9.2.

AhmadKhanSenpai commented 8 months ago

I believe I found a much simpler solution than @maxwnewcomer's. The request made in GetGoogleCookie is a GET, but it returns an empty 200 response. If you instead make a POST, the API correctly responds with a cookie. The fix is simply to change this line to requests.post.

Hi! Does anyone know if this solution still works? I'm using pytrends 4.9.2.

No, it does not work. I was using it in my project and it was working fine, but the next day it gives me the Too Many Requests error, code 429.

Dojohn2004 commented 8 months ago

I believe I found a much simpler solution than @maxwnewcomer's. The request made in GetGoogleCookie is a GET, but it returns an empty 200 response. If you instead make a POST, the API correctly responds with a cookie. The fix is simply to change this line to requests.post.

Hi! Does anyone know if this solution still works? I'm using pytrends 4.9.2.

No, it does not work. I was using it in my project and it was working fine, but the next day it gives me the Too Many Requests error, code 429.

Same here, it is not working for me.

Dojohn2004 commented 7 months ago

I believe I found a much simpler solution than @maxwnewcomer's. The request made in GetGoogleCookie is a GET, but it returns an empty 200 response. If you instead make a POST, the API correctly responds with a cookie. The fix is simply to change this line to requests.post.

Hi! Does anyone know if this solution still works? I'm using pytrends 4.9.2.

No, it does not work. I was using it in my project and it was working fine, but the next day it gives me the Too Many Requests error, code 429.

Same here, it is not working for me.

I found it works sometimes; it worked a few weeks ago, but now it is not working again.

sumitsihag123 commented 6 months ago

Hi, Google Trends has started blocking again. This month we have not been able to scrape data properly; it works fine for 3-4 hours, but then shows a 429 error or a malformed-response error again.