GeneralMills / pytrends

Pseudo API for Google Trends

timeframe doesn't work - Google returned a response with code 429 #596

Open dbitton opened 1 year ago

dbitton commented 1 year ago

The timeframe parameter (every hourly and daily value I tested) keeps breaking the code: Google returns 429 even on the first call (so this is not a case of calling too often), in code that had been running for a long time without any issues. I am on the latest version of pytrends.

This works:

kw_list = ['']
pytrend.build_payload(kw_list)
related_queries = pytrend.related_queries()

This doesn't:

pytrend.build_payload(kw_list, timeframe="now 1-H")
related_queries = pytrend.related_queries()
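
For anyone who wants to keep a script alive while the 429s last, here is a minimal sketch of retrying with exponential backoff. It is not something pytrends ships: `call_with_backoff` and `fetch_related_queries` are hypothetical names, and the delays are guesses. The only pytrends pieces it relies on are `TrendReq`, `build_payload`, `related_queries`, and the `TooManyRequestsError` class that appears in the tracebacks in this thread. It will not help if Google rejects every request for that timeframe.

```python
import random
import time


def call_with_backoff(fetch, retry_on=(Exception,), max_attempts=5,
                      base_delay=60.0, sleep=time.sleep):
    """Call fetch(); on a retryable error wait base_delay * 2**attempt
    (plus up to 10% jitter) and try again."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))


def fetch_related_queries(kw_list, timeframe="now 1-H"):
    """Hypothetical wrapper around the calls from the report above."""
    # Imported lazily so call_with_backoff stays usable without pytrends.
    from pytrends.exceptions import TooManyRequestsError
    from pytrends.request import TrendReq

    pytrend = TrendReq(hl="en-US", tz=360)

    def fetch():
        # build_payload also talks to Google, so it is retried as well.
        pytrend.build_payload(kw_list, timeframe=timeframe)
        return pytrend.related_queries()

    return call_with_backoff(fetch, retry_on=(TooManyRequestsError,))
```

The `sleep` argument exists only so the waiting can be replaced in tests; in normal use leave it at the default.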

Helldez commented 1 year ago

Same here, for about a month now.

haghft commented 1 year ago

At first it returned a 500 error, then it changed to 429.

2uanDM commented 1 year ago

I switched to scraping with Selenium instead over a month ago :)))

mouize commented 1 year ago

> I switched to scraping with Selenium instead over a month ago :)))

I'm trying the same with Puppeteer, but it isn't stable: from time to time I still get the 429 in headless mode. Is it fully working for you? Any tips?

2uanDM commented 1 year ago

I will try switching to headless mode to see whether it works.

First I go to https://trends.google.com/trends/, and then to https://trends.google.com.vn/trends/explore to get the cookies. Then I perform a fake search ("shirt", for example). After that I can go to another permalink with the geo and timeframe I want. I scrape by clicking the download CSV button and parsing that CSV file to get the data I want.

I found that Google can track whether an IP address is scraping, so you should wait a random interval between automation steps, and I highly recommend using proxies. An IP can scrape again without getting 429 if it "rests" for more than one or two hours after downloading a batch of keywords.
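
The two ideas above (random pauses between steps, and letting an IP rest before reusing it) can be sketched as follows. Everything here is an assumption for illustration: the names `jittered_sleep` and `CooldownPool`, the 2 to 8 second pause range, and the two-hour cooldown, which is just the upper end of the rest period mentioned above. The proxy strings are placeholders, and wiring a proxy into Selenium or pytrends is left out.

```python
import random
import time


def jittered_sleep(low=2.0, high=8.0, sleep=time.sleep):
    """Pause for a random interval so steps are not evenly spaced."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


class CooldownPool:
    """Hand out proxies, skipping the ones that are currently resting."""

    def __init__(self, proxies, cooldown=2 * 60 * 60, clock=time.monotonic):
        # Every proxy starts out ready to use.
        self._ready_at = {p: float("-inf") for p in proxies}
        self._cooldown = cooldown
        self._clock = clock

    def pick(self):
        """Return a random ready proxy, or None if all are resting."""
        now = self._clock()
        ready = [p for p, t in self._ready_at.items() if t <= now]
        return random.choice(ready) if ready else None

    def rest(self, proxy):
        """Call after a 429 or a large batch: bench the proxy for a while."""
        self._ready_at[proxy] = self._clock() + self._cooldown


# Placeholder addresses, not real proxies.
pool = CooldownPool(["http://proxy-a.example:8080", "http://proxy-b.example:8080"])
proxy = pool.pick()
print(proxy)
```

A scraping loop would call `jittered_sleep()` between page actions and `pool.rest(proxy)` whenever a proxy hits a 429, then `pool.pick()` again for the next batch.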

Helldez commented 1 year ago

Does it work if you use pytrends with a list of proxies?

2uanDM commented 1 year ago

> Does it work if you use pytrends with a list of proxies?

I haven't tried yet, but I don't think it will solve the problem: you can still scrape data with pytrends if your timeframe is longer than a year, so I think the problem is that your requests are recognized as coming from a bot, not a real client.

2uanDM commented 1 year ago

I see that pytrends is working well again.

francksa commented 1 year ago

Are you sure?

2uanDM commented 1 year ago

You can try it. My automated program has been crawling with the "now 1-H" timeframe for more than 3 hours without 429 errors. If that holds, the problem was on Google's API backend :v

jeffsnack commented 11 months ago

So, has this issue been resolved? I tried the aforementioned method of adding a custom header to "dailydata.py", but I still get the 429 error...

It triggered the error before it had even fetched six months of data.

Here is my code and exception.

from pytrends.request import TrendReq
import json
import concurrent.futures
from pytrends import dailydata
import pandas as pd
import time

#pytrend = TrendReq(hl='en-US',tz=360)

df = dailydata.get_daily_data('Rice', 2004, 1, 2022, 9, geo = 'US')
df.to_excel('Rice_6000.xlsx')
print('Complete')

Rice:2004-01-01 2004-01-31
---------------------------------------------------------------------------
TooManyRequestsError                      Traceback (most recent call last)
<ipython-input-1-44c7758dfcd6> in <module>
     11 
     12 
---> 13 df = dailydata.get_daily_data('Rice', 2004, 1, 2022, 9, geo = 'US')
     14 df.to_excel('Rice_6000.xlsx')
     15 print('Complete')

~\Anaconda3\lib\site-packages\pytrends\dailydata.py in get_daily_data(word, start_year, start_mon, stop_year, stop_mon, geo, verbose, wait_time)
    140         if verbose:
    141             print(f'{word}:{timeframe}')
--> 142         results[current] = _fetch_data(pytrends, build_payload, timeframe)
    143         current = last_date_of_month + timedelta(days=1)
    144         sleep(wait_time)  # don't go too fast or Google will send 429s

~\Anaconda3\lib\site-packages\pytrends\dailydata.py in _fetch_data(pytrends, build_payload, timeframe)
     70         else:
     71             fetched = True
---> 72     return pytrends.interest_over_time()
     73 
     74 

~\Anaconda3\lib\site-packages\pytrends\request.py in interest_over_time(self)
    233             method=TrendReq.GET_METHOD,
    234             trim_chars=5,
--> 235             params=over_time_payload,
    236         )
    237 

~\Anaconda3\lib\site-packages\pytrends\dailydata.py in _get_data(self, url, method, trim_chars, **kwargs)
     35 class CustomTrendReq(TrendReq):
     36     def _get_data(self, url, method=TrendReq.GET_METHOD, trim_chars=0, **kwargs):
---> 37         return super()._get_data(url, method=TrendReq.GET_METHOD, trim_chars=trim_chars, headers=headers, **kwargs)
     38 
     39 def get_last_date_of_month(year: int, month: int) -> date:

~\Anaconda3\lib\site-packages\pytrends\request.py in _get_data(self, url, method, trim_chars, **kwargs)
    156         else:
    157             if response.status_code == status_codes.codes.too_many_requests:
--> 158                 raise exceptions.TooManyRequestsError.from_response(response)
    159             raise exceptions.ResponseError.from_response(response)
    160 

TooManyRequestsError: The request failed: Google returned a response with code 429
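
For a sense of scale: the traceback shows `get_daily_data` looping one timeframe at a time (the first is printed as "2004-01-01 2004-01-31") and sleeping `wait_time` between requests. Assuming one request per calendar month, the range in the call above can be counted with a small sketch. `monthly_timeframes` is a hypothetical helper written for this illustration, not a pytrends function.

```python
import calendar
from datetime import date


def monthly_timeframes(start_year, start_mon, stop_year, stop_mon):
    """Yield one 'YYYY-MM-DD YYYY-MM-DD' timeframe per month, inclusive."""
    year, mon = start_year, start_mon
    while (year, mon) <= (stop_year, stop_mon):
        last_day = calendar.monthrange(year, mon)[1]
        yield f"{date(year, mon, 1)} {date(year, mon, last_day)}"
        # Step to the next month, rolling over the year after December.
        year, mon = (year + 1, 1) if mon == 12 else (year, mon + 1)


frames = list(monthly_timeframes(2004, 1, 2022, 9))
print(frames[0])    # 2004-01-01 2004-01-31
print(len(frames))  # 225
```

That is 225 monthly windows for this one keyword. In the log above the 429 was raised on the very first window, so request volume alone does not explain this failure, but a larger `wait_time` (the parameter visible in the function signature) may still be worth trying once requests start succeeding again.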

Raidus commented 11 months ago

Just be patient. We had this issue a month ago and it disappeared after a few days. If iframe embedding is not working, then crawling is likely not working either.

Karlheinzniebuhr commented 10 months ago

Same here, the example from the documentation fails with:

TooManyRequestsError                      Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_23216/3968946112.py in <module>
      8 
      9 # Interest Over Time
---> 10 interest_over_time_df = pytrend.interest_over_time()
     11 print(interest_over_time_df.head())
     12 

c:\ProgramData\anaconda3\envs\ML\lib\site-packages\pytrends\request.py in interest_over_time(self)
    230 
    231         # make the request and parse the returned json
--> 232         req_json = self._get_data(
    233             url=TrendReq.INTEREST_OVER_TIME_URL,
    234             method=TrendReq.GET_METHOD,

c:\ProgramData\anaconda3\envs\ML\lib\site-packages\pytrends\request.py in _get_data(self, url, method, trim_chars, **kwargs)
    157         else:
    158             if response.status_code == status_codes.codes.too_many_requests:
--> 159                 raise exceptions.TooManyRequestsError.from_response(response)
    160             raise exceptions.ResponseError.from_response(response)
    161 

TooManyRequestsError: The request failed: Google returned a response with code 429