GeneralMills / pytrends

Pseudo API for Google Trends

interest_over_time doesn't work #111

Closed FourthWiz closed 7 years ago

FourthWiz commented 7 years ago

Hi, I have the following issue:

Using your example, I execute the following code:

pytrend.build_payload(kw_list=['pizza', 'bagel'])
pytrend.interest_over_time()

The last call raises "ValueError: year is out of range".

In addition, pytrend.interest_by_region() gives me "ValueError: No JSON object could be decoded".

At the same time pytrend.related_queries() works well.

What could be wrong here?

kritideep commented 7 years ago

Hey, I have the same problem here. pytrend.interest_by_region() gives me: ValueError: No JSON object could be decoded. I think this is because the region is the key object here, and the data comes back for the US region by default.

dreyco676 commented 7 years ago

@kritideep I think I found the issue. When you pass a geo value, Google expects the interest_by_region() request to be formatted differently; in particular, I think the resolution needs to be 'SUBREGION'. That said, I probably won't get to this for a couple of days.

@FourthWiz are you on the latest version & using a valid gmail account? I'm not able to replicate on my end, so I'll need more information.

dreyco676 commented 7 years ago

I think I've fixed this now. pytrend.build_payload(kw_list=['pizza', 'bagel'], geo='IN') will no longer throw an error within pytrend.interest_by_region()

Try updating and see if it solves your issue.

kritideep commented 7 years ago

Hey, you corrected the code, but I still get the same problem when I call interest_by_region_df = pytrend.interest_by_region(). The error is: ValueError: No JSON object could be decoded

dreyco676 commented 7 years ago

@kritideep please provide your code as I'm not able to replicate. Remember to remove your email & password before posting.

kritideep commented 7 years ago

Ok thanks a lot

kritideep commented 7 years ago

Hey, here is the code:

from __future__ import absolute_import, print_function, unicode_literals
import sys
import requests
import json
import pandas as pd
from bs4 import BeautifulSoup
if sys.version_info[0] == 2:  # Python 2
    from urllib import quote
else:  # Python 3
    from urllib.parse import quote

class TrendReq(object):
    """ Google Trends API """
def __init__(self, google_username, google_password, hl='en-US', tz=360, geo='IN', custom_useragent=None):
    """
    Initialize hard-coded URLs, HTTP headers, and login parameters
    needed to connect to Google Trends, then connect.
    """
    self.username = google_username
    self.password = google_password
    # google rate limit

    self.google_rl = 'You have reached your quota limit. Please try again later.'
    self.url_login = "https://accounts.google.com/ServiceLogin"
    self.url_auth = "https://accounts.google.com/ServiceLoginAuth"
    # custom user agent so users know what "new account signin for Google" is
    if custom_useragent is None:
        self.custom_useragent = {'User-Agent': 'PyTrends'}
    else:
        self.custom_useragent = {'User-Agent': custom_useragent}
    self._connect()
    self.results = None

    # set user defined options used globally
    self.tz = tz
    self.hl = hl
    self.geo = 'IN'
    self.kw_list = list()

    # initialize widget payloads
    self.interest_overtime_widget = dict()
    self.interest_by_region_widget = dict()
    self.related_queries_widget_list = list()

def _connect(self):
    """
    Connect to Google.
    Go to login page GALX hidden input value and send it back to google + login and password.
    http://stackoverflow.com/questions/6754709/logging-in-to-google-using-python
    """
    self.ses = requests.session()
    login_html = self.ses.get(self.url_login, headers=self.custom_useragent)
    soup_login = BeautifulSoup(login_html.content, "lxml").find('form').find_all('input')
    form_data = dict()
    for u in soup_login:
        if u.has_attr('value') and u.has_attr('name'):
            form_data[u['name']] = u['value']
    # override the inputs with our login and pwd:
    form_data['Email'] = self.username
    form_data['Passwd'] = self.password
    self.ses.post(self.url_auth, data=form_data)

def build_payload(self, kw_list, cat=0, timeframe='today 5-y', geo='IN', gprop=''):
    """Create the payload for related queries, interest over time and interest by region"""
    token_payload = dict()
    self.kw_list = kw_list
    self.geo = geo
    token_payload['hl'] = self.hl
    token_payload['tz'] = self.tz
    token_payload['req'] = {'comparisonItem': [], 'category': cat}
    token_payload['property'] = gprop
    # build out json for each keyword
    for kw in self.kw_list:
        keyword_payload = {'keyword': kw, 'time': timeframe, 'geo': self.geo}
        token_payload['req']['comparisonItem'].append(keyword_payload)
    # requests will mangle this if it is not a string
    token_payload['req'] = json.dumps(token_payload['req'])
    # get tokens
    self._tokens(token_payload)
    return

def _tokens(self, token_payload):
    """Makes request to Google to get API tokens for interest over time, interest by region and related queries"""

    # make the request
    req_url = "https://www.google.com/trends/api/explore"
    req = self.ses.get(req_url, params=token_payload)

    # parse the returned json
    # strip off garbage characters that break json parser
    widget_json = req.text[4:]
    widget_dict = json.loads(widget_json)['widgets']
    # order of the json matters...
    first_region_token = True
    # assign requests
    for widget in widget_dict:
        if widget['title'] == 'Interest over time':
            self.interest_over_time_widget = widget
        if widget['title'] == 'Interest by region' and first_region_token:
            self.interest_by_region_widget = widget
            first_region_token = False
        if widget['title'] == 'Interest by subregion' and first_region_token:
            self.interest_by_region_widget = widget
            first_region_token = False
        # response for each term, put into a list
        if widget['title'] == 'Related queries':
            self.related_queries_widget_list.append(widget)
    return

def interest_over_time(self):
    """Request data from Google's Interest Over Time section and return a dataframe"""

    # make the request
    req_url = "https://www.google.co.in/trends/api/widgetdata/multiline"
    over_time_payload = dict()
    # convert to string as requests will mangle
    over_time_payload['req'] = json.dumps(self.interest_over_time_widget['request'])
    over_time_payload['token'] = self.interest_over_time_widget['token']
    over_time_payload['tz'] = self.tz
    req = self.ses.get(req_url, params=over_time_payload)

    # parse the returned json
    # strip off garbage characters that break json parser
    req_json = json.loads(req.text[5:])
    df = pd.DataFrame(req_json['default']['timelineData'])
    df['date'] = pd.to_datetime(df['time'], unit='s')
    df = df.set_index(['date']).sort_index()
    # split list columns into separate ones, remove brackets and split on comma
    result_df = df['value'].apply(lambda x: pd.Series(str(x).replace('[', '').replace(']', '').split(',')))
    # rename each column with its search term, relying on order that google provides...
    for idx, kw in enumerate(self.kw_list):
        result_df[kw] = result_df[idx].astype('int')
        del result_df[idx]
    return result_df

def interest_by_region(self, resolution='IN'):        

    """Request data from Google's Interest by Region section and return a dataframe"""

    # make the request
    req_url = "https://www.google.com/trends/api/explore"
    region_payload = dict()
    if self.geo == 'IN':
        self.interest_by_region_widget['request']['resolution'] = resolution

    region_payload['req'] = json.dumps(self.interest_by_region_widget['request'])
    region_payload['token'] = self.interest_by_region_widget['token']
    region_payload['tz'] = self.tz
    req = self.ses.get(req_url, params=region_payload)
    print(req.text)

    req_json = json.loads(req.text[5:])

    df = pd.DataFrame(req_json['default']['geoMapData'])

    df = df[['geoName', 'value']].set_index(['geoName']).sort_index()

    result_df = df['value'].apply(lambda x: pd.Series(str(x).replace('[', '').replace(']', '').split(',')))

    for idx, kw in enumerate(self.kw_list):
        result_df[kw] = result_df[idx].astype('int')
        del result_df[idx]
    return result_df

def related_queries(self):
    """Request data from Google's Related Queries section and return a dictionary of dataframes"""

    # make the request
    req_url = "https://www.google.co.in/trends/api/widgetdata/relatedsearches"
    related_payload = dict()
    result_dict = dict()
    for request_json in self.related_queries_widget_list:
        # ensure we know which keyword we are looking at rather than relying on order
        kw = request_json['request']['restriction']['complexKeywordsRestriction']['keyword'][0]['value']
        # convert to string as requests will mangle
        related_payload['req'] = json.dumps(request_json['request'])
        related_payload['token'] = request_json['token']
        related_payload['tz'] = self.tz
        req = self.ses.get(req_url, params=related_payload)

        # parse the returned json
        # strip off garbage characters that break json parser
        req_json = json.loads(req.text[5:])
        # top queries
        top_df = pd.DataFrame(req_json['default']['rankedList'][0]['rankedKeyword'])
        top_df = top_df[['query', 'value']]
        # rising queries
        rising_df = pd.DataFrame(req_json['default']['rankedList'][1]['rankedKeyword'])
        rising_df = rising_df[['query', 'value']]
        result_dict[kw] = {'top': top_df, 'rising': rising_df}
    return result_dict

def trending_searches(self):
    """Request data from Google's Trending Searches section and return a dataframe"""

    # make the request
    req_url = "https://www.google.co.in/trends/"
    forms = {'ajax': 1, 'pn': 'p1', 'htd': '', 'htv': 'l'}
    req = self.ses.post(req_url, data=forms)
    req_json = json.loads(req.text)['trendsByDateList']
    result_df = pd.DataFrame()

    # parse the returned json
    for trenddate in req_json:
        sub_df = pd.DataFrame()
        sub_df['date'] = trenddate['date']
        for trend in trenddate['trendsList']:
            sub_df = sub_df.append(trend, ignore_index=True)
    result_df = pd.concat([result_df, sub_df])
    return result_df

def suggestions(self, keyword):
    """Request data from Google's Keyword Suggestion dropdown and return a dictionary"""

    # make the request
    kw_param = quote(keyword)
    req = self.ses.get("https://www.google.com/trends/api/autocomplete" + kw_param)

    # parse the returned json
    # response is invalid json but if you strip off ")]}'," from the front it is then valid
    req_json = json.loads(req.text[5:])['default']['topics']
    return req_json
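
The "No JSON object could be decoded" error discussed in this thread is what json.loads raises when the text it receives is not JSON, for example when the server sends an HTML page instead of the expected payload. The code above slices a fixed number of characters off the front of each response. As a minimal sketch (Python 3, standard library only), the helper below strips the prefix only when it is actually present and reports the start of the response when decoding fails. The name parse_trends_json and the sample strings are hypothetical, made up for illustration, and are not part of pytrends.

```python
import json

# Prefix mentioned in the code comments above: )]}',
XSSI_PREFIX = ")]}'"


def parse_trends_json(text):
    """Strip the leading prefix if present, then decode the JSON body.

    Hypothetical helper for illustration; not part of pytrends.
    """
    body = text.lstrip()
    if body.startswith(XSSI_PREFIX):
        # Drop the prefix plus any comma or whitespace that follows it.
        body = body[len(XSSI_PREFIX):].lstrip(",\n ")
    try:
        return json.loads(body)
    except ValueError as exc:
        # Show the start of the response so a non-JSON page is easy to spot.
        raise ValueError(
            "response is not JSON, it starts with %r" % text[:80]
        ) from exc
```

Printing the first characters of the raw response in this way can help when reporting a problem, since it shows whether Google returned data or an error page.
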
kritideep commented 7 years ago

example.py, here is the code:

from pytrends.request import TrendReq

google_username = ""
google_password = ""
path = ""

# Login to Google. Only need to run this once, the rest of requests will use the same session.
pytrend = TrendReq(google_username, google_password, custom_useragent='My Pytrends Script')

# Create payload and capture API tokens. Only needed for interest_over_time(), interest_by_region() & related_queries()
pytrend.build_payload(kw_list=['pizza', 'bagel'])

# over time
interest_over_time_df = pytrend.interest_over_time()
print interest_over_time_df

interest_by_region_df = pytrend.interest_by_region()
print interest_by_region_df

related_queries_dict = pytrend.related_queries()
print related_queries_dict

# google data
trending_searches_df = pytrend.trending_searches()
print trending_searches_df

# Get Google Top Charts
top_charts_df = pytrend.top_charts(cid='actors', date=201611)
print top_charts_df

# Get Google Keyword Suggestions
suggestions_dict = pytrend.suggestions(keyword='pizza')

kritideep commented 7 years ago

But the JSON decoder error occurs when I use the India region. Please help me understand the reason behind this. Thanks a lot for the support.

dreyco676 commented 7 years ago

You can't put the country ID in the resolution parameter. That parameter determines what 'level' of information you want; only the words 'COUNTRY' & 'CITY' work there.

If you use the latest version of pytrends, note that I set the geo parameter to India by using 'IN':

from pytrends.request import TrendReq

google_username = ""
google_password = ""
pytrend = TrendReq(google_username, google_password, custom_useragent='My Pytrends Script')
pytrend.build_payload(kw_list=['pizza', 'bagel'], geo='IN')
interest_by_region_df = pytrend.interest_by_region()
print interest_by_region_df

The code above will get you Province level data for India. If you want City level data you need to do the following.

interest_by_region_df = pytrend.interest_by_region(resolution='CITY')
print interest_by_region_df
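
To make the geo versus resolution distinction concrete, here is a minimal sketch of a guard that rejects a country code passed as resolution. The helper name check_resolution and the allowed set are assumptions for illustration, taken only from the comment above ('COUNTRY' and 'CITY'); they are not part of pytrends.

```python
# Assumed from the comment above, not read from pytrends itself.
ALLOWED_RESOLUTIONS = {"COUNTRY", "CITY"}


def check_resolution(resolution):
    """Return the upper-cased resolution, or raise if it is not a known level.

    Hypothetical helper for illustration; not part of pytrends.
    """
    value = resolution.upper()
    if value not in ALLOWED_RESOLUTIONS:
        raise ValueError(
            "resolution must be one of %s, got %r; "
            "pass country codes such as 'IN' via geo instead"
            % (sorted(ALLOWED_RESOLUTIONS), resolution)
        )
    return value
```

With a check like this, the call interest_by_region(resolution='IN') from the code posted earlier would fail with a readable message before any request is sent.
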
dreyco676 commented 7 years ago

I'll assume that this resolved the issue since I've not heard anything in a while. Reopen if issues persist.

dominikabasaj commented 7 years ago

Hi, the ValueError "year is out of range" problem is still present. I keep getting this error when I run pytrend.interest_over_time(). To be precise, all the other methods are working (interest_by_region etc.).

kritideep commented 7 years ago

Hey, I now get this error: raise SSLError(e, request=request) requests.exceptions.SSLError: hostname 'trends.google.com' doesn't match 'www.google.com'

prachidev09 commented 7 years ago

import pytrends
from pytrends.request import TrendReq

google_username = "*****@gmail.com"
google_password = "****"
path = "C:\Python27\Lib\site-packages\pytrends"

pytrend = TrendReq(google_username, google_password, custom_useragent='My Pytrends Script')
pytrends = build_payload(kw_list=[Dengue], cat=0, timeframe='today 5-y', geo='IN-MH', gprop='')

I am using the code above, but when I run it I get the error "NameError: name 'build_payload' is not defined".

Can anyone suggest the reason for the error and help me out?

Thanks a lot in advance!

jeffreywu1996 commented 7 years ago

NameError says that the function is not defined. Most likely you imported pytrends incorrectly.

I don't use Windows so I don't really know how to do it correctly for you. I use pip and it works fine on my mac. (http://stackoverflow.com/questions/4750806/how-do-i-install-pip-on-windows)
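
Another possibility, judging only from the code posted above: build_payload is called there as a bare function, whereas the earlier examples in this thread call it as a method on the TrendReq instance (pytrend.build_payload(...)). The keyword Dengue is also unquoted, so Python would treat it as a variable name rather than a string. The sketch below shows the difference with a hypothetical stand-in class, StubTrendReq, which makes no network requests and is not part of pytrends.

```python
class StubTrendReq(object):
    """Hypothetical stand-in for pytrends' TrendReq; makes no network requests."""

    def __init__(self):
        self.kw_list = []
        self.geo = ""

    def build_payload(self, kw_list, cat=0, timeframe="today 5-y", geo="", gprop=""):
        # Only records the arguments; the real method also requests tokens from Google.
        self.kw_list = list(kw_list)
        self.geo = geo


pytrend = StubTrendReq()

# Calling the bare name fails: build_payload exists only as a method on the object.
try:
    build_payload(kw_list=["Dengue"])
except NameError as exc:
    bare_call_error = str(exc)

# Calling it on the instance, with the keyword quoted as a string, works.
pytrend.build_payload(kw_list=["Dengue"], cat=0, timeframe="today 5-y", geo="IN-MH", gprop="")
```

If that is the cause, the corresponding change in the posted script would be to call pytrend.build_payload(kw_list=['Dengue'], ...) instead of assigning the result of a bare build_payload(...) call.
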