GeneralMills / pytrends

Pseudo API for Google Trends

"<1" value on google trends is being replaced with 0. #514

Open omtarful opened 2 years ago

omtarful commented 2 years ago

When I run this query on the Google Trends website, some of the returned values are "<1", but pytrends replaces "<1" with 0. How can I stop it from replacing "<1" with 0?

# These imports provide the Google Trends client and regex matching
from pytrends.request import TrendReq
import re

# Set the host language for the Google Trends session
pytrends = TrendReq(hl='en-US')

# Takes a keyword and returns its topic id, or the keyword itself if no topic matches
def getTopicID(word):
    # get suggested searches for word
    suggs = pytrends.suggestions(word)
    # check each suggestion and see whether it is a matching topic
    for sugg in suggs:
        title = sugg.get("title").lower()
        # also accept simple plural forms of the title
        pattern = re.escape(title) + "(s|es|os)"
        if sugg.get("type") == "Topic" and (title == word.lower() or re.match(pattern, word.lower())):
            # the suggestion is a matching topic, so return its topic id
            return sugg.get("mid")
    # no topic id found: fall back to the raw keyword
    return word

timeframes = '2010-01-01 2021-12-31'
pytrends.build_payload(
    kw_list=[getTopicID("Drug"), getTopicID("green production")],
    cat=0,
    timeframe=timeframes,
    geo="IT",
    gprop=""
)
pytrends.interest_over_time()

DUOLabs333 commented 1 year ago

I ran into the same problem. In request.py, after these lines:

        result_df = df['value'].apply(lambda x: pd.Series(
            str(x).replace('[', '').replace(']', '').split(',')))

add this:

        formatted_df = df['formattedValue'].apply(lambda x: pd.Series(x))
        for col in formatted_df.columns:
            result_df.loc[formatted_df[col]=="<1",col]='1'
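
To see what that patch does outside of pytrends, here is a minimal sketch on a toy DataFrame. The data is made up; only the 'value' and 'formattedValue' column names come from the snippet above, and the final whitespace strip and integer conversion are an assumption about what happens afterwards, not a copy of request.py.

```python
import pandas as pd

# Made-up stand-in for the parsed response rows (two keywords per row).
df = pd.DataFrame({
    "value": [[0, 12], [0, 0], [3, 0]],
    "formattedValue": [["<1", "12"], ["0", "<1"], ["3", "0"]],
})

# Same parsing step as quoted above: one string column per keyword.
result_df = df["value"].apply(
    lambda x: pd.Series(str(x).replace("[", "").replace("]", "").split(","))
)

# The patch: wherever Google reported "<1", store '1' instead of the raw 0.
formatted_df = df["formattedValue"].apply(lambda x: pd.Series(x))
for col in formatted_df.columns:
    result_df.loc[formatted_df[col] == "<1", col] = "1"

# Assumed follow-up step: strip stray spaces and convert to integers.
result_df = result_df.apply(lambda s: s.str.strip().astype(int))
print(result_df)
```

Note that this reports "<1" as 1, so it can no longer be told apart from a genuine 1.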

Terseus commented 1 year ago

This is difficult to implement.

"<1" is not a valid numeric value. If we store it in the pd.DataFrame, processing the result becomes significantly harder, so I very much prefer to stick to a numeric type. On the other hand, we're using an int type to represent the value, which I think is correct given that every value except "<1" can be represented as an integer.

That said, we have two options here:

I think turning them into 0 is the most reasonable approach, but I'll leave this issue open in case someone thinks of a better solution.
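
The trade-off described above can be illustrated with a small sketch. The values are made up, and the separate boolean mask is only a hypothetical way of keeping the "<1" information next to an integer column. It is not something pytrends does.

```python
import pandas as pd

# Made-up formatted values as they might appear for one keyword.
raw = pd.Series(["<1", "12", "0", "3"])

# Mapping "<1" to 0 keeps a plain integer column that supports arithmetic.
as_int = raw.replace("<1", "0").astype(int)

# Hypothetical alternative: keep the integers and remember which cells were "<1".
below_one = raw.eq("<1")

print(as_int.sum())        # arithmetic works on the integer column
print(below_one.tolist())  # marks the cells Google reported as "<1"
```

Keeping "<1" itself in the column would leave it as text, so sums, means and plots would need an extra conversion step first.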