GeneralMills / pytrends

Pseudo API for Google Trends

Historical data including data from the last week does not align with earlier data #576

Open shkarlsson opened 1 year ago

shkarlsson commented 1 year ago

Historical data that includes data from the last week does not seem to align with earlier data: for at least one data point in the overlapping period, one series goes up while the other goes down. Below is minimal code to confirm that df1 and df2 (not containing data from the last week) do align, but df2 and df3 do not.

Is there a way to get data for the last week that is consistent with historical data?

import numpy as np
from pytrends.request import TrendReq

term = 'popcorn'
pytrends = TrendReq(hl='en-US', tz=0, retries=100)

range_str1 = '2023-03-25T08 2023-04-02T07'
pytrends.build_payload([term], cat=0, timeframe=range_str1)
df1 = pytrends.interest_over_time()

range_str2 = '2023-03-29T08 2023-04-06T07'
pytrends.build_payload([term], cat=0, timeframe=range_str2)
df2 = pytrends.interest_over_time()

range_str3 = '2023-04-03T08 2023-04-10T07'
pytrends.build_payload([term], cat=0, timeframe=range_str3)
df3 = pytrends.interest_over_time()

def dfs_align(df1, df2):
    # Check that the two dataframes always move in the same direction at
    # each shared timestamp. Overlapping data should never move in opposite
    # directions; if it does, Google is returning inconsistent data for at
    # least one of the datasets.

    term = df1.columns[0]
    intersect = df1.index.intersection(df2.index)
    signs1 = df1.loc[intersect, term].diff().iloc[1:].apply(np.sign)
    signs2 = df2.loc[intersect, term].diff().iloc[1:].apply(np.sign)

    return ((signs1 - signs2).abs() < 2).all()

print(dfs_align(df1.loc[:, [term]], df2.loc[:, [term]]))

print(dfs_align(df2.loc[:, [term]], df3.loc[:, [term]]))
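
For anyone who wants to sanity-check the comparison itself without hitting the API, here is a minimal offline sketch of the same direction check on synthetic data. The helper name `directions_agree` and all the numbers are made up for illustration; they are not real Trends output. It assumes that a series which is merely rescaled (as would happen if each request were scaled independently) should still pass, while a series with one reversed move should fail.

```python
import numpy as np
import pandas as pd


def directions_agree(a: pd.Series, b: pd.Series) -> bool:
    """Return True if a and b never move in opposite directions on shared timestamps."""
    shared = a.index.intersection(b.index)
    signs_a = np.sign(a.loc[shared].diff().iloc[1:])
    signs_b = np.sign(b.loc[shared].diff().iloc[1:])
    # A difference of 2 means one series rose (+1) while the other fell (-1).
    return bool(((signs_a - signs_b).abs() < 2).all())


# Hypothetical hourly values, not real Google Trends data.
idx = pd.date_range("2023-04-03 08:00", periods=5, freq="h")
base = pd.Series([40, 55, 50, 70, 65], index=idx, dtype=float)

# Same shape on a different scale: rescaling alone should not trip the check.
rescaled = base * 0.5

# The first hour-to-hour move is reversed: the kind of mismatch described above.
flipped = pd.Series([40, 35, 50, 70, 65], index=idx, dtype=float)

print(directions_agree(base, rescaled))  # True
print(directions_agree(base, flipped))   # False
```

So the check ignores differences in scale and only flags opposite-direction moves, which is what df2 vs. df3 shows for me.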

shkarlsson commented 1 year ago

I have been a bit spammy with the API lately, so if someone could reproduce this from a fresh IP/machine, that would be appreciated.