epfl-dlab / GoogleTrendsAnchorBank

Google Trends, made easy.
MIT License
101 stars, 22 forks

Create Anchorbank #7

Closed: erwanlenagard closed this issue 4 years ago

erwanlenagard commented 4 years ago

Hello,

I tried to create a new anchor bank, but it returned this error. How can I solve this?

```
AttributeError                            Traceback (most recent call last)

      6 t.set_options(pytrends_config={"geo": geo, "timeframe": timeframe})
      7 print("create anchorbank")
----> 8 t.create_anchorbank()
      9 print("set active gtab")
     10 t.set_active_gtab("google_anchorbank_geo="+geo+"_timeframe="+timeframe+".tsv")

C:\Anaconda3\lib\site-packages\gtab\core.py in create_anchorbank(self, verbose, keep_diagnostics)
    801 self.time_series = time_series
    802
--> 803 ratios = self._compute_max_ratios(google_results)
    804 if keep_diagnostics:
    805     self.ratios = ratios

C:\Anaconda3\lib\site-packages\gtab\core.py in _compute_max_ratios(self, google_results)
    438 continue
    439
--> 440 if self._check_ts(val.iloc[:, j]) and self._check_ts(val.iloc[:, k]):
    441     anchors.append(val.columns[0])  # first element of the group
    442     v1.append(val.columns[j])

C:\Anaconda3\lib\site-packages\gtab\core.py in _check_ts(self, ts)
    144
    145 def _check_ts(self, ts):
--> 146     return ts.max().max() >= self.CONFIG['GTAB']['thresh_offline']
    147
    148 def _find_nans(self, W0):

AttributeError: 'int' object has no attribute 'max'
```
GorjanP commented 4 years ago

Hi! Can you please paste the whole code snippet so I can reproduce the call that raised the exception? Can you also give me the versions of your Python packages (most notably pandas and numpy)?
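To collect the versions requested above, a short snippet like the following can be run in the same environment that raised the error. It is only a sketch: `package_versions` is a hypothetical helper name, and it simply reads the standard `__version__` attributes of pandas and numpy plus the interpreter version from `sys.version`.

```python
import sys

import numpy as np
import pandas as pd


def package_versions():
    """Collect the versions typically requested in a bug report (hypothetical helper)."""
    return {
        "python": sys.version.split()[0],  # e.g. "3.8.5", without build details
        "pandas": pd.__version__,
        "numpy": np.__version__,
    }


# Print in the same "name==version" style used in requirements files.
for name, version in package_versions().items():
    print(f"{name}=={version}")
```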

erwanlenagard commented 4 years ago

Hello,

Thanks a lot! Here are the package versions: pandas==1.1.2, numpy==1.19.2

And my code:

```python
import os

import pandas as pd
import gtab

projet = "monprojet"  # project name
file = "C:/Users/Erwan/Documents/scripts/scripts_python/Google Trends/input.csv"
path = "C:/Users/Erwan/Documents/scripts/scripts_python/Google Trends/" + projet

geo = "FR"
date_debut = "2020-03-05"  # start date
date_fin = "2020-05-05"    # end date

# Create the output folder if it does not exist yet
if not os.path.exists(path):
    os.makedirs(path)

# Input CSV with one search term per row in the "query" column
df_input = pd.read_csv(file, sep=';', index_col=None, encoding="utf-8")

t = gtab.GTAB()
timeframe = date_debut + " " + date_fin
print("set options")
t.set_options(pytrends_config={"geo": geo, "timeframe": timeframe})
print("create anchorbank")
t.create_anchorbank()
print("set active gtab")
t.set_active_gtab("google_anchorbank_geo=" + geo + "_timeframe=" + timeframe + ".tsv")

# Query each term against the anchor bank and stack the results
print("requete DF")
results = pd.DataFrame()
for i, row in df_input.iterrows():
    result_tmp = t.new_query(row["query"])
    result_tmp["query"] = row["query"]
    result_tmp.reset_index(inplace=True)
    results = pd.concat([results, result_tmp], ignore_index=True, sort=True)

names = results.columns
results.to_csv(path + "/" + projet + "_resultats.csv", header=names, sep=';',
               encoding='utf-8', index=False, decimal=",")
```

GorjanP commented 4 years ago

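Hi Erwan! The bug occurred because, for some reason, the `.max()` call on the time series in `_check_ts` returned a plain Python `int` instead of a NumPy scalar such as `numpy.int32`. A NumPy scalar supports the second, chained `.max()` call, but a plain `int` does not. I'm not sure why NumPy types aren't used in your Python setup.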

In any case, I've hopefully fixed the bug. Please test it out and close the issue if it works now. :) Thanks for catching the bug!
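The explanation of the bug can be reproduced in a few lines, along with one way to write the threshold check so that it does not care what `.max()` returns. This is a minimal sketch, not the fix that was merged into gtab: `check_ts` and `THRESH` are hypothetical names, and the threshold value is made up (gtab reads the real one from `CONFIG['GTAB']['thresh_offline']`). The object-dtype Series is included only as an assumed way for plain Python ints to show up; the thread does not confirm that this was the cause here.

```python
import numpy as np
import pandas as pd

# A NumPy scalar has a .max() method, so a chained ts.max().max() works...
peak_np = np.int32(12)
print(peak_np.max())

# ...but a plain Python int has none, which reproduces the reported error.
peak_int = 12
try:
    peak_int.max()
except AttributeError as err:
    print("AttributeError:", err)


def check_ts(ts, thresh):
    """Hypothetical stand-in for gtab's _check_ts(): True if the peak of ts reaches thresh.

    np.asarray(...).max() works on a Series, a DataFrame, or a bare scalar,
    so the check does not depend on .max() returning a NumPy scalar.
    """
    return bool(np.asarray(ts).max() >= thresh)


THRESH = 10  # hypothetical threshold, for illustration only
print(check_ts(pd.Series([3, 12, 7]), THRESH))
print(check_ts(pd.Series([3, 12, 7], dtype=object), THRESH))
```

Converting to an array first is one simple way to get a single maximum regardless of whether the input is one column, several columns, or already a scalar.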

erwanlenagard commented 4 years ago

Thanks a lot! It works! :)