epfl-dlab / GoogleTrendsAnchorBank

Google Trends, made easy.
MIT License

Quota for querying Google Trends reached very quickly #13

Closed: hravkin closed this issue 3 years ago

hravkin commented 3 years ago

I'm using GTAB for a project and wanted to create an anchorbank for a particular query for each of the US states. Unfortunately, it reaches the quota very quickly, and specifically more quickly than when GTAB built the anchorbank for the same query without a custom timeframe.

Am I doing anything wrong?

```python
t = gtab.GTAB(dir_path=my_path)

def plot_search_and_npi_data(query_code, country_codes):
    max_ratio_dic = {}
    for country_code in country_codes:
        t.set_options(pytrends_config={"geo": country_code,
                                       "timeframe": "2016-03-05 2021-05-05"})
        t.create_anchorbank()
        t.set_active_gtab(f"google_anchorbank_geo={country_code}_timeframe=2016-03-05 2021-05-05.tsv")
        query_df = t.new_query(query_code)
        max_ratio_dic[country_code] = query_df['max_ratio']
    return max_ratio_dic
```
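As a side note on quota: since `create_anchorbank()` is called inside the loop, every call of the function rebuilds every anchorbank from scratch. A hedged sketch of a quota-friendlier variant that skips the build when the `.tsv` already exists (the method names follow the snippet above; the `anchorbank_dir` default is an assumption and may differ in your setup):

```python
import os

def get_max_ratios(t, query_code, country_codes,
                   timeframe="2016-03-05 2021-05-05",
                   anchorbank_dir="output/google_anchorbanks"):
    """Return {country_code: max_ratio series}, reusing cached anchorbanks."""
    max_ratio_dic = {}
    for country_code in country_codes:
        fname = f"google_anchorbank_geo={country_code}_timeframe={timeframe}.tsv"
        t.set_options(pytrends_config={"geo": country_code,
                                       "timeframe": timeframe})
        if not os.path.exists(os.path.join(anchorbank_dir, fname)):
            # Only hit the Google Trends API when no cached bank exists.
            t.create_anchorbank()
        t.set_active_gtab(fname)
        max_ratio_dic[country_code] = t.new_query(query_code)["max_ratio"]
    return max_ratio_dic
```

This keeps the per-country queries but amortizes the expensive anchorbank construction across runs.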

Thanks for your help and for the wonderful package!

manoelhortaribeiro commented 3 years ago

Hmm, it may be something related to the anchorbank you generated. Can you paste it here?

It should be in /GoogleTrendsAnchorBank/gtab/output/google_anchorbanks/ (e.g., this is one we ship with the package).

hravkin commented 3 years ago

```
# {'GTAB': {'anchor_candidates_file': 'anchor_candidate_list.txt', 'num_anchor_candidates': 3500, 'num_anchors': 550, 'seed': 42, 'sleep': 0.5, 'thresh_offline': 10}}
# {'PYTRENDS': {'geo': 'US-AL', 'timeframe': '2016-03-05 2021-05-05'}}
	google_query	max_ratio	max_ratio_lo	max_ratio_hi
0	/m/02y1vz	1398.691708122582	1265.419050433503	1548.0252515813452
1	/m/09jcvs	559.4766832490328	512.4947154255686	611.4699743746314
2	/m/019rl6	212.6011396346325	197.31046543884395	229.30124039048678
3	/m/0dm32	80.78843306116035	75.96452919395492	85.98796514643254
4	/m/03vgrr	29.89172023262933	28.486698447733094	31.385607278447875
5	/m/02kr76	10.761019283746556	10.39764493342258	11.141890583848994
6	/m/05b72n	3.443526170798898	3.3792346033623386	3.509695533912433
7	/m/01myc4	2.272727272727273	2.247191011235955	2.2988505747126435
8	/m/04pv7w	1.0	1.0	1.0
9	/m/027w7rt	0.36	0.355	0.365
```

manoelhortaribeiro commented 3 years ago

Interesting, it seems that your main scale relies on the query "Reveillon", /m/04pv7w (a.k.a. New Year's Eve). Maybe this is leading you to always have to do more queries, since this query is very seasonal and oscillates between very high and very low values...

https://trends.google.com/trends/explore?date=today%205-y&q=%2Fm%2F04pv7w

Could you try adding this entity to the blacklist (which is simply a text file) and running it again?
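Since the blacklist is described as a plain text file, appending an entity is a one-liner. A hedged sketch (the real file's location depends on your gtab working directory, so a temporary stand-in path is used here):

```python
import os
import tempfile

# Stand-in for the blacklist text file; replace with the actual path
# inside your gtab directory.
blacklist_path = os.path.join(tempfile.mkdtemp(), "blacklist.txt")

entity_to_block = "/m/04pv7w"  # the seasonal "Reveillon" anchor
with open(blacklist_path, "a", encoding="utf-8") as f:
    f.write(entity_to_block + "\n")

with open(blacklist_path, encoding="utf-8") as f:
    blocked = f.read().split()
print(entity_to_block in blocked)  # True
```

After updating the file, rebuilding the anchorbank should pick a different anchor for the main scale.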

hravkin commented 3 years ago

Thanks for the response; I did as you suggested and it did seem to help for that particular anchorbank.

Unfortunately, I reached my query quota when I got to the fourth anchorbank (before adding "Reveillon" to the blacklist, I was unable to finish even the first).

These are the second, third, and fourth anchorbanks:

log_geo=US-AL_timeframe=2016-03-05 2021-05-05.txt

log_geo=US-AR_timeframe=2016-03-05 2021-05-05.txt

log_geo=US-AZ_timeframe=2016-03-05 2021-05-05.txt

manoelhortaribeiro commented 3 years ago

Hey hravkin, do you have any updates on that front?