Closed hravkin closed 3 years ago
Hmm, it may be something related to the anchorbank you have generated. Can you paste the anchorbank you have generated?
Should be in /GoogleTrendsAnchorBank/gtab/output/google_anchorbanks/
(e.g., this is one we ship with the package)
    # {'GTAB': {'anchor_candidates_file': 'anchor_candidate_list.txt', 'num_anchor_candidates': 3500, 'num_anchors': 550, 'seed': 42, 'sleep': 0.5, 'thresh_offline': 10}}
    # {'PYTRENDS': {'geo': 'US-AL', 'timeframe': '2016-03-05 2021-05-05'}}
       google_query  max_ratio           max_ratio_lo        max_ratio_hi
    0  /m/02y1vz     1398.691708122582   1265.419050433503   1548.0252515813452
    1  /m/09jcvs     559.4766832490328   512.4947154255686   611.4699743746314
    2  /m/019rl6     212.6011396346325   197.31046543884395  229.30124039048678
    3  /m/0dm32      80.78843306116035   75.96452919395492   85.98796514643254
    4  /m/03vgrr     29.89172023262933   28.486698447733094  31.385607278447875
    5  /m/02kr76     10.761019283746556  10.39764493342258   11.141890583848994
    6  /m/05b72n     3.443526170798898   3.3792346033623386  3.509695533912433
    7  /m/01myc4     2.272727272727273   2.247191011235955   2.2988505747126435
    8  /m/04pv7w     1.0                 1.0                 1.0
    9  /m/027w7rt    0.36                0.355               0.365
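In case it helps when comparing several anchorbanks, here is a minimal sketch for finding which query a scale is anchored on, i.e. the row whose max_ratio is exactly 1.0. It uses only the standard library and assumes the file is tab-separated with an unnamed index column and '#' metadata lines, which is my reading of the paste above and of the .tsv extension. The helper names read_anchorbank and reference_anchor are made up for this example and are not part of GTAB.

```python
import csv
import io

# Shortened copy of the anchorbank pasted above (last four rows only).
# Assumption: tab-separated, unnamed first (index) column, '#' metadata lines.
SAMPLE_ANCHORBANK = "\n".join([
    "# metadata lines start with '#', as in the file above",
    "\tgoogle_query\tmax_ratio\tmax_ratio_lo\tmax_ratio_hi",
    "6\t/m/05b72n\t3.443526170798898\t3.3792346033623386\t3.509695533912433",
    "7\t/m/01myc4\t2.272727272727273\t2.247191011235955\t2.2988505747126435",
    "8\t/m/04pv7w\t1.0\t1.0\t1.0",
    "9\t/m/027w7rt\t0.36\t0.355\t0.365",
])


def read_anchorbank(text):
    """Parse anchorbank text into a list of (google_query, max_ratio) pairs."""
    # Drop blank lines and the '#' metadata lines before handing off to csv.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    return [(row["google_query"], float(row["max_ratio"])) for row in reader]


def reference_anchor(rows):
    """Return the query whose max_ratio is exactly 1.0, or None if absent."""
    for query, ratio in rows:
        if ratio == 1.0:
            return query
    return None


rows = read_anchorbank(SAMPLE_ANCHORBANK)
print(reference_anchor(rows))
```

For a real file, something like `read_anchorbank(open(path).read())` should work under the same format assumptions.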
Interesting, it seems that your main scale relies on the query "Reveillon", /m/04pv7w (a.k.a. New Year's). This may be why you always end up needing more queries: the query is very seasonal and oscillates between very high and very low values...
https://trends.google.com/trends/explore?date=today%205-y&q=%2Fm%2F04pv7w
Could you try adding this entity to the blacklist (which is simply a text file) and running it again?
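If you end up blacklisting entities for several anchorbanks, a small helper can save some manual editing. This is only a sketch: it assumes the blacklist holds one entry per line, and the file path is whatever your GTAB directory uses (I am not filling it in here). The function add_to_blacklist is a made-up name, not a GTAB API, and the demo writes to a temporary file rather than a real blacklist.

```python
import tempfile
from pathlib import Path


def add_to_blacklist(blacklist_path, entity):
    """Append entity to a one-entry-per-line text file unless already listed.

    Returns True if the entity was added, False if it was already present.
    The one-entry-per-line layout is an assumption about the blacklist file.
    """
    path = Path(blacklist_path)
    text = path.read_text() if path.exists() else ""
    entries = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if entity in entries:
        return False
    # Make sure the new entry starts on its own line.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as fh:
        fh.write(prefix + entity + "\n")
    return True


# Demo against a throwaway file (not a real GTAB blacklist).
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "blacklist_demo.txt"
    demo_file.write_text("placeholder_entry")  # no trailing newline on purpose
    first = add_to_blacklist(demo_file, "/m/04pv7w")
    second = add_to_blacklist(demo_file, "/m/04pv7w")
    final_entries = demo_file.read_text().splitlines()

print(first, second, final_entries)
```

After editing the blacklist, the anchorbank has to be regenerated for the change to have any effect.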
Thanks for the response; I did as you suggested and it did seem to help for that particular anchorbank.
Unfortunately I reached my query quota when I got to the fourth anchorbank (before adding "Reveillon" to the blacklist I was unable to finish even the first).
These are the second, third, and fourth anchorbanks:
log_geo=US-AL_timeframe=2016-03-05 2021-05-05.txt
Hey hravkin, do you have anything else on that front?
I'm using GTAB for a project and want to create an anchorbank for a particular query for each US state. Unfortunately it reaches the quota very quickly, noticeably faster than when GTAB built the anchorbank for the same query without a custom timeframe.
Am I doing anything wrong?
```python
import gtab

# my_path: the directory where GTAB keeps its data (defined elsewhere)
t = gtab.GTAB(dir_path=my_path)

def plot_search_and_npi_data(query_code, country_codes):
    """Build one anchorbank per geo and collect max_ratio for query_code."""
    max_ratio_dic = {}
    for country_code in country_codes:
        t.set_options(pytrends_config={"geo": country_code, "timeframe": "2016-03-05 2021-05-05"})
        t.create_anchorbank()
        t.set_active_gtab(f"google_anchorbank_geo={country_code}_timeframe=2016-03-05 2021-05-05.tsv")
        query_df = t.new_query(query_code)
        max_ratio_dic[country_code] = query_df['max_ratio']
    return max_ratio_dic
```
Thanks for your help and for the wonderful package!
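Since the quota is the bottleneck, one option for the loop above is to check which geos already have an anchorbank on disk and only build the missing ones on the next run. The sketch below just computes that list. It assumes the files live under output/google_anchorbanks/ inside the GTAB directory and follow the file name pattern used in the snippet above. I have not checked whether create_anchorbank() already skips existing files on its own, and the helper names are made up for this example.

```python
import tempfile
from pathlib import Path

TIMEFRAME = "2016-03-05 2021-05-05"


def anchorbank_filename(geo, timeframe=TIMEFRAME):
    """File name pattern taken from the set_active_gtab call in the snippet above."""
    return f"google_anchorbank_geo={geo}_timeframe={timeframe}.tsv"


def geos_missing_anchorbank(gtab_dir, geos, timeframe=TIMEFRAME):
    """Return the geos that have no anchorbank file under gtab_dir yet."""
    # Assumption: anchorbanks are written to <gtab_dir>/output/google_anchorbanks/.
    out_dir = Path(gtab_dir) / "output" / "google_anchorbanks"
    return [
        geo for geo in geos
        if not (out_dir / anchorbank_filename(geo, timeframe)).exists()
    ]


# Demo with a temporary directory standing in for the GTAB directory.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "output" / "google_anchorbanks"
    out_dir.mkdir(parents=True)
    (out_dir / anchorbank_filename("US-AL")).touch()  # pretend US-AL is done
    missing = geos_missing_anchorbank(tmp, ["US-AL", "US-AK"])

print(missing)
```

Passing only the missing geos to the loop would let a run that hit the quota pick up where it stopped instead of starting over.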