epfl-dlab / GoogleTrendsAnchorBank

KeyError when creating anchorbanks #11

Closed snorrefrid closed 3 years ago

snorrefrid commented 3 years ago

When building anchorbanks for certain regions and time periods, GTAB stops with a KeyError before the anchorbank is complete.

For example, when building an anchorbank for Romania in the period 2004-2020, GTAB reports the following error:

Non-continuous groups at: 
['Google', '/m/0glpjll', 'Instagram', '/m/0mgkg', 'Amazon']
['/m/02y1vz', '/m/0glpjll', 'Instagram', '/m/0mgkg', 'Amazon']

KeyError: '/m/02y1vz'

The key in the KeyError is the Freebase code for Facebook. Perhaps this has something to do with Facebook only coming into existence after 2004? However, I have tried removing Facebook from the list of high-traffic queries in the config file and still get the same error.
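
To make the failure mode concrete, here is a purely illustrative sketch of how a lookup can raise this kind of KeyError when one keyword in a group has no entry in a results mapping. This is not GTAB's code, and the thread does not state the actual cause; the `results` values and both helper functions are hypothetical.

```python
# Hypothetical per-keyword results; the Freebase code '/m/02y1vz' has no entry.
results = {"Google": 100, "/m/0glpjll": 12, "Instagram": 30}

# One of the keyword groups from the error message, shortened for illustration.
group = ["/m/02y1vz", "/m/0glpjll", "Instagram"]


def lookup_strict(results, group):
    """Index every keyword directly; raises KeyError on the first missing one."""
    return [results[kw] for kw in group]


def split_by_availability(results, group):
    """Separate keywords that have an entry from those that do not."""
    present = [kw for kw in group if kw in results]
    missing = [kw for kw in group if kw not in results]
    return present, missing


try:
    lookup_strict(results, group)
except KeyError as exc:
    # exc.args[0] holds the key that was not found.
    print("KeyError for:", exc.args[0])

present, missing = split_by_availability(results, group)
print("present:", present)
print("missing:", missing)
```

Checking membership before indexing is one generic way to surface which keywords lack data instead of crashing on the first one.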


Edit from @manoelhortaribeiro:

Code to replicate the error


import gtab
t = gtab.GTAB()
t.set_options(pytrends_config={"geo": "RO", "timeframe": "2004-01-01 2020-12-31"})
t.create_anchorbank() # takes a while to run since it queries Google Trends.


manoelhortaribeiro commented 3 years ago

Hey @snorrefrid, sorry for the delay! I was able to replicate the issue you had and found a quick fix.

Here's the anchor bank:

google_anchorbank_geo=RO_timeframe=2004-01-01 2020-12-31.tsv.zip
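
The file name above appears to encode the `geo` and `timeframe` options from the replication code. As a small sketch, assuming that pattern holds in general (it is inferred from this single file name, and `anchorbank_filename` is a hypothetical helper, not part of gtab), the expected name of the unzipped file can be built like this:

```python
def anchorbank_filename(geo: str, timeframe: str) -> str:
    """Build an anchorbank file name from geo and timeframe.

    Assumption: the pattern is inferred from the single file name
    posted in this thread and may not cover every case.
    """
    return f"google_anchorbank_geo={geo}_timeframe={timeframe}.tsv"


# Same options as in the replication code above.
name = anchorbank_filename("RO", "2004-01-01 2020-12-31")
print(name)
```

The GTAB README describes selecting an existing anchorbank by file name with `set_active_gtab`; check there for where the unzipped file needs to be placed.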

I will update the code on the repo/pip soonish!

Happy Easter!

PS: could you make the anchor banks that you crawled available to the community?