OliverSherouse / wbdata

A Python library for accessing World Bank data
GNU General Public License v2.0

Is it possible to retrieve data on all countries but not aggregates? #31

Open MaxGhenis opened 5 years ago

MaxGhenis commented 5 years ago

I'd like to get data on all countries, and exclude aggregates. Is there a way to do this, e.g. with get_dataframe?

MaxGhenis commented 5 years ago

Here's my workaround: since aggregate geos are returned first, I get the index of the final aggregate geo ("World") and remove all geos with that index or lower.

Example:

import pandas as pd
import wbdata

# Fetch total population for every geo (countries and aggregates alike).
df = wbdata.get_dataframe({'SP.POP.TOTL': 'pop'}).reset_index()
geos = pd.Series(df.country.unique())
# Aggregates come first; "World" is assumed to be the last of them.
world_index = geos[geos == 'World'].index[0]
aggs = geos[:world_index + 1]
df[~df.country.isin(aggs)].head()

    country  date         pop
Afghanistan  2018         NaN
Afghanistan  2017  35530081.0
Afghanistan  2016  34656032.0
Afghanistan  2015  33736494.0
Afghanistan  2014  32758020.0

OliverSherouse commented 5 years ago

Not at the moment; that's how the WB API handles things. I suppose we could build that in manually without too much trouble by indicating a special code that means "actually just countries". Or we could have a constant. The difficulty there is that we'd ideally want to be able to identify which "countries" are aggregates at runtime. I'll noodle on that.

OliverSherouse commented 4 years ago

Another workaround is to use [i for i in wbdata.get_country() if i['incomeLevel']['value'] != "Aggregates"]; that seems to be fairly comprehensive. I'll consider adding that as a utility in the next version.
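
For illustration, here is a minimal sketch of that filter as a small helper. It runs on a hand-written sample rather than a live API call: SAMPLE_COUNTRIES and non_aggregate_ids are hypothetical names, and the records only imitate the shape implied by the comprehension above (an 'incomeLevel' dict with a 'value' key).

```python
# Illustrative records only: a hand-written sample imitating the shape of
# the country metadata (each record carries an 'incomeLevel' dict).
SAMPLE_COUNTRIES = [
    {"id": "ARB", "name": "Arab World",
     "incomeLevel": {"id": "NA", "value": "Aggregates"}},
    {"id": "WLD", "name": "World",
     "incomeLevel": {"id": "NA", "value": "Aggregates"}},
    {"id": "AFG", "name": "Afghanistan",
     "incomeLevel": {"id": "LIC", "value": "Low income"}},
    {"id": "BRA", "name": "Brazil",
     "incomeLevel": {"id": "UMC", "value": "Upper middle income"}},
]


def non_aggregate_ids(countries):
    """Return the ids of records whose income level is not 'Aggregates'."""
    return [
        c["id"]
        for c in countries
        if c["incomeLevel"]["value"] != "Aggregates"
    ]


country_ids = non_aggregate_ids(SAMPLE_COUNTRIES)
print(country_ids)  # ['AFG', 'BRA']
```

With the real library, the input would presumably be the result of wbdata.get_country(), and the resulting ids could then be passed through the country argument of get_dataframe. That last step is an assumption about the installed version and is not exercised here. Unlike the positional workaround, this approach does not depend on aggregates being returned before countries.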