IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0

Migrate worldbank datareader to wbdata #831

Open glatterf42 opened 3 months ago

glatterf42 commented 3 months ago

Possibly related to #815.

#827 migrates from pandas-datareader to wbdata and implements some changes needed for that.

glatterf42 commented 3 months ago

@danielhuppmann Feel free to close this whenever you consider this migration done. Maybe opening this issue was never needed in the first place since #827 already migrated the read_worldbank() function.

danielhuppmann commented 3 months ago

Yes, my bad - I didn't get that you already fixed the issue in your PR. But let's leave this topic open anyway as a reminder to revisit the WorldBank-integration feature and see if the unit-issue can be fixed.

glatterf42 commented 3 months ago

As it is, I'm not sure there is an officially supported way of retrieving the unit from the WorldBank data. For example:

>>> import wbdata
>>> indicator = "NY.GDP.PCAP.PP.KD"
>>> new = wbdata.get_dataframe(indicators={indicator: "GDP"}, country=["CAN", "MEX", "USA"], date=("2003", "2005"))
>>> new
                             GDP
country       date              
Canada        2005  44683.764981
              2004  43704.669134
              2003  42791.094678
Mexico        2005  19144.014627
              2004  19017.753814
              2003  18634.896456
United States 2005  54331.658336
              2004  52989.030694
              2003  51497.734688
>>> new.index
MultiIndex([(       'Canada', '2005'),
            (       'Canada', '2004'),
            (       'Canada', '2003'),
            (       'Mexico', '2005'),
            (       'Mexico', '2004'),
            (       'Mexico', '2003'),
            ('United States', '2005'),
            ('United States', '2004'),
            ('United States', '2003')],
           names=['country', 'date'])
>>> 
>>> new.columns
Index(['GDP'], dtype='object')
>>> result = wbdata.get_indicators(indicator)
>>> result
id                 name
-----------------  ---------------------------------------------------
NY.GDP.PCAP.PP.KD  GDP per capita, PPP (constant 2017 international $)
>>> raw = wbdata.get_data(indicator, country=["CAN", "MEX", "USA"], date=("2003", "2005"))
>>> raw
[{'indicator': {'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per capita, PPP (constant 2017 international $)'}, 'country': {'id': 'CA', 'value': 'Canada'}, 'countryiso3code': 'CAN', 'date': '2005', 'value': 44683.764981042, 'unit': '', 'obs_status': '', 'decimal': 0},
 {'indicator': {'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per capita, PPP (constant 2017 international $)'}, 'country': {'id': 'CA', 'value': 'Canada'}, 'countryiso3code': 'CAN', 'date': '2004', 'value': 43704.6691337093, 'unit': '', 'obs_status': '', 'decimal': 0},
 {'indicator': {'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per capita, PPP (constant 2017 international $)'}, 'country': {'id': 'CA', 'value': 'Canada'}, 'countryiso3code': 'CAN', 'date': '2003', 'value': 42791.0946777734, 'unit': '', 'obs_status': '', 'decimal': 0},
 {'indicator': {'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per capita, PPP (constant 2017 international $)'}, 'country': {'id': 'MX', 'value': 'Mexico'}, 'countryiso3code': 'MEX', 'date': '2005', 'value': 19144.014627364, 'unit': '', 'obs_status': '', 'decimal': 0},
 {'indicator': {'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per capita, PPP (constant 2017 international $)'}, 'country': {'id': 'MX', 'value': 'Mexico'}, 'countryiso3code': 'MEX', 'date': '2004', 'value': 19017.7538141902, 'unit': '', 'obs_status': '', 'decimal': 0},
 {'indicator': {'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per capita, PPP (constant 2017 international $)'}, 'country': {'id': 'MX', 'value': 'Mexico'}, 'countryiso3code': 'MEX', 'date': '2003', 'value': 18634.8964558406, 'unit': '', 'obs_status': '', 'decimal': 0},
 {'indicator': {'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per capita, PPP (constant 2017 international $)'}, 'country': {'id': 'US', 'value': 'United States'}, 'countryiso3code': 'USA', 'date': '2005', 'value': 54331.6583361399, 'unit': '', 'obs_status': '', 'decimal': 0},
 {'indicator': {'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per capita, PPP (constant 2017 international $)'}, 'country': {'id': 'US', 'value': 'United States'}, 'countryiso3code': 'USA', 'date': '2004', 'value': 52989.0306944184, 'unit': '', 'obs_status': '', 'decimal': 0},
 {'indicator': {'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per capita, PPP (constant 2017 international $)'}, 'country': {'id': 'US', 'value': 'United States'}, 'countryiso3code': 'USA', 'date': '2003', 'value': 51497.7346884645, 'unit': '', 'obs_status': '', 'decimal': 0}]

The text we'd probably want as the unit, 'PPP (constant 2017 international $)', only appears inside the indicator's 'name' (from get_indicators) or 'value' (in the raw data), while the 'unit' key that exists in the raw data is empty. It might be worth opening an issue at https://github.com/OliverSherouse/wbdata/tree/master if this is a feature we want to see.
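Until something like that exists upstream, one possible stopgap is to parse the unit out of the indicator name ourselves. The snippet below is only a rough sketch of that idea: `guess_unit` is a hypothetical helper (not part of wbdata or pyam), and it assumes that indicator names follow the "description, qualifier (unit)" pattern seen in the output above, which may well not hold for every WorldBank indicator. The `record` dict is an abbreviated copy of one raw entry from the session above.

```python
def guess_unit(indicator_name: str) -> str:
    """Heuristically extract a unit from a WorldBank indicator name.

    Hypothetical helper, based only on the naming pattern seen in this
    thread. Returns an empty string if no trailing parenthetical exists.
    """
    name = indicator_name.strip()
    if not name.endswith(")"):
        return ""
    # Prefer the qualifier after the last comma, e.g. "PPP (constant ...)".
    _, sep, tail = name.rpartition(", ")
    if sep and "(" in tail:
        return tail
    # Otherwise fall back to the content of the trailing parentheses.
    open_idx = name.rfind("(")
    if open_idx == -1:
        return ""
    return name[open_idx + 1 : -1].strip()


# Abbreviated raw record, as returned by wbdata.get_data in the session above.
record = {
    "indicator": {
        "id": "NY.GDP.PCAP.PP.KD",
        "value": "GDP per capita, PPP (constant 2017 international $)",
    },
    "unit": "",
}

# Use the official 'unit' field if it is ever populated, else the guess.
unit = record["unit"] or guess_unit(record["indicator"]["value"])
print(unit)
```

Falling back to the parsed name only when 'unit' is empty means the heuristic would step aside automatically if the API starts filling that field. Names without a trailing parenthetical yield an empty string, so the caller would still have to decide on a default unit for those.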