CLIMADA-project / climada_python

Python (3.8+) version of CLIMADA
GNU General Public License v3.0
316 stars 123 forks source link

LitPop error for some countries yielding UnboundLocalError: local variable 'country_name' referenced before assignment #75

Open simonameiler opened 4 years ago

simonameiler commented 4 years ago

Creating LitPop-entities yields an error for some countries. Upon calling the following lines of code I obtained the output pasted below:

# Import required packages:
import numpy as np
import pandas as pd
from matplotlib import colors
from iso3166 import countries as iso_cntry
from climada.entity import LitPop

# Initiate GDP-Entity for Tanzania, Rwanda, and Burundi:
ent = LitPop()
countries_list = ['ABW', 'AIA', 'ANT']
ent.set_country(countries_list, res_arcsec=120, reference_year=2014, fin_mode='gdp')
ent.set_geometry_points()
2020-10-13 14:01:57,431 - climada - DEBUG - Loading default config file: /Users/simonameiler/Documents/WCR/CLIMADA_develop/climada_python/climada/conf/defaults.conf
/Users/simonameiler/opt/anaconda3/envs/climada_env/lib/python3.7/site-packages/pandas_datareader/compat/__init__.py:7: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
  from pandas.util.testing import assert_frame_equal
Traceback (most recent call last):

  File "<ipython-input-2-f4f6a39a9cf7>", line 11, in <module>
    ent.set_country(countries_list, res_arcsec=120, reference_year=2014, fin_mode='gdp')

  File "/Users/simonameiler/Documents/WCR/CLIMADA_develop/climada_python/climada/entity/exposures/litpop.py", line 183, in set_country
    _get_country_info(country_new)

  File "/Users/simonameiler/Documents/WCR/CLIMADA_develop/climada_python/climada/entity/exposures/litpop.py", line 956, in _get_country_info
    cntry_info = [iso_num, country_name, country_shp]

UnboundLocalError: local variable 'country_name' referenced before assignment

It seems that the error might be related to the input rather than the function itself. In the above example, it's 'ANT' which causes the problem. Has anyone encountered similar problems using LitPop or does anyone have an idea of what the problem is? Maybe changes in the util.finance or util.coordinates cause the problem?

chahank commented 4 years ago

Have you looked into the code of LitPop to spot the point where the assignment happens using a debugger? You can execute line by line and find where the problem happens.

mmyrte commented 4 years ago

I ran into this problem at some point; since I wanted to do the whole world, my solution was pragmatic and I just skipped the ISO_A3 codes that couldn't be found in the natural earth data. If you're going to fix the code, just tear it out and replace those terrifying iterators with geopandas.

If you're short on time then you could just log an error and return None, which would be better than what it's doing now. The problem is that the if statement on line 937 never evaluates to true if the ISO code is not found in the data.

https://github.com/CLIMADA-project/climada_python/blob/77e3880743fe8e49d15320a6a29692d175bec892/climada/entity/exposures/litpop.py#L906-L958

chahank commented 4 years ago

I am trying to fix this currently. So, what do you think should be the behaviour if the iso3 code is not found?

sameberenz commented 4 years ago

maybe skip the country, and log the failed iso3-codes somewhere? when I prepared global LitPop data, I wrote a script returning me a list of all failed codes / countries.

mmyrte commented 4 years ago

Yeah, I'd just use LOGGER.warning if a code is skipped. In case the user has a need for a list of failed codes, it's manageable to just copy & paste that.

chahank commented 4 years ago

Is there possibly a fallback solution for countries that are not recognized? For example could they use only lit or only pop data?

mmyrte commented 4 years ago

Yeah, the natural earth table has a lot more IDs than just iso_a3; you could switch it over to adm0_a3; here's the rows where the two don't match: (maybe just put in a selector for which geodataframe col to use)

select admin, adm0_a3, iso_a3 from ne_10m_admin0_t where adm0_a3 <> iso_a3;

admin adm0_a3 iso_a3
Dhekelia Sovereign Base Area ESB -99
Palestine PSX PSE
South Sudan SDS SSD
Somaliland SOL -99
France FRA -99
Western Sahara SAH ESH
Norway NOR -99
Kosovo KOS -99
US Naval Base Guantanamo Bay USG -99
Northern Cyprus CYN -99
Baykonur Cosmodrome KAB -99
Cyprus No Mans Area CNM -99
Siachen Glacier KAS -99
Akrotiri Sovereign Base Area WSB -99
Ashmore and Cartier Islands ATC -99
Bajo Nuevo Bank (Petrel Is.) BJN -99
Serranilla Bank SER -99
Aland ALD ALA
Clipperton Island CLP -99
Indian Ocean Territories IOA -99
Scarborough Reef SCR -99
Coral Sea Islands CSI -99
Spratly Islands PGA -99

ps: God their data is shitty, NOR without ISO? Their repo has PRs from five years ago, as far as I remember.

chahank commented 4 years ago

neither Norway nor France... that is very weird.

mmyrte commented 4 years ago

By the way, this sorta references the arbitrary region_ids currently mapped in the constants file, which I am afraid of touching, because it relies on multiple positional indexing. If the NE data get updated, it possibly goes belly-up.

zedrdave commented 3 years ago

I think part of the weirdness in the data (eg NOR and FRA) comes from the existence of unincorporated/disputed areas: France has Mayotte (and possibly others), Norway has Svalbard… Resulting in ISO oddities and other headaches:

I suspect the iso_a3 value of -99 is the result of faulty DB normalisation somewhere in NE tables… Possibly fixable by looking at other NE tables (or other sources, like Geonames).