IndEcol / country_converter

The country converter (coco) - a Python package for converting country names between different classification schemes.
GNU General Public License v3.0

Non-standard codes #25

Closed rgieseke closed 6 years ago

rgieseke commented 7 years ago

After #24 I wanted to compare country_converter (which I use a lot, imported as coco) with pycountry, which covers only ISO3 codes:

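import country_converter
import pandas as pd
import pycountry

# Load coco's bundled country table (tab-separated).
data = pd.read_table(country_converter.COUNTRY_DATA_FILE, sep='\t', encoding='utf-8')
for _, (code, name) in data[['ISO3', 'name_short']].iterrows():
    try:
        # Unknown codes raise KeyError in older pycountry versions;
        # newer ones return None, so .name raises AttributeError.
        pycountry.countries.get(alpha_3=code).name
    except (KeyError, AttributeError):
        print(code, name)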

This gives these (non-standard) codes:

BA1 British Antarctic Territories
CHI Channel Islands
KSV Kosovo
ANT Netherlands Antilles
EAT Tanganjika
EAZ Zanzibar

Not sure whether some of these are former codes. For Kosovo, XK and XKK seem to be used as placeholders: https://geonames.wordpress.com/2010/03/08/xk-country-code-for-kosovo/

Maybe it's worth making it explicit in the docs that codes are amended.

konstantinstadler commented 7 years ago

Not quite sure how I should handle these. The main idea of coco was to convert data found in international databases, in which these codes are (still) found. However, some of these codes are subject to change or are not consistent across databases. The only workable solution I could think of was to provide the option of using an additional country file if needed. The countries specified in this file overwrite existing country matchings. See the README (CLI usage) and the tutorial (around IN[13] for Python).
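
The overwrite behaviour described above can be pictured with a minimal sketch. It deliberately does not use coco's API: BASE_CODES, ADDITIONAL_CODES and merge_country_tables are hypothetical names chosen for illustration, and only the codes quoted in this thread are taken from the source.

```python
# Hypothetical illustration, not coco's implementation: entries from an
# additional country table overwrite matching entries in the base table.

BASE_CODES = {
    "Kosovo": "KSV",
    "Netherlands Antilles": "ANT",
    "Channel Islands": "CHI",
}

# Assumed user-supplied additions; XKK is the placeholder mentioned above.
ADDITIONAL_CODES = {
    "Kosovo": "XKK",
}


def merge_country_tables(base, additional):
    """Return a new mapping in which `additional` wins on shared names."""
    merged = dict(base)        # copy so the base table is left untouched
    merged.update(additional)  # additional entries overwrite base entries
    return merged


codes = merge_country_tables(BASE_CODES, ADDITIONAL_CODES)
print(codes["Kosovo"])                # XKK
print(codes["Netherlands Antilles"])  # ANT
```

Copying the base table before updating it keeps the shipped matchings intact, so a database-specific override stays local to the user who supplies it.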

rgieseke commented 7 years ago

Sure - I just thought it might be useful to point this out in the docs, e.g. something like

"ISO3 (or ISO2) covers the ISO 3xxx codes plus the additional codes ..."

There are also four letter codes for dissolved countries: https://en.wikipedia.org/wiki/ISO_3166-3#Current_codes

It just might be confusing if for British Antarctic Territories a custom code BA1 is used, and for the Netherlands Antilles the formerly used ANT. If it's somehow stated then

konstantinstadler commented 7 years ago

Perhaps the best solution would be to only cover the "official" or "standard" ISO3 codes and provide some ready-made additional country files covering former/reserved codes.

konstantinstadler commented 6 years ago

Fixed in version 0.6.0: CountryConverter accepts a parameter only_UNmember to restrict the concordances to UN member countries. Further documentation about the codes is in the README, section "Classification schemes".
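
As a rough sketch of the idea behind such a switch (not coco's actual code), the snippet below filters a small country table down to UN members. The table layout, the "UNmember" field holding the year of accession, and the restrict_to_un_members helper are all assumptions made for illustration.

```python
# Hypothetical illustration of restricting a country table to UN members.
# The row layout and the helper name are assumptions, not coco's internals.

ROWS = [
    {"name_short": "Germany", "ISO3": "DEU", "UNmember": 1973},
    {"name_short": "Kosovo", "ISO3": "KSV", "UNmember": None},
    {"name_short": "Netherlands Antilles", "ISO3": "ANT", "UNmember": None},
]


def restrict_to_un_members(rows):
    """Keep only rows that carry a UN accession year."""
    return [row for row in rows if row["UNmember"] is not None]


un_only = restrict_to_un_members(ROWS)
print([row["ISO3"] for row in un_only])  # ['DEU']
```

With coco itself, the comment above suggests the equivalent would be passing only_UNmember=True when constructing CountryConverter; the exact behaviour is described in the README.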