elyase / geotext

Geotext extracts country and city mentions from text
MIT License
135 stars 47 forks

'UK' is in country mentions #15

Open VanessaVanG opened 6 years ago

VanessaVanG commented 6 years ago

"The official ISO country code for the United Kingdom is 'GB'. The code 'UK' is reserved." Both UK and GB are returned in my country mentions for some reason. I'm not even sure what the UK ones are from. (I'm using this on a huge file so there's no way to tell what places it's deeming as UK)

hernandezrivera commented 6 years ago

I had the same issue. I get 'UK' for the United Kingdom, and when I try to look up the country name with pycountry, it doesn't recognize the code, so I have to change it manually to 'GB' case by case. It would be good to have United Kingdom return 'GB' instead of 'UK'. List of iso2 codes
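A minimal sketch of the manual workaround described here, assuming the country mentions are available as a mapping of code to count. The `CODE_ALIASES` table and the `normalize_country_codes` helper are hypothetical names for illustration and are not part of geotext:

```python
# Hypothetical alias table: non-ISO codes mapped to ISO 3166-1 alpha-2 codes.
# Only the 'UK' -> 'GB' entry comes from this thread; extend as needed.
CODE_ALIASES = {"UK": "GB"}


def normalize_country_codes(mentions, aliases=CODE_ALIASES):
    """Merge counts for aliased codes into their ISO alpha-2 equivalents."""
    normalized = {}
    for code, count in mentions.items():
        iso_code = aliases.get(code, code)  # leave unknown codes unchanged
        normalized[iso_code] = normalized.get(iso_code, 0) + count
    return normalized


# Example input shaped like a dict of code -> number of mentions.
raw_mentions = {"GB": 3, "UK": 2, "US": 5}
print(normalize_country_codes(raw_mentions))  # {'GB': 5, 'US': 5}
```

After this step every key should be a code that an ISO-based lookup (for example pycountry's `countries.get(alpha_2=...)`) can resolve, which avoids fixing entries one at a time.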

Looking at the txt file with the list of countries, it seems that it is using the FIPS code instead of the iso2 code. Some other countries also have a FIPS code that differs from their iso2 code (fips vs iso list), so this issue impacts other countries too.
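To see which countries are affected, something like the following could list the rows where the two codes differ. This is only a sketch: it assumes the GeoNames countryInfo.txt column order (ISO, ISO3, ISO-Numeric, fips, Country), uses a three-row inline sample instead of the real file, and `fips_iso_mismatches` is a hypothetical helper name:

```python
import csv
import io

# Illustrative sample, assuming the GeoNames countryInfo.txt column order
# (ISO, ISO3, ISO-Numeric, fips, Country); the real file has more columns.
SAMPLE = (
    "#ISO\tISO3\tISO-Numeric\tfips\tCountry\n"
    "GB\tGBR\t826\tUK\tUnited Kingdom\n"
    "DE\tDEU\t276\tGM\tGermany\n"
    "FR\tFRA\t250\tFR\tFrance\n"
)


def fips_iso_mismatches(text):
    """Return {country: (iso2, fips)} for rows where the two codes differ."""
    mismatches = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        iso2, fips, country = row[0], row[3], row[4]
        if iso2 != fips:
            mismatches[country] = (iso2, fips)
    return mismatches


print(fips_iso_mismatches(SAMPLE))
```

With this sample, the United Kingdom (GB vs UK) and Germany (DE vs GM) are reported, while France is not because both codes are FR.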

I don't understand how it happens as it seems geotext.py is grabbing the proper columns of the file:

    countries = read_table(
        get_data_path('countryInfo.txt'), usecols=[4, 0], skip=1)

albertc1 commented 5 years ago

I ran into this problem too, as well as a few others. Looks like it was coming from nationalities.txt: https://github.com/elyase/geotext/pull/20