cybergreen-net / refdata-country

Reference dataset for countries

Reference data is missing countries #3

Open zelima opened 7 years ago

zelima commented 7 years ago

There are countries in scanned data that are not included in reference data for countries. Currently they are added manually by me.

This should be automated somehow.

zelima commented 7 years ago

Examples

AQ,Antarctica,antarctica,Antarctica,Antarctica 
C,Unknown,unknown,, 
CF,Central African Republic,central-african-republic,, 
EU,Europe,europe,, 
FK,"Falkland Islands (Malvinas)",falkland-islands,Americas,Americas
GS,South Georgia and South Sandwich Islands,south-georgia-and-south-sandwich-islands,Americas,Americas 
GW,Guinea-Bissau,guinea-bissau,Africa,Africa 
IO,British Indian Ocean Territory,british-indian-ocean-territory,Asia,Asia
NF,Norfolk Island,norfolk-island,Oceania,Oceania 
NR,Nauru,nauru,Africa,Africa
NU,Niue,niue,Oceania,Oceania
PW,Palau,palau,Oceania,Oceania
TK,Tokelau,tokelau,Oceania,Oceania
TV,Tuvalu,tuvalu,Oceania,Oceania
UM,"U.S. Outlying Islands",us-outlying-islands,Americas,Americas
VA,Vatican City,vatican-city,Europe,Europe
WF,Wallis and Futuna Islands,wallis-and-futuna-islands,Territory of FR,Europe
XY,Unknown XY,unknown-xy,,
YT,Mayotte,mayotte,Africa,Africa
ZZ,-Reserved AS-,reserved-as,,

aaronkaplan commented 7 years ago

I believe the country code -> country name mapping came from Atomatic. So, yes, please update as needed. Thanks.

zelima commented 7 years ago

cc @rufuspollock Currently we use this dataset https://github.com/datasets/country-codes/ for getting data.

We need to think about what to do when countries from the scanned data are not present in that dataset: https://raw.githubusercontent.com/datasets/country-codes/master/data/country-codes.csv
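
One way to surface the gap automatically is a set difference between the codes seen in scanned data and the codes in the reference CSV. This is only a minimal sketch: it assumes the reference file exposes its two-letter codes in a column named `ISO3166-1-Alpha-2` (worth checking against the header of country-codes.csv), and `missing_codes` and the inline sample are hypothetical, not part of either repository.

```python
import csv
import io


def missing_codes(scanned_codes, reference_csv_text, code_column="ISO3166-1-Alpha-2"):
    """Return the sorted codes seen in scanned data but absent from the reference CSV."""
    reader = csv.DictReader(io.StringIO(reference_csv_text))
    # Codes known to the reference dataset, normalised to upper case.
    known = {row[code_column].strip().upper() for row in reader if row.get(code_column)}
    # Codes observed in scanned data, ignoring empty values.
    seen = {code.strip().upper() for code in scanned_codes if code and code.strip()}
    return sorted(seen - known)


# Tiny inline sample standing in for country-codes.csv (illustrative only).
sample_reference = "ISO3166-1-Alpha-2,name\nGB,United Kingdom\nFR,France\n"
print(missing_codes(["GB", "fr", "AQ", "ZZ", ""], sample_reference))
```

Run as part of the pipeline, a non-empty result could fail the build or open a ticket instead of relying on manual additions.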

rufuspollock commented 7 years ago

@zelima first of all, some items are definitely not countries 😉 e.g. Antarctica is a region or continent!

Some of the rest are places that are islands that are part of a larger geopolitical entity. E.g. Falkland Islands are legally a part of Britain (I believe!).

It is a "business" question as to how we treat these. @aaronkaplan any thoughts?

I believe the country code -> country name mapping came from Atomatic. So, yes, please update as needed. Thanks.

@aaronkaplan generally this is not so much a country -> code issue as an issue that codes are being assigned during the enrichment that don't exist as ISO two-letter codes ...

aaronkaplan commented 7 years ago

On 27 Jan 2017, at 06:55, Rufus Pollock notifications@github.com wrote:

@zelima first of all, some items are definitely not countries 😉 e.g. Antarctica is a region or continent!

Some of the rest are places that are islands that are part of a larger geopolitical entity. E.g. Falkland Islands are legally a part of Britain (I believe!).

It is a "business" question as to how we treat these. @aaronkaplan any thoughts?

The country codes that come out of the enrichment are "maxmind extended country codes". That means: A1 = satellite or so. So basically we have to stick with that.

I believe the country code -> country name mapping came from Atomatic. So, yes, please update as needed. Thanks.

@aaronkaplan generally this is not so much a country -> code issue as an issue that codes are being assigned during the enrichment that don't exist as ISO two-letter codes ...
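
If the enrichment keeps emitting MaxMind-specific codes, one option is to keep the ISO reference data untouched and layer a small supplement on top of it. The sketch below assumes a hand-maintained `EXTRA_CODES` table; the A1 and A2 labels follow my recollection of MaxMind's legacy code list and should be verified there, `EU` is taken from the examples earlier in this thread, and all function names are hypothetical.

```python
# Hypothetical supplement for codes the enrichment emits that are not
# ISO 3166-1 alpha-2. Labels are assumptions to be checked against MaxMind's list.
EXTRA_CODES = {
    "A1": "Anonymous Proxy",
    "A2": "Satellite Provider",
    "EU": "Europe",
}


def build_lookup(iso_names, extra_names=EXTRA_CODES):
    """Merge ISO names with the supplement; ISO entries win on conflict."""
    lookup = dict(extra_names)
    lookup.update(iso_names)
    return lookup


def country_name(code, lookup, default="Unknown"):
    """Resolve a code case-insensitively, falling back to a default label."""
    return lookup.get(code.strip().upper(), default)


lookup = build_lookup({"FR": "France", "GB": "United Kingdom"})
print(country_name("a2", lookup))
print(country_name("xx", lookup))
```

Keeping the supplement separate makes it easy to see which entries are true ISO countries and which exist only because the enrichment produces them.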

zelima commented 7 years ago

I understand that they are not countries, but still they come up in scanned data.

@rufuspollock @aaronkaplan I'm not quite clear about the solution. Should we switch to another source, e.g. http://dev.maxmind.com/geoip/legacy/codes/iso3166/?

rufuspollock commented 7 years ago

@zelima I think we need to create a reference dataset with these items. However, it will present problems if we ever want to do charting, as most of the charts will want country ISO codes.

We need to think about a solution here; it is not immediately obvious.
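
One possible way to keep charting workable is to tag each entry as ISO or non-ISO and hand only the ISO subset to the charts, while still keeping the rest for reporting. A rough sketch, assuming per-code counts in a dict and a known set of ISO codes; `split_for_charting` and the sample numbers are made up for illustration.

```python
def split_for_charting(counts_by_code, iso_codes):
    """Split per-code counts into chartable ISO entries and everything else."""
    chartable, other = {}, {}
    for code, count in counts_by_code.items():
        target = chartable if code in iso_codes else other
        target[code] = count
    return chartable, other


# Illustrative counts only; real values would come from the scanned data.
counts = {"FR": 120, "A2": 7, "EU": 3, "NR": 1}
chartable, other = split_for_charting(counts, iso_codes={"FR", "NR"})
print(chartable)
print(other)
```

The non-ISO bucket could then be shown as a single "other" series or listed in a table next to the map.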

aaronkaplan commented 7 years ago

I think the ISO codes are nearly 100% identical to MaxMind's, except for "GB" versus "UK" and the A1 and A2 satellite providers.
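
If the two code sets really differ only in a handful of entries, a small alias table applied before the lookup may be enough. A minimal sketch, assuming "UK" should be folded into the ISO code "GB"; `ALIASES` and `normalize_code` are hypothetical names, and the table would need to be filled in from whatever differences actually show up in the data.

```python
# Hypothetical alias table: non-ISO spellings mapped to their ISO 3166-1 alpha-2 code.
ALIASES = {"UK": "GB"}


def normalize_code(code, aliases=ALIASES):
    """Upper-case a code and replace known aliases with the ISO equivalent."""
    cleaned = code.strip().upper()
    return aliases.get(cleaned, cleaned)


print(normalize_code(" uk "))
print(normalize_code("fr"))
```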

On 28 Jan 2017, at 12:29, Rufus Pollock notifications@github.com wrote:

@zelima I think we need to create a reference dataset with these items. However, it will present problems if we ever want to do charting, as most of the charts will want country ISO codes.

We need to think about a solution here; it is not immediately obvious.