CartoDB / cartodb

Location Intelligence & Data Visualization tool
http://carto.com
BSD 3-Clause "New" or "Revised" License

Incorrect Geocoding of ISO-2 country code for Bahrain/Bosnia & Herzegovina #6371

Closed michellechandra closed 8 years ago

michellechandra commented 8 years ago

@iriberri
S/B 7826854

Context

Geocoding by ISO-2 country code returns an incorrect result for Bahrain: the code BH yields the polygon for Bosnia & Herzegovina.

Process

Upload or create a dataset in CartoDB that contains the ISO-2 country codes for Bahrain (BH) and Bosnia & Herzegovina (BA). Geocode the dataset by Admin Region using the column that contains the ISO-2 country codes. The resulting polygon for Bahrain will be the polygon for Bosnia & Herzegovina.

See example dataset here: https://team.cartodb.com/u/chandra/tables/iso_country_code_test/table

Expected Result

Country code for Bahrain (BH) should result in polygon for Bahrain, not polygon for Bosnia & Herzegovina (BA).

Current Result

Country code for Bahrain (BH) returns polygon for Bosnia & Herzegovina (BA).

See earlier issue closed last year: https://github.com/CartoDB/cartodb/issues/1399

http://en.wikipedia.org/wiki/ISO_3166-2:BH

http://en.wikipedia.org/wiki/ISO_3166-2:BA

iriberri commented 8 years ago

The cause of the issue comes from: https://github.com/CartoDB/data-services/blob/cb4b6411a108414ebad52ef839bef253d4915bd3/geocoder/admin0/sql/build_synonym_table.sql#L80


The function that geocodes by country does not take the rank of the data into account. In principle, the original data is assumed to be consistent, but the function also does not pick "the best" result when several rows match: https://github.com/CartoDB/data-services/blob/cb4b6411a108414ebad52ef839bef253d4915bd3/geocoder/extension/sql/0.0.1/20_admin0.sql#L11
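To illustrate the rank problem, here is a minimal Python sketch. The rows, the field names (`synonym`, `iso3`, `rank`), and the convention that a lower rank means a more trusted source are all assumptions made for illustration. They are not the actual data-services schema or data.

```python
# Hypothetical synonym rows: (normalized synonym, country ISO3, rank).
# Assumption: a lower rank marks a more trusted source (0 = official ISO2 code).
SYNONYMS = [
    ("bh", "BIH", 5),  # e.g. an abbreviation such as "B.H." after normalization
    ("bh", "BHR", 0),  # ISO2 code for Bahrain
    ("ba", "BIH", 0),  # ISO2 code for Bosnia & Herzegovina
]


def lookup_first_match(term):
    """Return the first matching row and ignore rank (the problematic behavior)."""
    for synonym, iso3, _rank in SYNONYMS:
        if synonym == term.lower():
            return iso3
    return None


def lookup_best_rank(term):
    """Return the match with the lowest rank, i.e. the most trusted one."""
    matches = [(rank, iso3) for synonym, iso3, rank in SYNONYMS if synonym == term.lower()]
    return min(matches)[1] if matches else None
```

With this made-up data, `lookup_first_match("BH")` returns the Bosnia & Herzegovina row because it happens to come first, while `lookup_best_rank("BH")` returns Bahrain.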

If we are supporting ISO2 codes, we probably don't want any two-character country name synonyms other than the ISO2 codes themselves. So it could be worthwhile to add a synonym for a country name only when that synonym is longer than 2 characters (after normalizing the characters of the string).
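A rough Python sketch of that filter follows. The `normalize` helper is a hypothetical stand-in (lowercase, strip accents, drop non-alphanumeric characters); the normalization actually used in data-services may differ, and the candidate list is invented for illustration.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and drop everything except ASCII letters and digits."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())


def keep_name_synonym(synonym):
    """Keep a country-name synonym only if it is longer than 2 characters once normalized."""
    return len(normalize(synonym)) > 2


# Hypothetical name synonyms for Bosnia & Herzegovina.
candidates = ["B.H.", "Bosnia", "Bosnia and Herz."]
kept = [s for s in candidates if keep_name_synonym(s)]
```

Here "B.H." normalizes to "bh", which is only 2 characters long, so it is dropped and can no longer collide with the ISO2 code for Bahrain.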

iriberri commented 8 years ago

Fixed in production.