Closed iGEL closed 2 years ago
@iGEL I think we have this covered via our translations, which are much much more "common".
Chipping in here: the effect of this change has been noticeable across our apps. I do not think that saying "just use the translations" is fair. It will be quite a task for us to go and update everything to pull the short name out of a translations hash. I think @iGEL's suggestion is perfectly fair and reasonable.
Hm, I see your point, but while I am more than happy to change the name of `name` to `iso_name`, I don't think it's a good idea to add something like `name` back in, as it becomes WHICH name from WHICH perspective. And translations are the CORRECT way this problem should be solved.
That being said, do we need to make the translations API more approachable? I could see a method like `name` just calling out to i18n to get the current locale and load that translation, with a fallback to a default locale of `en` if we are not inside of Rails.
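A rough sketch of what that could look like, just to make the idea concrete. `localized_name` is a hypothetical helper, not something the gem provides, and it assumes the translations are a hash keyed by locale string (as with `translations['en']`):

```ruby
# Hypothetical helper (not part of the countries gem): choose a display name
# from a translations hash keyed by locale string, e.g. { 'en' => '...' }.
def localized_name(translations, locale: nil, default_locale: 'en')
  # Use the I18n locale when the i18n gem is loaded (e.g. inside Rails);
  # otherwise fall back to the default locale.
  locale ||= defined?(I18n) ? I18n.locale : default_locale
  translations[locale.to_s] || translations[default_locale]
end

translations = { 'en' => 'Germany', 'de' => 'Deutschland' }
puts localized_name(translations, locale: 'de') # Deutschland
puts localized_name(translations, locale: 'xx') # Germany (fallback to 'en')
```

Falling back to the default locale when a translation is missing keeps dropdowns from showing blanks, though whether that is the right behavior is a design choice.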
From a usage perspective, I would benefit from having a single method to get the common country name that users expect.
`ISO3166::Country.all.map(&:name)` gives me the extremely long names that @iGEL mentioned, with the UK name being 52 characters long.
`ISO3166::Country.all.map { |c| c.translations['en'] }` gives me "Viet Nam", which is the official way of doing it, but my users don't want that in a dropdown. It also includes "Korea, Republic of" and "Korea, Democratic People's Republic of". I can't expect my users to know the difference.
`ISO3166::Country.all.map { |c| c.unofficial_names[0] }` is definitely the closest, with only a small number of unexpected names. For some reason `KP` is "Korea (North)", whereas `KR` is "South Korea".
I can sympathize with not having a definitive source, though. Wikipedia uses "United States Virgin Islands" instead of the `unofficial_names` entry "Virgin Islands of the United States", but Google Maps uses the more compact "US Virgin Islands" on its display. If it means spinning off another gem to store `ISO3166::Country.new('US').additional_names["wikipedia"]["en"]`, I can accept that. Or if there's already a solution out there, please let me know.
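For what it's worth, the workaround I'd reach for in app code today looks roughly like this. It is only a sketch: `SampleCountry` and `common_name` are hypothetical stand-ins so the snippet runs without the gem, and the GB values are sample data echoing the names quoted in this thread.

```ruby
# Hypothetical stand-in for a country record so the sketch runs without the gem.
SampleCountry = Struct.new(:name, :translations, :unofficial_names)

# Prefer the first unofficial name, then the translation for the locale,
# then the long ISO name as a last resort.
def common_name(country, locale = 'en')
  Array(country.unofficial_names).first ||
    (country.translations || {})[locale] ||
    country.name
end

gb = SampleCountry.new(
  'United Kingdom of Great Britain and Northern Ireland',
  { 'en' => 'United Kingdom' },
  ['United Kingdom']
)
puts common_name(gb) # United Kingdom
puts gb.name.length  # 52
```

The ordering is a guess at what most dropdowns want; it does nothing about the inconsistencies in the underlying lists.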
@perspectivezoom You bring up the fundamental problem, which is that this gem has to straddle a socio-economic problem, and there are only "okay" choices.
ISO is too long, and odd around disputed countries. Translations... inconsistent based upon language. Slang... well, is slang.
I am 100% for bringing any data into the data set, so no need for another gem. But I don't 100% understand your reference. Are you referring to the title of one wiki article or some other list?
I personally have spent a lot of time looking at OpenStreetMap as a data source, but its tagging system has proved way too inconsistent for any structured data retrieval.
Here is a thought.
`name` or `accepted_name` returns the Wikipedia title for that country in the current locale's language. This list of the ISO codes provides a reasonable scrapable data source, while the subtitle of each page is a much more reasonable name.
This echoes @perspectivezoom's thoughts but has a data source. Any other ideas?
> I don't 100% understand your reference. Are you referring to the title of one wiki article or some other list?
Sorry for not being clear. If I type in the underscored version of "Virgin Islands of the United States" into https://en.wikipedia.org, so (https://en.wikipedia.org/wiki/Virgin_Islands_of_the_United_States), it redirects me to https://en.wikipedia.org/wiki/United_States_Virgin_Islands. Looks like that particular config is accessible via this page: https://en.wikipedia.org/w/index.php?title=Virgin_Islands_of_the_United_States&redirect=no
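Once the redirect has been followed, the final URL already carries the article title, so a tiny helper can recover it. This is a sketch only: `title_from_wiki_url` is a hypothetical name, and it assumes the usual `/wiki/Title_With_Underscores` URL shape.

```ruby
require 'uri'

# Hypothetical helper: derive an article title from a /wiki/ URL.
# With open-uri, the post-redirect URL should be available via the
# `base_uri` method of the object returned by URI.open.
def title_from_wiki_url(url)
  slug = URI.parse(url).path.delete_prefix('/wiki/')
  # Percent-decode, then turn underscores back into spaces.
  # Note: decode_www_form_component also turns '+' into a space.
  URI.decode_www_form_component(slug).tr('_', ' ')
end

puts title_from_wiki_url('https://en.wikipedia.org/wiki/United_States_Virgin_Islands')
# United States Virgin Islands
```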
Ironically, the suggested data source, https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2, has the "Country name" text for `VI` as "Virgin Islands, U.S.", though the link takes you to https://en.wikipedia.org/wiki/United_States_Virgin_Islands. Both formats are acceptable to me, though I might favor the link version, because "Viet Nam".
I too am open to other ideas for a data source.
I played around with scraping the Wikipedia titles. Here's a quick and dirty Nokogiri script I came up with. `open-uri`'s `open` follows Wikipedia's redirects correctly, so even if the link becomes outdated, it should correctly grab the accepted name. It works for a lot of locales (I'm including Arabic `ar`, English `en`, Italian `it`, and Chinese `zh`), but not others. German `de` has its countries in a completely separate article, and French `fr` has some other links in its table that don't have a good distinguishing heuristic.
require 'open-uri'
require 'nokogiri'
require 'countries'

ALL_ALPHA2S = ISO3166::Country.all.map(&:alpha2)
COUNTRY_COUNT = 249
LOCALES = %w(ar en it zh)

# One entry per locale: an array of [alpha2, article_title] tuples (or nil).
data = LOCALES.map do |locale|
  root = "https://#{locale}.wikipedia.org"
  # URI.open rather than Kernel#open, which no longer opens URLs on Ruby 3+.
  doc = Nokogiri::HTML(URI.open("#{root}/wiki/ISO_3166-1"))
  next unless doc

  # The country table is the wikitable with one row per country plus a header.
  content_tables = doc.css("div#bodyContent table.wikitable")
  country_table = content_tables.detect { |t| (t.css("tr").length - 1) == COUNTRY_COUNT }
  next unless country_table

  country_table.css("tr")[1..-1].map do |tr|
    alpha2 = tr.css("td").map(&:text).detect { |text| ALL_ALPHA2S.include? text }
    # First link in the row that is neither an ISO 3166 page nor a flag image.
    hopefully_country_path = tr.css("a").map { |a| a['href'] }.detect do |href|
      !href.include?('3166') && !href.include?('Flag')
    end
    # Fetching the linked article follows redirects to the accepted title.
    country_doc = Nokogiri::HTML(URI.open("#{root}#{hopefully_country_path}"))
    article_title = country_doc.css("h1#firstHeading").text
    [alpha2, article_title]
  end
end
Output for `en`. Please verify yourself:
[["AF", "Afghanistan"], ["AX", "Åland Islands"], ["AL", "Albania"], ["DZ", "Algeria"], ["AS", "American Samoa"], ["AD", "Andorra"], ["AO", "Angola"], ["AI", "Anguilla"], ["AQ", "Antarctica"], ["AG", "Antigua and Barbuda"], ["AR", "Argentina"], ["AM", "Armenia"], ["AW", "Aruba"], ["AU", "Australia"], ["AT", "Austria"], ["AZ", "Azerbaijan"], ["BS", "The Bahamas"], ["BH", "Bahrain"], ["BD", "Bangladesh"], ["BB", "Barbados"], ["BY", "Belarus"], ["BE", "Belgium"], ["BZ", "Belize"], ["BJ", "Benin"], ["BM", "Bermuda"], ["BT", "Bhutan"], ["BO", "Bolivia"], ["BQ", "Caribbean Netherlands"], ["BA", "Bosnia and Herzegovina"], ["BW", "Botswana"], ["BV", "Bouvet Island"], ["BR", "Brazil"], ["IO", "British Indian Ocean Territory"], ["BN", "Brunei"], ["BG", "Bulgaria"], ["BF", "Burkina Faso"], ["BI", "Burundi"], ["CV", "Cape Verde"], ["KH", "Cambodia"], ["CM", "Cameroon"], ["CA", "Canada"], ["KY", "Cayman Islands"], ["CF", "Central African Republic"], ["TD", "Chad"], ["CL", "Chile"], ["CN", "China"], ["CX", "Christmas Island"], ["CC", "Cocos (Keeling) Islands"], ["CO", "Colombia"], ["KM", "Comoros"], ["CG", "Republic of the Congo"], ["CD", "Democratic Republic of the Congo"], ["CK", "Cook Islands"], ["CR", "Costa Rica"], ["CI", "Ivory Coast"], ["HR", "Croatia"], ["CU", "Cuba"], ["CW", "Curaçao"], ["CY", "Cyprus"], ["CZ", "Czech Republic"], ["DK", "Denmark"], ["DJ", "Djibouti"], ["DM", "Dominica"], ["DO", "Dominican Republic"], ["EC", "Ecuador"], ["EG", "Egypt"], ["SV", "El Salvador"], ["GQ", "Equatorial Guinea"], ["ER", "Eritrea"], ["EE", "Estonia"], ["ET", "Ethiopia"], ["FK", "Falkland Islands"], ["FO", "Faroe Islands"], ["FJ", "Fiji"], ["FI", "Finland"], ["FR", "France"], ["GF", "French Guiana"], ["PF", "French Polynesia"], ["TF", "French Southern and Antarctic Lands"], ["GA", "Gabon"], ["GM", "The Gambia"], ["GE", "Georgia (country)"], ["DE", "Germany"], ["GH", "Ghana"], ["GI", "Gibraltar"], ["GR", "Greece"], ["GL", "Greenland"], ["GD", "Grenada"], ["GP", "Guadeloupe"], ["GU", 
"Guam"], ["GT", "Guatemala"], ["GG", "Guernsey"], ["GN", "Guinea"], ["GW", "Guinea-Bissau"], ["GY", "Guyana"], ["HT", "Haiti"], ["HM", "Heard Island and McDonald Islands"], ["VA", "Vatican City"], ["HN", "Honduras"], ["HK", "Hong Kong"], ["HU", "Hungary"], ["IS", "Iceland"], ["IN", "India"], ["ID", "Indonesia"], ["IR", "Iran"], ["IQ", "Iraq"], ["IE", "Republic of Ireland"], ["IM", "Isle of Man"], ["IL", "Israel"], ["IT", "Italy"], ["JM", "Jamaica"], ["JP", "Japan"], ["JE", "Jersey"], ["JO", "Jordan"], ["KZ", "Kazakhstan"], ["KE", "Kenya"], ["KI", "Kiribati"], ["KP", "North Korea"], ["KR", "South Korea"], ["KW", "Kuwait"], ["KG", "Kyrgyzstan"], ["LA", "Laos"], ["LV", "Latvia"], ["LB", "Lebanon"], ["LS", "Lesotho"], ["LR", "Liberia"], ["LY", "Libya"], ["LI", "Liechtenstein"], ["LT", "Lithuania"], ["LU", "Luxembourg"], ["MO", "Macau"], ["MK", "Republic of Macedonia"], ["MG", "Madagascar"], ["MW", "Malawi"], ["MY", "Malaysia"], ["MV", "Maldives"], ["ML", "Mali"], ["MT", "Malta"], ["MH", "Marshall Islands"], ["MQ", "Martinique"], ["MR", "Mauritania"], ["MU", "Mauritius"], ["YT", "Mayotte"], ["MX", "Mexico"], ["FM", "Federated States of Micronesia"], ["MD", "Moldova"], ["MC", "Monaco"], ["MN", "Mongolia"], ["ME", "Montenegro"], ["MS", "Montserrat"], ["MA", "Morocco"], ["MZ", "Mozambique"], ["MM", "Myanmar"], ["NA", "Namibia"], ["NR", "Nauru"], ["NP", "Nepal"], ["NL", "Netherlands"], ["NC", "New Caledonia"], ["NZ", "New Zealand"], ["NI", "Nicaragua"], ["NE", "Niger"], ["NG", "Nigeria"], ["NU", "Niue"], ["NF", "Norfolk Island"], ["MP", "Northern Mariana Islands"], ["NO", "Norway"], ["OM", "Oman"], ["PK", "Pakistan"], ["PW", "Palau"], ["PS", "State of Palestine"], ["PA", "Panama"], ["PG", "Papua New Guinea"], ["PY", "Paraguay"], ["PE", "Peru"], ["PH", "Philippines"], ["PN", "Pitcairn Islands"], ["PL", "Poland"], ["PT", "Portugal"], ["PR", "Puerto Rico"], ["QA", "Qatar"], ["RE", "Réunion"], ["RO", "Romania"], ["RU", "Russia"], ["RW", "Rwanda"], ["BL", "Saint Barthélemy"], 
["SH", "Saint Helena, Ascension and Tristan da Cunha"], ["KN", "Saint Kitts and Nevis"], ["LC", "Saint Lucia"], ["MF", "Collectivity of Saint Martin"], ["PM", "Saint Pierre and Miquelon"], ["VC", "Saint Vincent and the Grenadines"], ["WS", "Samoa"], ["SM", "San Marino"], ["ST", "São Tomé and Príncipe"], ["SA", "Saudi Arabia"], ["SN", "Senegal"], ["RS", "Serbia"], ["SC", "Seychelles"], ["SL", "Sierra Leone"], ["SG", "Singapore"], ["SX", "Sint Maarten"], ["SK", "Slovakia"], ["SI", "Slovenia"], ["SB", "Solomon Islands"], ["SO", "Somalia"], ["ZA", "South Africa"], ["GS", "South Georgia and the South Sandwich Islands"], ["SS", "South Sudan"], ["ES", "Spain"], ["LK", "Sri Lanka"], ["SD", "Sudan"], ["SR", "Suriname"], ["SJ", "Svalbard and Jan Mayen"], ["SZ", "Swaziland"], ["SE", "Sweden"], ["CH", "Switzerland"], ["SY", "Syria"], ["TW", "Taiwan, China"], ["TJ", "Tajikistan"], ["TZ", "Tanzania"], ["TH", "Thailand"], ["TL", "East Timor"], ["TG", "Togo"], ["TK", "Tokelau"], ["TO", "Tonga"], ["TT", "Trinidad and Tobago"], ["TN", "Tunisia"], ["TR", "Turkey"], ["TM", "Turkmenistan"], ["TC", "Turks and Caicos Islands"], ["TV", "Tuvalu"], ["UG", "Uganda"], ["UA", "Ukraine"], ["AE", "United Arab Emirates"], ["GB", "United Kingdom"], ["US", "United States"], ["UM", "United States Minor Outlying Islands"], ["UY", "Uruguay"], ["UZ", "Uzbekistan"], ["VU", "Vanuatu"], ["VE", "Venezuela"], ["VN", "Vietnam"], ["VG", "British Virgin Islands"], ["VI", "United States Virgin Islands"], ["WF", "Wallis and Futuna"], ["EH", "Western Sahara"], ["YE", "Yemen"], ["ZM", "Zambia"], ["ZW", "Zimbabwe"]]
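If tuples like these were bundled as data, a lookup could be as simple as the sketch below. `build_name_index` and `accepted_name` are hypothetical names, and the three sample tuples are copied from the `en` output above.

```ruby
# Turn each locale's [alpha2, title] tuples into an alpha2 => title hash.
def build_name_index(tuples_by_locale)
  tuples_by_locale.transform_values(&:to_h)
end

# Hypothetical accessor; returns nil when a locale or code was not scraped.
def accepted_name(index, alpha2, locale = 'en')
  index.dig(locale, alpha2)
end

# Sample of the scraped output, keyed by locale.
index = build_name_index(
  'en' => [['KP', 'North Korea'], ['KR', 'South Korea'], ['VN', 'Vietnam']]
)
puts accepted_name(index, 'VN')       # Vietnam
p accepted_name(index, 'VN', 'de')    # nil (locale not scraped)
```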
@iGEL, as the original author of this issue, would these names work for you?
The other alternative that I can think of, if it's just English common names, would be to take the `common` name from https://github.com/mledoze/countries.
@perspectivezoom
What about
From https://github.com/twitter/twitter-cldr-rb
gem install twitter_cldr
require 'twitter_cldr'
TwitterCldr::Shared::Territories.all_for('en')
`twitter_cldr` works for my purposes. It looks like the source is http://cldr.unicode.org/index/downloads, which contains https://github.com/unicode-cldr/cldr-localenames-full/blob/master/main/en-US-POSIX/territories.json. Twitter grabs the `US-alt-short` "US" instead of the `US` "United States". Either is ok with me, but other people might care.
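For people who do care, choosing between the two variants is a one-liner once the territories are in a flat hash. A sketch, assuming the `-alt-short` key suffix seen in `territories.json`; `territory_name` is a hypothetical helper, not part of `twitter_cldr`:

```ruby
# Hypothetical helper: look up a territory name in a flat CLDR-style hash,
# optionally preferring the "-alt-short" variant when one exists.
def territory_name(territories, code, prefer_short: false)
  short = territories["#{code}-alt-short"]
  prefer_short && short ? short : territories[code]
end

territories = { 'US' => 'United States', 'US-alt-short' => 'US' }
puts territory_name(territories, 'US')                     # United States
puts territory_name(territories, 'US', prefer_short: true) # US
```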
Looking at http://cldr.unicode.org/translation/country-names, this looks like it has some good guidelines for "common" names.
Not sure what the current state is, but the translations in my version (3.0.0) have "United Kingdom" under the translations. And so:
ISO3166::Country['GB'].translations['en'] => 'United Kingdom'
There is also an 'unofficial_names' array which shows 'United Kingdom', though it also includes translated names without any indication of language so probably unsafe to use.
+1 for adding some way of referencing a common name.
+1
Something like this would sure be helpful!
ISO3166::Country['GB'].abbreviated_name # 'United Kingdom'
ISO3166::Country['GB'].short_name # 'United Kingdom'
The issue existed before, but became worse with #414. Some country names are just so ridiculously long that no one would use them in real life (e.g. "United Kingdom of Great Britain and Northern Ireland"). While I think it's a good thing to have the ISO names, this gem should also provide the name that 99% of people use in daily life. A good hint could be the name of the country's Wikipedia article (in that case: United Kingdom).
My suggestion:
Obviously it sucks to change the output of `name` again, but I think in the long run this is the most intuitive way.