countries / countries

All sorts of useful information about every country packaged as convenient little country objects. It includes data from ISO 3166 (countries and states/subdivisions ), ISO 4217 (currency), and E.164 (phone numbers).
MIT License

Have an iso_name method and return the common name as name #431

Closed iGEL closed 2 years ago

iGEL commented 7 years ago

The issue existed before, but became worse with #414. Some country names are so ridiculously long that no one would use them in real life (e.g. "United Kingdom of Great Britain and Northern Ireland"). While I think it's a good thing to have the ISO names, this gem should also provide the name that 99% of people use in daily life. A good hint could be the name of the country's Wikipedia article (in that case: United Kingdom).

My suggestion:

c = Country["GB"] # "GB" is the ISO 3166 alpha-2 code for the United Kingdom
c.name # => "United Kingdom"
c.iso_name # => "United Kingdom of Great Britain and Northern Ireland"

Obviously it sucks to change the output of name again, but I think that in the long run this is the most intuitive way.

rposborne commented 7 years ago

@iGEL I think we have this covered via our translations, which are much much more "common".

rlt commented 7 years ago

Chipping in here, but the effect of this change has been noticeable across our apps. I do not think that saying "just use the translations" is fair. It will be quite a task for us to go and update everything to pull the short name out of a translations hash. I think @iGEL's suggestion is perfectly fair and reasonable.

rposborne commented 7 years ago

Hm, I see your point, and I am more than happy to rename name to iso_name, but I don't think it's a good idea to add something like name back in, as it becomes a question of WHICH name from WHICH perspective. Translations are the CORRECT way to solve this problem.

That being said, do we need to make the translations API more approachable? I could see a method like name just calling out to i18n to get the current locale and loading that translation, with a fallback to a default locale of en if we are not inside Rails.
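A minimal sketch of that idea, under stated assumptions: a plain hash stands in for the gem's translation data, and common_name and current_locale are hypothetical helpers, not existing methods of the gem. It uses I18n.locale when I18n is loaded and otherwise falls back to en.

```ruby
# Hypothetical stand-in for per-country translation data; not the gem's real storage.
TRANSLATIONS = {
  "GB" => { "en" => "United Kingdom", "de" => "Vereinigtes Königreich" },
  "VN" => { "en" => "Vietnam" }
}.freeze

DEFAULT_LOCALE = "en"

# Use I18n's current locale when I18n is loaded (e.g. inside Rails),
# otherwise fall back to the default locale.
def current_locale
  defined?(I18n) ? I18n.locale.to_s : DEFAULT_LOCALE
end

# Hypothetical helper: the translated name for the requested locale, or the
# default-locale name when that locale has no translation.
def common_name(alpha2, locale: current_locale)
  names = TRANSLATIONS.fetch(alpha2, {})
  names[locale] || names[DEFAULT_LOCALE]
end

puts common_name("GB", locale: "en") # United Kingdom
puts common_name("GB", locale: "de") # Vereinigtes Königreich
puts common_name("VN", locale: "de") # Vietnam (no de entry, so the en name is used)
```

Keeping the locale as a keyword argument means callers outside Rails can still ask for a specific language without touching global state.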

perspectivezoom commented 7 years ago

From a usage perspective, I would benefit from having a single method to get the common country name that users expect.

I can sympathize with not having a definitive source, though. Wikipedia uses "United States Virgin Islands" instead of unofficial_names "Virgin Islands of the United States", but Google Maps uses the more compact "US Virgin Islands" on its display. If it means spinning off another gem to store ISO3166::Country.new('US').additional_names["wikipedia"]["en"], I can accept that. Or if there's already a solution out there, please let me know.

rposborne commented 7 years ago

@perspectivezoom You bring up the fundamental problem, which is that this gem has to straddle a socio-economic problem, and there are only "okay" choices.

ISO is too long, and odd around disputed countries.
Translations... inconsistent based upon language.
Slang... well, is slang.

I am 100% for bringing any data into the data set, so no need for another gem. But I don't fully understand your reference: are you referring to the title of one wiki article or to some other list?

I personally have spent a lot of time looking at Open Street Map as a data source, but its tagging system has proved way too inconsistent for any structured data retrieval.

rposborne commented 7 years ago

Here is a thought.

name or accepted_name

returns the current locale's Wikipedia title for that country in the respective language. This list of the ISO codes provides a reasonably scrapable data source, while the subtitle of each page is a much more reasonable name.

This echoes @perspectivezoom's thoughts but has a data source. Any other ideas?

perspectivezoom commented 7 years ago

I don't fully understand your reference: are you referring to the title of one wiki article or to some other list?

Sorry for not being clear. If I type in the underscored version of "Virgin Islands of the United States" into https://en.wikipedia.org, so (https://en.wikipedia.org/wiki/Virgin_Islands_of_the_United_States), it redirects me to https://en.wikipedia.org/wiki/United_States_Virgin_Islands. Looks like that particular config is accessible via this page: https://en.wikipedia.org/w/index.php?title=Virgin_Islands_of_the_United_States&redirect=no

Ironically, the suggested data source, https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 has the "Country name" text for VI as "Virgin Islands, U.S.", though the link takes you to https://en.wikipedia.org/wiki/United_States_Virgin_Islands. Both formats are acceptable to me, though I might favor the link version, because "Viet Nam".

I too am open to other ideas for a data source.

perspectivezoom commented 7 years ago

I played around with scraping the Wikipedia titles. Here's a quick and dirty Nokogiri script I came up with. open-uri's open follows Wikipedia's redirects correctly, so even if the link becomes outdated, it should correctly grab the accepted name. It works for a lot of locales (I'm including Arabic ar, English en, Italian it, and Chinese zh), but not others. German de has its countries in a completely separate article, and French fr has some other links in its table that don't have a good distinguishing heuristic.

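require 'open-uri'
require 'nokogiri'
require 'countries'

# All alpha-2 codes known to the gem, used to spot the code cell in each table row.
ALL_ALPHA2S = ISO3166::Country.all.map(&:alpha2)
# Number of data rows expected in the ISO 3166-1 country table.
COUNTRY_COUNT = 249

LOCALES = %w(ar en it zh)
data = LOCALES.map do |locale|
  root = "https://#{locale}.wikipedia.org"

  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs.
  doc = Nokogiri::HTML(URI.open("#{root}/wiki/ISO_3166-1"))
  next unless doc
  content_tables = doc.css("div#bodyContent table.wikitable")
  # Pick the table whose row count (minus the header row) matches the country count.
  country_table = content_tables.detect { |t| (t.css("tr").length - 1) == COUNTRY_COUNT }
  next unless country_table
  alpha2_and_article_title_tuples = country_table.css("tr")[1..-1].map do |tr|
    alpha2 = tr.css("td").map(&:text).detect { |text| ALL_ALPHA2S.include? text }
    # First link in the row that is neither an ISO 3166 page nor a flag image.
    hopefully_country_path = tr.css("a").map { |a| a['href'] }.compact.detect do |href|
      !href.include?('3166') && !href.include?('Flag')
    end
    # Skip the fetch when the row has no usable link.
    next [alpha2, nil] unless hopefully_country_path
    # Fetching the linked page yields the title of the article it resolves to.
    country_doc = Nokogiri::HTML(URI.open("#{root}#{hopefully_country_path}"))
    article_title = country_doc.css("h1#firstHeading").text
    [alpha2, article_title]
  end
  alpha2_and_article_title_tuples
end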

Output for en. Please verify yourself:

[["AF", "Afghanistan"], ["AX", "Åland Islands"], ["AL", "Albania"], ["DZ", "Algeria"], ["AS", "American Samoa"], ["AD", "Andorra"], ["AO", "Angola"], ["AI", "Anguilla"], ["AQ", "Antarctica"], ["AG", "Antigua and Barbuda"], ["AR", "Argentina"], ["AM", "Armenia"], ["AW", "Aruba"], ["AU", "Australia"], ["AT", "Austria"], ["AZ", "Azerbaijan"], ["BS", "The Bahamas"], ["BH", "Bahrain"], ["BD", "Bangladesh"], ["BB", "Barbados"], ["BY", "Belarus"], ["BE", "Belgium"], ["BZ", "Belize"], ["BJ", "Benin"], ["BM", "Bermuda"], ["BT", "Bhutan"], ["BO", "Bolivia"], ["BQ", "Caribbean Netherlands"], ["BA", "Bosnia and Herzegovina"], ["BW", "Botswana"], ["BV", "Bouvet Island"], ["BR", "Brazil"], ["IO", "British Indian Ocean Territory"], ["BN", "Brunei"], ["BG", "Bulgaria"], ["BF", "Burkina Faso"], ["BI", "Burundi"], ["CV", "Cape Verde"], ["KH", "Cambodia"], ["CM", "Cameroon"], ["CA", "Canada"], ["KY", "Cayman Islands"], ["CF", "Central African Republic"], ["TD", "Chad"], ["CL", "Chile"], ["CN", "China"], ["CX", "Christmas Island"], ["CC", "Cocos (Keeling) Islands"], ["CO", "Colombia"], ["KM", "Comoros"], ["CG", "Republic of the Congo"], ["CD", "Democratic Republic of the Congo"], ["CK", "Cook Islands"], ["CR", "Costa Rica"], ["CI", "Ivory Coast"], ["HR", "Croatia"], ["CU", "Cuba"], ["CW", "Curaçao"], ["CY", "Cyprus"], ["CZ", "Czech Republic"], ["DK", "Denmark"], ["DJ", "Djibouti"], ["DM", "Dominica"], ["DO", "Dominican Republic"], ["EC", "Ecuador"], ["EG", "Egypt"], ["SV", "El Salvador"], ["GQ", "Equatorial Guinea"], ["ER", "Eritrea"], ["EE", "Estonia"], ["ET", "Ethiopia"], ["FK", "Falkland Islands"], ["FO", "Faroe Islands"], ["FJ", "Fiji"], ["FI", "Finland"], ["FR", "France"], ["GF", "French Guiana"], ["PF", "French Polynesia"], ["TF", "French Southern and Antarctic Lands"], ["GA", "Gabon"], ["GM", "The Gambia"], ["GE", "Georgia (country)"], ["DE", "Germany"], ["GH", "Ghana"], ["GI", "Gibraltar"], ["GR", "Greece"], ["GL", "Greenland"], ["GD", "Grenada"], ["GP", "Guadeloupe"], ["GU", 
"Guam"], ["GT", "Guatemala"], ["GG", "Guernsey"], ["GN", "Guinea"], ["GW", "Guinea-Bissau"], ["GY", "Guyana"], ["HT", "Haiti"], ["HM", "Heard Island and McDonald Islands"], ["VA", "Vatican City"], ["HN", "Honduras"], ["HK", "Hong Kong"], ["HU", "Hungary"], ["IS", "Iceland"], ["IN", "India"], ["ID", "Indonesia"], ["IR", "Iran"], ["IQ", "Iraq"], ["IE", "Republic of Ireland"], ["IM", "Isle of Man"], ["IL", "Israel"], ["IT", "Italy"], ["JM", "Jamaica"], ["JP", "Japan"], ["JE", "Jersey"], ["JO", "Jordan"], ["KZ", "Kazakhstan"], ["KE", "Kenya"], ["KI", "Kiribati"], ["KP", "North Korea"], ["KR", "South Korea"], ["KW", "Kuwait"], ["KG", "Kyrgyzstan"], ["LA", "Laos"], ["LV", "Latvia"], ["LB", "Lebanon"], ["LS", "Lesotho"], ["LR", "Liberia"], ["LY", "Libya"], ["LI", "Liechtenstein"], ["LT", "Lithuania"], ["LU", "Luxembourg"], ["MO", "Macau"], ["MK", "Republic of Macedonia"], ["MG", "Madagascar"], ["MW", "Malawi"], ["MY", "Malaysia"], ["MV", "Maldives"], ["ML", "Mali"], ["MT", "Malta"], ["MH", "Marshall Islands"], ["MQ", "Martinique"], ["MR", "Mauritania"], ["MU", "Mauritius"], ["YT", "Mayotte"], ["MX", "Mexico"], ["FM", "Federated States of Micronesia"], ["MD", "Moldova"], ["MC", "Monaco"], ["MN", "Mongolia"], ["ME", "Montenegro"], ["MS", "Montserrat"], ["MA", "Morocco"], ["MZ", "Mozambique"], ["MM", "Myanmar"], ["NA", "Namibia"], ["NR", "Nauru"], ["NP", "Nepal"], ["NL", "Netherlands"], ["NC", "New Caledonia"], ["NZ", "New Zealand"], ["NI", "Nicaragua"], ["NE", "Niger"], ["NG", "Nigeria"], ["NU", "Niue"], ["NF", "Norfolk Island"], ["MP", "Northern Mariana Islands"], ["NO", "Norway"], ["OM", "Oman"], ["PK", "Pakistan"], ["PW", "Palau"], ["PS", "State of Palestine"], ["PA", "Panama"], ["PG", "Papua New Guinea"], ["PY", "Paraguay"], ["PE", "Peru"], ["PH", "Philippines"], ["PN", "Pitcairn Islands"], ["PL", "Poland"], ["PT", "Portugal"], ["PR", "Puerto Rico"], ["QA", "Qatar"], ["RE", "Réunion"], ["RO", "Romania"], ["RU", "Russia"], ["RW", "Rwanda"], ["BL", "Saint Barthélemy"], 
["SH", "Saint Helena, Ascension and Tristan da Cunha"], ["KN", "Saint Kitts and Nevis"], ["LC", "Saint Lucia"], ["MF", "Collectivity of Saint Martin"], ["PM", "Saint Pierre and Miquelon"], ["VC", "Saint Vincent and the Grenadines"], ["WS", "Samoa"], ["SM", "San Marino"], ["ST", "São Tomé and Príncipe"], ["SA", "Saudi Arabia"], ["SN", "Senegal"], ["RS", "Serbia"], ["SC", "Seychelles"], ["SL", "Sierra Leone"], ["SG", "Singapore"], ["SX", "Sint Maarten"], ["SK", "Slovakia"], ["SI", "Slovenia"], ["SB", "Solomon Islands"], ["SO", "Somalia"], ["ZA", "South Africa"], ["GS", "South Georgia and the South Sandwich Islands"], ["SS", "South Sudan"], ["ES", "Spain"], ["LK", "Sri Lanka"], ["SD", "Sudan"], ["SR", "Suriname"], ["SJ", "Svalbard and Jan Mayen"], ["SZ", "Swaziland"], ["SE", "Sweden"], ["CH", "Switzerland"], ["SY", "Syria"], ["TW", "Taiwan, China"], ["TJ", "Tajikistan"], ["TZ", "Tanzania"], ["TH", "Thailand"], ["TL", "East Timor"], ["TG", "Togo"], ["TK", "Tokelau"], ["TO", "Tonga"], ["TT", "Trinidad and Tobago"], ["TN", "Tunisia"], ["TR", "Turkey"], ["TM", "Turkmenistan"], ["TC", "Turks and Caicos Islands"], ["TV", "Tuvalu"], ["UG", "Uganda"], ["UA", "Ukraine"], ["AE", "United Arab Emirates"], ["GB", "United Kingdom"], ["US", "United States"], ["UM", "United States Minor Outlying Islands"], ["UY", "Uruguay"], ["UZ", "Uzbekistan"], ["VU", "Vanuatu"], ["VE", "Venezuela"], ["VN", "Vietnam"], ["VG", "British Virgin Islands"], ["VI", "United States Virgin Islands"], ["WF", "Wallis and Futuna"], ["EH", "Western Sahara"], ["YE", "Yemen"], ["ZM", "Zambia"], ["ZW", "Zimbabwe"]]
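The tuples above could be turned into a lookup table along these lines. This is only a sketch: strip_disambiguator is a hypothetical cleanup step (not part of the script above or of the gem), added on the assumption that a trailing Wikipedia disambiguator such as "(country)" is unwanted in a display name.

```ruby
# A few tuples copied from the output above.
TUPLES = [
  ["BS", "The Bahamas"],
  ["CC", "Cocos (Keeling) Islands"],
  ["GE", "Georgia (country)"],
  ["GB", "United Kingdom"]
].freeze

# Hypothetical cleanup: drop a trailing parenthetical such as " (country)".
# Parentheses in the middle of a name are left alone.
def strip_disambiguator(title)
  title.sub(/\s*\([^)]*\)\z/, "")
end

# Build an alpha-2 => display name lookup table (Array#to_h with a block needs Ruby 2.6+).
NAMES_BY_ALPHA2 = TUPLES.to_h { |alpha2, title| [alpha2, strip_disambiguator(title)] }.freeze

puts NAMES_BY_ALPHA2["GE"] # Georgia
puts NAMES_BY_ALPHA2["CC"] # Cocos (Keeling) Islands
puts NAMES_BY_ALPHA2["GB"] # United Kingdom
```

Whether to also drop a leading "The" (as in "The Bahamas") is a separate judgment call that this sketch does not make.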

@iGEL, as the original author of this issue, would these names work for you?

The other alternative that I can think of, if it's just English common names, would be to take the common name from https://github.com/mledoze/countries.

rposborne commented 7 years ago

@perspectivezoom

What about

From https://github.com/twitter/twitter-cldr-rb:

gem install twitter_cldr

require 'twitter_cldr'
TwitterCldr::Shared::Territories.all_for('en')

perspectivezoom commented 7 years ago

twitter_cldr works for my purposes. It looks like the source is http://cldr.unicode.org/index/downloads, which contains https://github.com/unicode-cldr/cldr-localenames-full/blob/master/main/en-US-POSIX/territories.json. Twitter grabs the US-alt-short "US" instead of the US "United States". Either is ok with me, but other people might care.

Looking at http://cldr.unicode.org/translation/country-names, this looks like it has some good guidelines for "common" names.

damien-roche commented 4 years ago

Not sure what the current state is, but the translations in my version (3.0.0) have "United Kingdom" under the translations. And so:

ISO3166::Country['GB'].translations['en'] # => 'United Kingdom'

There is also an 'unofficial_names' array which shows 'United Kingdom', though it also includes translated names without any indication of language so probably unsafe to use.

+1 for adding some way of referencing a common name.

tomrossi7 commented 3 years ago

+1

Something like this would sure be helpful!

ISO3166::Country['GB'].abbreviated_name   # 'United Kingdom'
ISO3166::Country['GB'].short_name         # 'United Kingdom'

pmor commented 2 years ago

717 has been merged and released