geolexica / geolexica-server

Generalized backend for Geolexica sites

Use Ruby i18n gem for mapping language codes #53

Open skalee opened 4 years ago

skalee commented 4 years ago

The Ruby i18n gem is capable of mapping ISO 639-2 codes to BCP 47 language codes. We should use it rather than maintain the code mappings ourselves.

ronaldtse commented 4 years ago

Great, let's do that.

skalee commented 4 years ago

My bad, I don't think it can do that. However, there is the i18n_data gem, and perhaps others, which may help. Investigation needed.

skalee commented 4 years ago

"The Library of Congress has been designated the ISO 639-2/RA for the purpose of processing requests for alpha-3 language codes comprising the International Standard, Codes for the representation of names of languages -- Part 2: alpha-3 code." (source: http://www.loc.gov/standards/iso639-2/)

LOC maintains a structured text file which we can use. It can be downloaded from http://www.loc.gov/standards/iso639-2/ascii_8bits.html.
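
A minimal sketch of how that file could be consumed, assuming it is pipe-delimited with five fields per line (alpha-3 bibliographic, alpha-3 terminologic, alpha-2, English name, French name). The sample rows, `LanguageEntry`, `parse_loc_registry`, and `alpha3_to_alpha2` are illustrative names, not part of any existing gem or of geolexica-server, and the real file should be checked against this layout before relying on it.

```ruby
# Sample rows in the layout assumed for the LOC file:
# alpha-3 (B) | alpha-3 (T) | alpha-2 | English name | French name
SAMPLE_LOC_DATA = <<~DATA
  aar||aa|Afar|afar
  alb|sqi|sq|Albanian|albanais
  ger|deu|de|German|allemand
  ace|||Achinese|aceh
DATA

# Hypothetical value object for one registry row.
LanguageEntry = Struct.new(:alpha3_b, :alpha3_t, :alpha2, :english_name, :french_name)

# Parses the pipe-delimited text into LanguageEntry objects.
def parse_loc_registry(text)
  text.each_line(chomp: true).reject(&:empty?).map do |line|
    # A limit of -1 keeps trailing empty fields instead of dropping them.
    b, t, a2, en, fr = line.split("|", -1)
    # When no separate terminologic code is given, reuse the bibliographic one.
    LanguageEntry.new(b, t.empty? ? b : t, a2.empty? ? nil : a2, en, fr)
  end
end

# Builds a lookup from either alpha-3 variant to the alpha-2 code.
# Languages without an alpha-2 code are skipped.
def alpha3_to_alpha2(entries)
  entries.each_with_object({}) do |entry, map|
    next if entry.alpha2.nil?

    map[entry.alpha3_b] = entry.alpha2
    map[entry.alpha3_t] = entry.alpha2
  end
end

ENTRIES = parse_loc_registry(SAMPLE_LOC_DATA)
ALPHA3_TO_ALPHA2 = alpha3_to_alpha2(ENTRIES)
```

With the sample data, both `ALPHA3_TO_ALPHA2["alb"]` and `ALPHA3_TO_ALPHA2["sqi"]` resolve to `"sq"`, while `"ace"` has no entry because it has no alpha-2 code.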

skalee commented 4 years ago

I found a perfect gem for this task: https://github.com/scsmith/language_list

Unfortunately, the author doesn't mention a data source. But given the number of downloads (about 1 million), I guess this gem can be trusted. I suppose I can also feed it official data from https://iso639-3.sil.org/, the site of the ISO 639-3 registration authority, which maintains the most comprehensive registry (according to https://www.iso.org/maintenance_agencies.html).

ronaldtse commented 4 years ago

@skalee Its list of languages comes from ISO 639-1, -2 and -3: https://raw.githubusercontent.com/scsmith/language_list/master/data/languages.yml

Since the gem hasn't been updated for a while, I assume it might be out of date regarding the 639-3 codes, since they change.

I still think we are fine with the ISO 639-2 codes that we already have in a YAML file.

skalee commented 4 years ago

@ronaldtse I'd love to have all language codes in one place, either in a gem or at least in some easy-to-use centralized registry. We run into issues with language codes a bit too often. If we had a central registry, we could write tests which ensure that we use correct codes rather than relying on our perception.

Actually, I don't care much about ISO 639-3. This gem provides ISO 639-2 (T) as well.

This is probably not the most urgent thing, but language code bugs keep recurring.
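
The kind of registry check mentioned above could look roughly like this. It is only a sketch: `KNOWN_ISO_639_2_CODES` is a hypothetical, hand-written subset standing in for a real central registry (a gem or an official data file), and `unknown_language_codes` is an illustrative helper name, not existing project code.

```ruby
require "set"

# Hypothetical stand-in for a central registry of ISO 639-2 codes.
# In practice this set would be loaded from a single authoritative source.
KNOWN_ISO_639_2_CODES = Set.new(%w[eng fra fre deu ger zho chi jpn]).freeze

# Returns the codes that are not present in the registry.
# Comparison is case-insensitive; the returned codes keep their original form.
def unknown_language_codes(codes, registry: KNOWN_ISO_639_2_CODES)
  codes.reject { |code| registry.include?(code.to_s.downcase) }
end

# Example: "jap" is a plausible-looking typo for "jpn".
site_languages = %w[eng deu jap]
unknown = unknown_language_codes(site_languages)
warn "Unknown language codes: #{unknown.join(', ')}" unless unknown.empty?
```

A test suite could assert that `unknown_language_codes` returns an empty array for every list of language codes a site is configured with.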