Open AetherPrior opened 1 year ago
Some of the names can be found in the Wikipedia list you linked. Others may be in the sublink ISO 639 macrolanguage.
But using Google to search for the code bpy, I found the page https://iso639-3.sil.org/code_tables/639/data and it may have all the names (but I didn't test all of the codes from your list).
At the bottom of this page I also found a link to download the tables, and it seems to include them as iso-639-3.tab (tab-separated values, similar to .csv).
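As a rough sketch of how such a tab-separated table could be turned into a code-to-name lookup: the snippet below uses a tiny inline sample instead of the real download, and the column names (Id, Part1, Ref_Name) are assumptions that should be checked against the header row of the actual iso-639-3.tab file. The helper name load_language_names is hypothetical.

```python
import csv
import io

# Tiny inline sample standing in for the downloaded table.
# The column names ("Id", "Part1", "Ref_Name") are assumptions;
# check them against the header row of the real file.
SAMPLE_TAB = (
    "Id\tPart1\tRef_Name\n"
    "afr\taf\tAfrikaans\n"
    "bpy\t\tBishnupriya\n"
    "gsw\t\tSwiss German\n"
)


def load_language_names(tab_file):
    """Map each 3-letter code (and the 2-letter code, when present) to a name."""
    names = {}
    for row in csv.DictReader(tab_file, delimiter="\t"):
        names[row["Id"]] = row["Ref_Name"]
        if row["Part1"]:
            names[row["Part1"]] = row["Ref_Name"]
    return names


names = load_language_names(io.StringIO(SAMPLE_TAB))
print(names.get("bpy"))
print(names.get("af"))
```

For the real file, open("iso-639-3.tab", encoding="utf-8", newline="") could be passed in place of the io.StringIO object. Codes missing from the table come back as None from names.get, which makes it easy to list the ones that still need manual mapping.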
You could use app-subtags. Go to this site, paste the entire list in the Look up field and it returns the identified languages. Out of all the tokens, only eml is not identified. Maybe fasttext uses a slightly different version of BCP47 instead of ISO-639.
Alternatively, if you wanna do something similar in python, you could use langcodes:
# pip install langcodes language_data
from langcodes import tag_is_valid, Language
lang_tag = "af"
if tag_is_valid(lang_tag):
    lang_name = Language.get(lang_tag).display_name("en")
    print(f"Language name in english: {lang_name}")  # Language name in english: Afrikaans
It's based on the List_of_Wikipedias: https://en.wikipedia.org/wiki/List_of_Wikipedias
For example, the als code in fasttext refers to the gsw code in ISO-639, and ku does not refer to kur but to kmr.
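A minimal sketch of that kind of remapping, assuming only the two pairs mentioned above (als to gsw, ku to kmr). The names WP_TO_ISO639_3 and normalize_wp_code are hypothetical, and the handling of a "__label__" prefix is an assumption about how the predicted labels are passed in.

```python
# Wikipedia-style code -> ISO 639-3 code.
# Only the two pairs named in this thread; a real table would need more entries.
WP_TO_ISO639_3 = {
    "als": "gsw",
    "ku": "kmr",
}

LABEL_PREFIX = "__label__"  # assumed prefix on predicted labels


def normalize_wp_code(label):
    """Strip the label prefix if present, then remap known Wikipedia codes."""
    code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
    # Codes without an override are returned unchanged.
    return WP_TO_ISO639_3.get(code, code)


for label in ["__label__als", "ku", "af"]:
    print(label, "->", normalize_wp_code(label))
```

Codes without an override pass through untouched, so the result is not guaranteed to be a valid ISO 639-3 identifier. It still needs checking against a full code table.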
Great comment! I have created a function to normalize the identified language output (aka the WP code) to an ISO 639-3 Id, along with the languages JSON data, which is generated by a set of language scripts.
I am trying to find out the names of languages supported by Fasttext's LID tool, given these language codes listed here:
I tried to map the ISO codes to each language, but the codes seem non-standard: they do not consistently match either ISO-639-1 or ISO-639-3. Does anyone have a list of language names for these codes, or know how to find them?
Wikipedia's list does not cover all of them either, so manual mapping did not help.
Thanks!