Tatoeba / tatoeba2

Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
https://tatoeba.org
GNU Affero General Public License v3.0

check lang inconsistencies, especially cor, aze, khm, lao #109

Closed by alanfgh 10 years ago

alanfgh commented 10 years ago

The strings for cor (Cornish), aze (Azerbaijani), khm (Khmer), and lao (Lao) were missing altogether from docs/generate_sphinx_conf.php on the server. Also, the last 13 items in that file were lacking English names for the languages.

I fixed these problems in the versions that I checked into the repository, but in our current workflow, those changes won't make it to the server. Someone with write access to the server would need to copy the fixed array from a dev repository (or GitHub) to the server copy.

It would be a good idea to alphabetize the arrays in the files touched by docs/add_lang.sh, which would make it easier to check for consistency across the different files. We should also make sure every language has an icon (flag) file.
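
As one way to make that consistency check mechanical, here is a minimal Python sketch. It is a hypothetical helper script, not something that exists in the repository. It assumes each file lists its languages as `'code' => 'Name'` entries, as docs/generate_sphinx_conf.php does; the function names and the sample strings are made up for illustration.

```python
import re

# Matches entries such as:  'aze' => 'Azerbaijani' ,
# Assumption: codes and names are single-quoted and names contain no quotes.
ENTRY_RE = re.compile(r"'([a-z]+)'\s*=>\s*'([^']*)'")


def parse_lang_entries(text):
    """Return {code: english_name} for every 'code' => 'Name' entry in text."""
    return dict(ENTRY_RE.findall(text))


def missing_codes(codes_by_file):
    """For each file, return the sorted codes that other files have but it lacks."""
    all_codes = set().union(*codes_by_file.values())
    return {
        name: sorted(all_codes - set(codes))
        for name, codes in codes_by_file.items()
    }


if __name__ == "__main__":
    # Tiny stand-ins for the real file contents (hypothetical samples).
    repo_copy = "'aze' => 'Azerbaijani' ,\n'cor' => 'Cornish' ,\n'lao' => 'Lao' ,"
    server_copy = "'lao' => 'Lao' ,"
    report = missing_codes({
        "repo": parse_lang_entries(repo_copy),
        "server": parse_lang_entries(server_copy),
    })
    for name, codes in report.items():
        print(name, "is missing:", codes)
```

Comparing sets of codes rather than raw lines means the check does not depend on the arrays being in the same order or having the same spacing around commas.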

alanfgh commented 10 years ago

For reference, here is the language array from docs/generate_sphinx_conf.php sorted by ISO code:

'acm' => 'Iraqi Arabic',
'afr' => 'Afrikaans',
'ain' => 'Ainu',
'ang' => 'Old English',
'ara' => 'Arabic',
'arq' => 'Algerian Arabic' ,
'arz' => 'Egyptian Arabic',
'ast' => 'Asturian',
'avk' => 'Kotava',
'aze' => 'Azerbaijani' ,
'bel' => 'Belarusian',
'ben' => 'Bengali',
'ber' => 'Berber',
'bod' => 'Standard Tibetan' ,
'bos' => 'Bosnian',
'bre' => 'Breton',
'bul' => 'Bulgarian',
'cat' => 'Catalan',
'ces' => 'Czech',
'cha' => 'Chamorro',
'ckt' => 'Chukchi' ,
'cmn' => 'Chinese',
'cor' => 'Cornish' ,
'cycl' => 'CycL',
'cym' => 'Welsh',
'dan' => 'Danish',
'deu' => 'German',
'dsb' => 'Lower Sorbian',
'ell' => 'Greek',
'eng' => 'English',
'epo' => 'Esperanto',
'est' => 'Estonian',
'eus' => 'Basque',
'ewe' => 'Ewe',
'fao' => 'Faroese',
'fin' => 'Finnish',
'fra' => 'French',
'fry' => 'Frisian',
'gla' => 'Scottish Gaelic',
'gle' => 'Irish',
'glg' => 'Galician',
'grc' => 'Ancient Greek' , //@lang
'grn' => 'Guarani',
'heb' => 'Hebrew',
'hil' => 'Hiligaynon' ,
'hin' => 'Hindi',
'hrv' => 'Croatian',
'hsb' => 'Upper Sorbian',
'hun' => 'Hungarian',
'hye' => 'Armenian',
'ido' => 'Ido',
'ile' => 'Interlingue',
'ina' => 'Interlingua',
'ind' => 'Indonesian',
'isl' => 'Icelandic',
'ita' => 'Italian',
'jbo' => 'Lojban',
'jpn' => 'Japanese',
'kat' => 'Georgian',
'kaz' => 'Kazakh',
'khm' => 'Khmer' ,
'kor' => 'Korean',
'ksh' => 'Kölsch',
'kur' => 'Kurdish',
'lad' => 'Ladino',
'lao' => 'Lao' ,
'lat' => 'Latin',
'lit' => 'Lithuanian',
'lld' => 'Ladin',
'lvs' => 'Latvian',
'lzh' => 'Literary Chinese',
'mal' => 'Malayalam',
'mar' => 'Marathi',
'mlg' => 'Malagasy',
'mlt' => 'Maltese' ,
'mon' => 'Mongolian',
'mri' => 'Maori',
'nan' => 'Teochew',
'nds' => 'Low Saxon',
'nld' => 'Dutch',
'nob' => 'Norwegian (Bokmål)',
'non' => 'Norwegian (Nynorsk)',
'nov' => 'Novial',
'npi' => 'Nepali' ,
'oci' => 'Occitan',
'orv' => 'Old East Slavic',
'oss' => 'Ossetian',
'pcd' => 'Picard' ,
'pes' => 'Persian',
'pms' => 'Piemontese',
'pnb' => 'Punjabi',
'pol' => 'Polish',
'por' => 'Portuguese',
'prg' => 'Old Prussian' ,
'que' => 'Quechua',
'qya' => 'Quenya',
'roh' => 'Romansh',
'ron' => 'Romanian',
'rus' => 'Russian',
'san' => 'Sanskrit',
'scn' => 'Sicilian',
'sjn' => 'Sindarin',
'slk' => 'Slovak',
'slv' => 'Slovenian',
'spa' => 'Spanish',
'sqi' => 'Albanian',
'srp' => 'Serbian',
'swe' => 'Swedish',
'swh' => 'Swahili',
'tat' => 'Tatar',
'tel' => 'Telugu',
'tgk' => 'Tajik',
'tgl' => 'Tagalog',
'tha' => 'Thai',
'tlh' => 'Klingon',
'toki' => 'Toki Pona',
'tpi' => 'Tok Pisin',
'tpw' => 'Old Tupi',
'tur' => 'Turkish',
'uig' => 'Uyghur',
'ukr' => 'Ukrainian',
'urd' => 'Urdu',
'uzb' => 'Uzbek',
'vie' => 'Vietnamese',
'vol' => 'Volapük',
'wuu' => 'Shanghainese',
'xal' => 'Kalmyk',
'xho' => 'Xhosa',
'yid' => 'Yiddish',
'yue' => 'Cantonese',
'zsm' => 'Malay',

alanfgh commented 10 years ago

Also for reference, here are the items sorted by English name:

'afr' => 'Afrikaans',
'ain' => 'Ainu',
'sqi' => 'Albanian',
'arq' => 'Algerian Arabic' ,
'grc' => 'Ancient Greek' , //@lang
'ara' => 'Arabic',
'hye' => 'Armenian',
'ast' => 'Asturian',
'aze' => 'Azerbaijani' ,
'eus' => 'Basque',
'bel' => 'Belarusian',
'ben' => 'Bengali',
'ber' => 'Berber',
'bos' => 'Bosnian',
'bre' => 'Breton',
'bul' => 'Bulgarian',
'yue' => 'Cantonese',
'cat' => 'Catalan',
'cha' => 'Chamorro',
'cmn' => 'Chinese',
'ckt' => 'Chukchi' ,
'cor' => 'Cornish' ,
'hrv' => 'Croatian',
'cycl' => 'CycL',
'ces' => 'Czech',
'dan' => 'Danish',
'nld' => 'Dutch',
'arz' => 'Egyptian Arabic',
'eng' => 'English',
'epo' => 'Esperanto',
'est' => 'Estonian',
'ewe' => 'Ewe',
'fao' => 'Faroese',
'fin' => 'Finnish',
'fra' => 'French',
'fry' => 'Frisian',
'glg' => 'Galician',
'kat' => 'Georgian',
'deu' => 'German',
'ell' => 'Greek',
'grn' => 'Guarani',
'heb' => 'Hebrew',
'hil' => 'Hiligaynon' ,
'hin' => 'Hindi',
'hun' => 'Hungarian',
'isl' => 'Icelandic',
'ido' => 'Ido',
'ind' => 'Indonesian',
'ina' => 'Interlingua',
'ile' => 'Interlingue',
'acm' => 'Iraqi Arabic',
'gle' => 'Irish',
'ita' => 'Italian',
'jpn' => 'Japanese',
'xal' => 'Kalmyk',
'kaz' => 'Kazakh',
'khm' => 'Khmer' ,
'tlh' => 'Klingon',
'ksh' => 'Kölsch',
'kor' => 'Korean',
'avk' => 'Kotava',
'kur' => 'Kurdish',
'lld' => 'Ladin',
'lad' => 'Ladino',
'lao' => 'Lao' ,
'lat' => 'Latin',
'lvs' => 'Latvian',
'lzh' => 'Literary Chinese',
'lit' => 'Lithuanian',
'jbo' => 'Lojban',
'nds' => 'Low Saxon',
'dsb' => 'Lower Sorbian',
'mlg' => 'Malagasy',
'zsm' => 'Malay',
'mal' => 'Malayalam',
'mlt' => 'Maltese' ,
'mri' => 'Maori',
'mar' => 'Marathi',
'mon' => 'Mongolian',
'npi' => 'Nepali' ,
'nob' => 'Norwegian (Bokmål)',
'non' => 'Norwegian (Nynorsk)',
'nov' => 'Novial',
'oci' => 'Occitan',
'orv' => 'Old East Slavic',
'ang' => 'Old English',
'prg' => 'Old Prussian' ,
'tpw' => 'Old Tupi',
'oss' => 'Ossetian',
'pes' => 'Persian',
'pcd' => 'Picard' ,
'pms' => 'Piemontese',
'pol' => 'Polish',
'por' => 'Portuguese',
'pnb' => 'Punjabi',
'que' => 'Quechua',
'qya' => 'Quenya',
'ron' => 'Romanian',
'roh' => 'Romansh',
'rus' => 'Russian',
'san' => 'Sanskrit',
'gla' => 'Scottish Gaelic',
'srp' => 'Serbian',
'wuu' => 'Shanghainese',
'scn' => 'Sicilian',
'sjn' => 'Sindarin',
'slk' => 'Slovak',
'slv' => 'Slovenian',
'spa' => 'Spanish',
'bod' => 'Standard Tibetan' ,
'swh' => 'Swahili',
'swe' => 'Swedish',
'tgl' => 'Tagalog',
'tgk' => 'Tajik',
'tat' => 'Tatar',
'tel' => 'Telugu',
'nan' => 'Teochew',
'tha' => 'Thai',
'tpi' => 'Tok Pisin',
'toki' => 'Toki Pona',
'tur' => 'Turkish',
'ukr' => 'Ukrainian',
'hsb' => 'Upper Sorbian',
'urd' => 'Urdu',
'uig' => 'Uyghur',
'uzb' => 'Uzbek',
'vie' => 'Vietnamese',
'vol' => 'Volapük',
'cym' => 'Welsh',
'xho' => 'Xhosa',
'yid' => 'Yiddish',
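
Ordering by English name depends on how accented letters, such as the ö in Kölsch, are compared. The following Python sketch shows one possible convention: strip diacritics and ignore case before comparing. It is an assumption for illustration, not necessarily how the list above was produced, and the function names are hypothetical.

```python
import unicodedata


def name_sort_key(name):
    """Case-insensitive key with diacritics removed, so 'Kölsch' sorts as 'kolsch'."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def sort_by_name(langs):
    """Return (code, name) pairs ordered by English name, ties broken by code."""
    return sorted(langs.items(), key=lambda item: (name_sort_key(item[1]), item[0]))


if __name__ == "__main__":
    sample = {"kor": "Korean", "ksh": "Kölsch", "tlh": "Klingon", "xal": "Kalmyk"}
    for code, name in sort_by_name(sample):
        print(f"'{code}' => '{name}',")
```

With this key the sample comes out as Kalmyk, Klingon, Kölsch, Korean.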

alanfgh commented 10 years ago

The other source files in question are:

app/models/sentence.php
app/views/helpers/languages.php

alanfgh commented 10 years ago

There is one misspelling in app/models/sentence.php: the code for Malagasy is given as "mgl" instead of "mlg". Otherwise, the lists of languages match up.
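
A misspelling like this, where the letters of a code are swapped, can be flagged automatically. The Python sketch below pairs up codes that appear in only one of two lists but use the same letters. The function name and the short sample lists are hypothetical; only the "mgl"/"mlg" pair comes from the files discussed here.

```python
from collections import defaultdict


def likely_typos(reference_codes, other_codes):
    """Pair codes unique to each list that use the same letters in a different order."""
    only_ref = set(reference_codes) - set(other_codes)
    only_other = set(other_codes) - set(reference_codes)

    # Index the reference-only codes by their sorted letters.
    by_letters = defaultdict(list)
    for code in only_ref:
        by_letters["".join(sorted(code))].append(code)

    pairs = []
    for code in sorted(only_other):
        for candidate in sorted(by_letters.get("".join(sorted(code)), [])):
            pairs.append((code, candidate))
    return pairs


if __name__ == "__main__":
    sphinx_codes = ["mal", "mar", "mlg", "mlt"]  # excerpt of the Sphinx list
    model_codes = ["mal", "mar", "mgl", "mlt"]   # excerpt with the misspelling
    print(likely_typos(sphinx_codes, model_codes))  # prints [('mgl', 'mlg')]
```

Codes that are missing outright, rather than misspelled, produce no pair here and would need a plain set difference to find.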