commons-app / apps-android-commons

The Wikimedia Commons Android app allows users to upload pictures from their Android phone/tablet to Wikimedia Commons
https://commons-app.github.io/
Apache License 2.0

List of languages for captions does not seem to match server-side languages (hard to choose language with massive list) #3681

Closed nicolas-raoul closed 3 years ago

nicolas-raoul commented 4 years ago

I am all for language diversity, but it seems that the list of languages we propose for captions does not match what the server side actually can store:

Screen Shot 2020-04-20 at 19 33 22

Screenshot_20200420-193302_Commons

Please note that the list must be localized (language names appear in Japanese if your phone's locale is Japanese), which works great currently. We should find where the Commons web UI gets these lists from.

macgills commented 4 years ago
    init {
        val sortedLanguages = Locale.getAvailableLocales()
            .map(::Language)
            .sortedBy { it.locale.displayName }
        languageNamesList = sortedLanguages.map { it.locale.displayName }
        languageCodesList = sortedLanguages.map { it.locale.language }
    }

Just to note: we generate this list by reading the locales available on the device.
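To illustrate why this approach produces so many entries, here is a minimal sketch that runs on a plain JVM rather than on Android. It applies the same two mappings as the `init` block above to a small hand-picked list of locales. The helper name `namesAndCodes` and the sample locales are assumptions made for illustration, not code from the app.

```kotlin
import java.util.Locale

// Hypothetical helper: applies the same mappings as the init block above,
// but to an explicit list instead of Locale.getAvailableLocales().
// Display names are requested in English so the result does not depend
// on the default locale of the machine running the sketch.
fun namesAndCodes(locales: List<Locale>): Pair<List<String>, List<String>> {
    val sorted = locales.sortedBy { it.getDisplayName(Locale.ENGLISH) }
    val names = sorted.map { it.getDisplayName(Locale.ENGLISH) }
    val codes = sorted.map { it.language }
    return names to codes
}

// Three regional variants of Italian (sample data, chosen for illustration).
val italianVariants = listOf(
    Locale.forLanguageTag("it"),
    Locale.forLanguageTag("it-CH"),
    Locale.forLanguageTag("it-SM")
)
```

Calling `namesAndCodes(italianVariants)` gives three different display names but the same language code three times, which is how one language can end up as several rows in the list.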

macgills commented 4 years ago

@nicolas-raoul it seems like

/** Immutable look up table for all app supported languages. All article languages may not be
  * present in this table as it is statically bundled with the app. */
public class AppLanguageLookUpTable {

should probably be what is used, similar to what is used in AboutActivity

Saral-code commented 3 years ago

I would like to work on this. Can you assign me this issue?

sivaraam commented 3 years ago

@Saral-code sure. You can work on this. 🙂

Saral-code commented 3 years ago

Hello everyone, here are a few things I observed:

  1. Answering @nicolas-raoul: this list comes from the JRE (Java Runtime Environment) on which the app is running, and I don't think that list can be altered.
  2. Almost every JRE has the same number of languages, so changing JDKs or JREs won't help.

What I think we can do is as follows:

  1. Manually create a list of all languages, which I think is not feasible.
  2. PREFERRED IDEA: Each language has its own code (e.g. [en] for English, [fr] for French), and the list gets this massive because of multiple variants of the same language, like Italian (San Marino) and Italian (Switzerland). So we can keep only one option per language code.
  3. No two languages have the same code, so there is no chance of a discrepancy.

Tell me which of these would be better, or whether there is another way. Thanks
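The PREFERRED IDEA above could be sketched roughly as follows. This is a minimal plain-JVM sketch under stated assumptions, not the app's code: the function name `oneLocalePerLanguage` is hypothetical, and it keeps whichever variant comes first in the input.

```kotlin
import java.util.Locale

// Hypothetical helper: keeps the first locale seen for each language code
// and drops later variants of the same language.
// Locales with an empty language code (such as the root locale) are skipped.
fun oneLocalePerLanguage(locales: List<Locale>): List<Locale> =
    locales
        .filter { it.language.isNotEmpty() }
        .distinctBy { it.language }
```

Because `distinctBy` keeps the first occurrence, the variant that survives depends on the order of the input. Sorting the input so that the plain language entry comes before its regional variants would make that choice predictable.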

nicolas-raoul commented 3 years ago

@Saral-code Thanks for listing your observations, and for listing different strategies while evaluating them, very good methodology.

Actually, I think that the best way would be a 4th way of getting languages, which is hinted at https://github.com/commons-app/apps-android-commons/issues/3681#issuecomment-617107276. AppLanguageLookUpTable.java indeed contains various methods that look like they could give us a list of languages very similar to the first screenshot above. Would you mind testing getCodes() and getCanonicalNames() and getLocalizedNames() of AppLanguageLookUpTable.java and pasting their output (list of languages) here?

(bonus: once you are done with this pull request, a second pull request to add javadoc documentation to AppLanguageLookUpTable.java would be very welcome ☺)

Saral-code commented 3 years ago

Thanks for the compliment @nicolas-raoul, this methodology helps another person understand how I am thinking. First, as asked by @nicolas-raoul, here are the outputs. (These outputs come from \apps\data-client\src\main\res\values\languages_list.xml)

**Output for getCodes()** 
en, es, de, ja, fr, ru, pt, it, zh-hans, zh-hant, ar, ko, id, pl, nl, fa, hi, th, vi, sv, uk, cs, simple, hu, ro, fi, el, he, nb, da, sr, hr, ms, bg, ca, tr, sk, sh, bn, tl, mr, ta, kk, lt, az, bs, sl, sq, arz, zh-yue, ka, te, et, lv, ml, hy, uz, kn, af, nn, mk, gl, sw, eu, ur, ky, gu, bh, sco, ast, is, mn, be, an, km, si, ceb, jv, eo, als, ig, su, be-x-old, la, my, cy, ne, bar, azb, mzn, as, am, so, pa, map-bms, scn, tg, ckb, ga, lb, war, zh-min-nan, nds, fy, vec, pnb, zh-classical, lmo, tt, io, ia, br, hif, mg, wuu, gan, ang, or, oc, yi, ps, tk, ba, sah, fo, nap, vls, sa, ce, qu, ku, min, bcl, ilo, ht, li, wa, vo, nds-nl, pam, new, mai, sn, pms, eml, yo, ha, gn, frr, gd, hsb, cv, lo, os, se, cdo, sd, ksh, bat-smg, bo, nah, xmf, ace, roa-tara, hak, bjn, gv, mt, pfl, szl, bpy, rue, co, diq, sc, rw, vep, lij, kw, fur, pcd, lad, tpi, ext, csb, rm, kab, gom, udm, mhr, glk, za, pdc, om, iu, nv, mi, nrm, tcy, frp, myv, kbp, dsb, zu, ln, mwl, fiu-vro, tum, tet, tn, pnt, stq, nov, ny, xh, crh, lfn, st, pap, ay, zea, bxr, kl, sm, ak, ve, pag, nso, kaa, lez, gag, kv, bm, to, lbe, krc, jam, ss, roa-rup, dv, ie, av, cbk-zam, chy, inh, ug, ch, arc, pih, mrj, kg, rmy, dty, na, ts, xal, wo, fj, tyv, olo, ltg, ff, jbo, haw, ki, chr, sg, atj, sat, ady, ty, lrc, ti, din, gor, lg, rn, bi, cu, kbd, pi, cr, koi, ik, mdf, bug, ee, shn, tw, dz, srn, ks, test, en-x-piglatin, ab

**Output for getCanonicalNames()**
English, Spanish, German, Japanese, French, Russian, Portuguese, Italian, Simplified Chinese, Traditional Chinese, Arabic, Korean, Indonesian, Polish, Dutch, Persian, Hindi, Thai, Vietnamese, Swedish, Ukrainian, Czech, Simple English, Hungarian, Romanian, Finnish, Greek, Hebrew, Norwegian, Danish, Serbian, Croatian, Malay, Bulgarian, Catalan, Turkish, Slovak, Serbo-Croatian, Bangla, Tagalog, Marathi, Tamil, Kazakh, Lithuanian, Azerbaijani, Bosnian, Slovenian, Albanian, Egyptian Arabic, Cantonese, Georgian, Telugu, Estonian, Latvian, Malayalam, Armenian, Uzbek, Kannada, Afrikaans, Norwegian Nynorsk, Macedonian, Galician, Swahili, Basque, Urdu, Kyrgyz, Gujarati, Bhojpuri, Scots, Asturian, Icelandic, Mongolian, Belarusian, Aragonese, Khmer, Sinhala, Cebuano, Javanese, Esperanto, Alemannisch, Igbo, Sundanese, Belarusian (Taraškievica orthography), Latin, Burmese, Welsh, Nepali, Bavarian, South Azerbaijani, Mazanderani, Assamese, Amharic, Somali, Punjabi, Basa Banyumasan, Sicilian, Tajik, Central Kurdish, Irish, Luxembourgish, Waray, Chinese (Min Nan), Low German, Western Frisian, Venetian, Western Punjabi, Classical Chinese, Lombard, Tatar, Ido, Interlingua, Breton, Fiji Hindi, Malagasy, Wu Chinese, Gan Chinese, Old English, Odia, Occitan, Yiddish, Pashto, Turkmen, Bashkir, Sakha, Faroese, Neapolitan, West Flemish, Sanskrit, Chechen, Quechua, Kurdish, Minangkabau, Central Bikol, Iloko, Haitian Creole, Limburgish, Walloon, Volapük, Low Saxon, Pampanga, Newari, Maithili, Shona, Piedmontese, Emiliano-Romagnolo, Yoruba, Hausa, Guarani, Northern Frisian, Scottish Gaelic, Upper Sorbian, Chuvash, Lao, Ossetic, Northern Sami, Min Dong Chinese, Sindhi, Colognian, Samogitian, Tibetan, Nāhuatl, Mingrelian, Achinese, Tarantino, Hakka Chinese, Banjar, Manx, Maltese, Palatine German, Silesian, Bishnupriya, Rusyn, Corsican, Zazaki, Sardinian, Kinyarwanda, Veps, Ligurian, Cornish, Friulian, Picard, Ladino, Tok Pisin, Extremaduran, Kashubian, Romansh, Kabyle, Goan Konkani, Udmurt, 
Eastern Mari, Gilaki, Zhuang, Pennsylvania German, Oromo, Inuktitut, Navajo, Maori, Norman, Tulu, Arpitan, Erzya, Kabiye, Lower Sorbian, Zulu, Lingala, Mirandese, Võro, Tumbuka, Tetum, Tswana, Pontic, Saterland Frisian, Novial, Nyanja, Xhosa, Crimean Turkish, Lingua Franca Nova, Southern Sotho, Papiamento, Aymara, Zeelandic, Russia Buriat, Kalaallisut, Samoan, Akan, Venda, Pangasinan, Northern Sotho, Kara-Kalpak, Lezghian, Gagauz, Komi, Bambara, Tongan, Lak, Karachay-Balkar, Jamaican Creole English, Swati, Aromanian, Divehi, Interlingue, Avaric, Chavacano, Cheyenne, Ingush, Uyghur, Chamorro, Aramaic, Norfuk / Pitkern, Western Mari, Kongo, Romani, Doteli, Nauru, Tsonga, Kalmyk, Wolof, Fijian, Tuvinian, Livvi-Karelian, Latgalian, Fulah, Lojban, Hawaiian, Kikuyu, Cherokee, Sango, Atikamekw, Santali, Adyghe, Tahitian, Northern Luri, Tigrinya, Dinka, Gorontalo, Ganda, Rundi, Bislama, Church Slavic, Kabardian, Pali, Cree, Komi-Permyak, Inupiaq, Moksha, Buginese, Ewe, Shan, Twi, Dzongkha, Sranan Tongo, Kashmiri, Test, Pig Latin, Abkhazian

**Output for getLocalizedNames()**
English, español, Deutsch, 日本語, français, русский, português, italiano, 简体中文, 繁體中文, العربية, 한국어, Bahasa Indonesia, polski, Nederlands, فارسی, हिन्दी, ไทย, Tiếng Việt, svenska, українська, čeština, Simple English, magyar, română, suomi, Ελληνικά, עברית, norsk, dansk, српски / srpski, hrvatski, Bahasa Melayu, български, català, Türkçe, slovenčina, srpskohrvatski / српскохрватски, বাংলা, Tagalog, मराठी, தமிழ், қазақша, lietuvių, azərbaycanca, bosanski, slovenščina, shqip, مصرى, 粵語, ქართული, తెలుగు, eesti, latviešu, മലയാളം, հայերեն, oʻzbekcha/ўзбекча, ಕನ್ನಡ, Afrikaans, norsk nynorsk, македонски, galego, Kiswahili, euskara, اردو, Кыргызча, ગુજરાતી, भोजपुरी, Scots, asturianu, íslenska, монгол, беларуская, aragonés, ភាសាខ្មែរ, සිංහල, Cebuano, Basa Jawa, Esperanto, Alemannisch, Igbo, Basa Sunda, беларуская (тарашкевіца)‎, Latina, မြန်မာဘာသာ, Cymraeg, नेपाली, Boarisch, تۆرکجه, مازِرونی, অসমীয়া, አማርኛ, Soomaaliga, ਪੰਜਾਬੀ, Basa Banyumasan, sicilianu, тоҷикӣ, کوردی, Gaeilge, Lëtzebuergesch, Winaray, Bân-lâm-gú, Plattdüütsch, Frysk, vèneto, پنجابی, 文言, lumbaart, татарча/tatarça, Ido, interlingua, brezhoneg, Fiji Hindi, Malagasy, 吴语, 贛語, Ænglisc, ଓଡ଼ିଆ, occitan, ייִדיש, پښتو, Türkmençe, башҡортса, саха тыла, føroyskt, Napulitano, West-Vlams, संस्कृतम्, нохчийн, Runa Simi, kurdî, Baso Minangkabau, Bikol Central, Ilokano, Kreyòl ayisyen, Limburgs, walon, Volapük, Nedersaksies, Kapampangan, नेपाल भाषा, मैथिली, chiShona, Piemontèis, emiliàn e rumagnòl, Yorùbá, Hausa, Avañe'ẽ, Nordfriisk, Gàidhlig, hornjoserbsce, Чӑвашла, ລາວ, Ирон, davvisámegiella, Mìng-dĕ̤ng-ngṳ̄, سنڌي, Ripoarisch, žemaitėška, བོད་ཡིག, Nāhuatl, მარგალური, Acèh, tarandíne, 客家語/Hak-kâ-ngî, Bahasa Banjar, Gaelg, Malti, Pälzisch, ślůnski, বিষ্ণুপ্রিয়া মণিপুরী, русиньскый, corsu, Zazaki, sardu, Kinyarwanda, vepsän kel’, Ligure, kernowek, furlan, Picard, Ladino, Tok Pisin, estremeñu, kaszëbsczi, rumantsch, Taqbaylit, गोंयची कोंकणी / Gõychi Konknni, удмурт, олык марий, گیلکی, Vahcuengh, Deitsch, Oromoo, 
ᐃᓄᒃᑎᑐᑦ/inuktitut, Diné bizaad, Māori, Nouormand, ತುಳು, arpetan, эрзянь, Kabɩyɛ, dolnoserbski, isiZulu, lingála, Mirandés, Võro, chiTumbuka, tetun, Setswana, Ποντιακά, Seeltersk, Novial, Chi-Chewa, isiXhosa, qırımtatarca, Lingua Franca Nova, Sesotho, Papiamentu, Aymar aru, Zeêuws, буряад, kalaallisut, Gagana Samoa, Akan, Tshivenda, Pangasinan, Sesotho sa Leboa, Qaraqalpaqsha, лезги, Gagauz, коми, bamanankan, lea faka-Tonga, лакку, къарачай-малкъар, Patois, SiSwati, armãneashti, ދިވެހިބަސް, Interlingue, авар, Chavacano de Zamboanga, Tsetsêhestâhese, ГӀалгӀай, ئۇيغۇرچە / Uyghurche, Chamoru, ܐܪܡܝܐ, Norfuk / Pitkern, кырык мары, Kongo, Romani, डोटेली, Dorerin Naoero, Xitsonga, хальмг, Wolof, Na Vosa Vakaviti, тыва дыл, Livvinkarjala, latgaļu, Fulfulde, la .lojban., Hawaiʻi, Gĩkũyũ, ᏣᎳᎩ, Sängö, Atikamekw, ᱥᱟᱱᱛᱟᱲᱤ, адыгабзэ, reo tahiti, لۊری شومالی, ትግርኛ, Thuɔŋjäŋ, Bahasa Hulontalo, Luganda, Kirundi, Bislama, словѣньскъ / ⰔⰎⰑⰂⰡⰐⰠⰔⰍⰟ, Адыгэбзэ, पालि, Nēhiyawēwin / ᓀᐦᐃᔭᐍᐏᐣ, Перем Коми, Iñupiak, мокшень, ᨅᨔ ᨕᨘᨁᨗ, eʋegbe, ၽႃႇသႃႇတႆး, Twi, ཇོང་ཁ, Sranantongo, कॉशुर / کٲشُر, Test, Igpay Atinlay, Аҧсшәа

(All of the above outputs were converted to strings for this comment; the actual return type is a List.)

SOLVING INTUITIONS AND A FEW OBSERVATIONS:

  1. As you can see, this output list is not as massive as the one shown in the app.

  2. I would like to point out one strange thing I observed; see the screenshot attached below.

     WhatsApp Image 2020-12-04 at 1 18 32 AM

     Note: I couldn't find some terms, like "Comoros" and "Brazzaville", anywhere in the predefined \apps\data-client\src\main\res\values\languages_list.xml. (I searched for both terms in the complete package through VS studio and found no result, screenshot attached.) This means the app is retrieving the names from somewhere else.

     Capture

  3. Then I went to the file SpinnerAdapterLanguage.kt, where the spinner retrieves the languages through

    init {
        // Build the spinner entries from every locale installed on the device.
        val sortedLanguages = Locale.getAvailableLocales()
            .map(::Language)
            .sortedBy { it.locale.displayName }
        languageNamesList = sortedLanguages.map { it.locale.displayName }
        // Regional variants share one language code, so codes repeat here.
        languageCodesList = sortedLanguages.map { it.locale.language }
    }
    
    var selectedLangCode = ""
    
    // A row is selectable only if its code is non-empty and not already in use.
    override fun isEnabled(position: Int) = languageCodesList[position].let {
        it.isNotEmpty() && !selectedLanguages.containsValue(it) && it != selectedLangCode
    }
    
    override fun getCount() = languageNamesList.size
    
    override fun getDropDownView(position: Int, convertView: View?, parent: ViewGroup) =
        (convertView ?: parent.inflate(R.layout.row_item_languages_spinner).also {
            it.tag = DropDownViewHolder(it)
        }).apply {
            (tag as DropDownViewHolder).init(
                languageCodesList[position],
                languageNamesList[position],
                isEnabled(position)
            )
        }
    
    override fun getView(position: Int, convertView: View?, parent: ViewGroup) =
        (convertView ?: parent.inflate(R.layout.row_item_languages_spinner).also {
            it.tag = SpinnerViewHolder(it)
        }).apply { (tag as SpinnerViewHolder).init(languageCodesList[position]) }
    
    class SpinnerViewHolder(override val containerView: View) : LayoutContainer {
        fun init(languageCode: String) {
            // Show at most the first two characters of the fixed language code.
            LangCodeUtils.fixLanguageCode(languageCode).let {
                tv_language.text = if (it.length > 2) it.take(2) else it
            }
        }
    }

  4. As you can see, tv_language (the spinner) gets its values through Locale.getAvailableLocales(), as I suggested earlier. As far as I know, this method returns an array of all the installed locales. As per https://www.geeksforgeeks.org/locale-getavailablelocales-method-in-java-with-examples/: "The getAvailableLocales() Method of Locale class in Java is used get the collection of arrays of all the installed locales by the Java Runtime Environment and the LocaleServiceProviders."

  5. I still think the approach I described as PREFERRED IDEA in https://github.com/commons-app/apps-android-commons/issues/3681#issuecomment-736795547 would be a fine one.
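To gauge how many entries come from regional variants, a minimal sketch like the following could count the locales per language code. It runs on a plain JVM, so the numbers may differ from what an Android device reports, and the function name `variantsPerLanguage` is hypothetical.

```kotlin
import java.util.Locale

// Hypothetical helper: groups locales by language code and keeps only the
// codes that occur more than once, i.e. languages with several variants.
// By default it inspects the locales installed on the current runtime.
fun variantsPerLanguage(
    locales: Array<Locale> = Locale.getAvailableLocales()
): Map<String, Int> =
    locales
        .filter { it.language.isNotEmpty() }
        .groupingBy { it.language }
        .eachCount()
        .filterValues { it > 1 }
```

Printing `variantsPerLanguage()` on a given runtime shows which language codes are repeated and how often, which is the repetition visible in the spinner.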

Is there anything I missed or misunderstood here? Or is there any other better way of solving this? Do suggest.

nicolas-raoul commented 3 years ago

Thanks for the investigation!

Where the spinner retrieves a list of languages, please use getCodes() for the identifiers and getLocalizedNames() to display in the GUI. That is the list of language choices we need :-)

Please list them in the order they come in (most popular languages first). If we figure out something smarter it will be the subject of a separate issue. Thanks a lot!
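A minimal sketch of that pairing, assuming getCodes() and getLocalizedNames() return index-aligned lists, as the outputs pasted above suggest. The `LanguageOption` type and the `buildLanguageOptions` function are hypothetical names used for illustration, not part of the app.

```kotlin
// Hypothetical value type for one spinner row; not a class from the app.
data class LanguageOption(val code: String, val localizedName: String)

// Pairs the two index-aligned lists and preserves the order they come in,
// so the most popular languages stay at the top.
fun buildLanguageOptions(
    codes: List<String>,
    localizedNames: List<String>
): List<LanguageOption> {
    require(codes.size == localizedNames.size) {
        "codes and localized names must be index-aligned"
    }
    return codes.zip(localizedNames) { code, name -> LanguageOption(code, name) }
}
```

The adapter could then use `code` as the identifier and `localizedName` as the text shown in each row.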

Saral-code commented 3 years ago

@nicolas-raoul Fixed the issue just the way you requested. All changes and other important info are listed in the PR. Let me know whether it's fine or whether any other changes are needed; after that I'll update the Javadoc.