All languages whose name uses the Latin alphabet should probably be upper case.

Arthur-Milchior commented 1 month ago

Currently, in the settings, some language names start with a lower-case letter and some with an upper-case one. If there is a reason, I think it'd be nice to document it. In issue #13259 and the PR closing it, #13275, I could not see it discussed by either @david-allison or @BrayanDSO.

As a French speaker, I can confirm there is no reason to use "français" instead of "Français". I would be surprised to learn that there is a reason for the lower case in Spanish (Español) or Esperanto, but I don't know those languages so I really can't tell either.

So my suggestion would be to either:

- add a comment explaining the choice, if it was a deliberate choice
- capitalize the first letter of every language name written in the Latin alphabet
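
A minimal sketch of the second option, using only the JVM standard library. `startsWithLatinLetter` and `capitalizeIfLatin` are hypothetical helper names for illustration, not existing AnkiDroid code, and the sketch assumes that looking at the first character is enough to decide which script a name uses.

```kotlin
import java.util.Locale

// Hypothetical helper (not AnkiDroid code): true if the name's first character is in the Latin script.
fun startsWithLatinLetter(name: String): Boolean =
    name.isNotEmpty() &&
        Character.UnicodeScript.of(name.codePointAt(0)) == Character.UnicodeScript.LATIN

// Hypothetical helper: title-cases the first character of Latin-script names
// using the given locale's casing rules, and returns every other name unchanged.
fun capitalizeIfLatin(name: String, locale: Locale): String =
    if (startsWithLatinLetter(name)) {
        name.replaceFirstChar { it.titlecase(locale) }
    } else {
        name
    }

fun main() {
    println(capitalizeIfLatin("français", Locale.forLanguageTag("fr"))) // Français
    println(capitalizeIfLatin("español (España)", Locale.forLanguageTag("es-ES"))) // Español (España)
    println(capitalizeIfLatin("русский", Locale.forLanguageTag("ru"))) // русский (not Latin, unchanged)
}
```

A blanket rule like this cannot tell whether a language deliberately starts its name with a lower-case letter, so any such names would still need a manual exception.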

I see you discussed checking with native speakers whether the language names appear properly. Did you ever do any outreach? If so, where? Otherwise I'll do it. I just want to make sure I don't duplicate work.

BrayanDSO commented 1 month ago

I did complain in Discord that Portuguese started with a lower case, so I fixed it while I did that PR.

I haven't touched any other language because I didn't want to do the research to be sure which languages should be capitalized and which should not. I simply copied what AnkiDroid used before that PR.

If you are sure that all the Latin-alphabet languages should be capitalized, go for it.

Arthur-Milchior commented 1 month ago

I'm not sure. But if we are not sure that they should be lower case either, I think it's better to be consistent here.

However, I also have a better idea that I'll offer in a PR very shortly.

david-allison commented 1 month ago

For documentation/reference:

        val fr = Locale("fr", "FR")
        fr.getDisplayName(fr) // français (France)

david-allison commented 1 month ago

I believe the source is: https://github.com/unicode-org/cldr/blob/main/common/main/fr.xml#L209

Permalink:

https://github.com/unicode-org/cldr/blob/8a22f670acc52c76eaf8ff2ed2720c001c13b3ae/common/main/fr.xml#L209

@Arthur-Milchior Why is this incorrect?

EDIT:

Display names for scripts, languages, countries, currencies, and variants in this locale are supplied by this element. They supply localized names for these items for use in user-interfaces for various purposes such as displaying menu lists, displaying a language name in a dialog, and so on. Capitalization should follow the conventions used in the middle of running text; the <contextTransforms> element may be used to specify the appropriate capitalization for other contexts (see ContextTransform Elements). Examples are given below.

https://www.unicode.org/reports/tr35/tr35-general.html#Display_Name_Elements

<contextTransform>

https://github.com/unicode-org/cldr/blob/main/common/main/fr.xml#L1521-L1541

ULocale also provides the same data (both with the IBM ICU4J dependency and under Robolectric with the Android dependency):

        val ul = ULocale.FRENCH
        ul.getDisplayName(ul) // français

david-allison commented 1 month ago

I'm well past my timebox, and I still don't know whether the proposed pull requests are good solutions. I feel the solution below would reduce both the process on our side and the need to verify the language names.

I would /expect/ that we'd be able to get this data from CLDR, rather than leaving our translators to duplicate effort, but I believe the following still has problems with:

isiXhosa
isiZulu

As these are missing <contextTransforms> in CLDR.

tgl -> Filipino

[!NOTE] I used the IBM dependency; Android may produce different results.


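    // Compares each name in APP_LANGUAGES with the name ICU reports for that locale
    // when capitalized for a UI list or menu, and fails listing every mismatch.
    @RequiresApi(Build.VERSION_CODES.N)
    @Test
    fun testAppLanguagesCapitalization() {
        val invalid = mutableListOf<String>()

        for ((displayName, tag) in APP_LANGUAGES) {
            val ul = ULocale(tag)

            // Ask for names capitalized for a UI list or menu rather than for running text.
            val asList = LocaleDisplayNames
                .getInstance(ul, DisplayContext.CAPITALIZATION_FOR_UI_LIST_OR_MENU)
                // inSelf = false; the comparator is a dummy because the set has a single element
                .getUiList(setOf(ul), false) { _, _ -> 1 }
                .single()

            // nameInSelf is the locale's name in its own language (the endonym)
            if (asList.nameInSelf != displayName) {
                invalid.add("$displayName -> ${asList.nameInSelf}")
            }
        }

        // Fail only when there is something to report, with one mismatch per line.
        if (invalid.isNotEmpty()) {
            Assert.fail(invalid.joinToString("\n"))
        }
    }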

/*
azərbaycan -> Azərbaycan
беларуская -> Беларуская
български -> Български
català -> Català
čeština -> Čeština
dansk -> Dansk
esperanto -> Esperanto
español (Argentina) -> Español (Argentina)
español (España) -> Español
eesti -> Eesti
euskara -> Euskara
suomi -> Suomi
français -> Français
Frysk (Nederlân) -> Frysk
Gaeilge (Éire) -> Gaeilge
galego -> Galego
ગુજરાતી (ભારત) -> ગુજરાતી
hrvatski -> Hrvatski
magyar -> Magyar
հայերեն (Հայաստան) -> Հայերեն
íslenska -> Íslenska
italiano -> Italiano
қазақ тілі -> Қазақ тілі
kurdî -> Kurdî [kurmancî]
кыргызча -> Кыргызча
lietuvių -> Lietuvių
latviešu -> Latviešu
македонски -> Македонски
മലയാളം (ഇന്ത്യ) -> മലയാളം
монгол -> Монгол
nynorsk (Noreg) -> Norsk nynorsk
norsk -> Norsk
ਪੰਜਾਬੀ (ਭਾਰਤ) -> ਪੰਜਾਬੀ
polski -> Polski
Português (Brasil) -> Português
română -> Română
русский -> Русский
Santali -> ᱥᱟᱱᱛᱟᱲᱤ
Sardinian -> Sardu
slovenčina -> Slovenčina
slovenščina -> Slovenščina
shqip -> Shqip
српски -> Српски
svenska (Sverige) -> Svenska
тоҷикӣ -> Тоҷикӣ
Tagalog -> Filipino
татар (Россия) -> Татар
українська -> Українська
اردو (پاکستان) -> اردو
o‘zbek -> O‘zbek
isiXhosa -> IsiXhosa
中文 (中国) -> 中文
中文 (台灣) -> 中文（繁體）
isiZulu -> IsiZulu
*/

Arthur-Milchior commented 1 month ago

I'm well past my timebox,

Please take your time to review. There is no emergency. I believe this list of languages should be improved, but this can certainly wait.

leaving our translators to duplicate effort

To be clear, we are only requesting them to write the name of their language. If it were done, I'd just have to read the comment and enter "Français" in a text field. This is something they can fill in even if they don't know AnkiDroid well.

I don't think that it's worth optimizing to avoid asking them to do this translation.

If you fear that it'll take us too much work because we'd get bad translations here specifically, I can hear that. But that's effort for reviewers, not for translators.

@Arthur-Milchior Why is this incorrect?

Why is what incorrect?

If you mean the entry "français", then I'd just state that I'd expect to see an upper case for the "f". And I expect the same problem will occur in other languages.

david-allison commented 1 month ago

Why is https://github.com/unicode-org/cldr/blob/main/common/main/fr.xml#L209 incorrect?

david-allison commented 1 month ago

To be clear, we are only requesting them to write the name of their language. If it were done, I'd just have to read the comment and enter "Français" in a text field. This is something they can fill in even if they don't know AnkiDroid well.

Would Chinese (China) and Chinese (Taiwan) both map to 中文 under this proposed system?

Proposal

Merge (after changes + test): https://github.com/ankidroid/Anki-Android/pull/17121
Close: https://github.com/ankidroid/Anki-Android/pull/17120

Rationale

Optimizing for translator time here isn't worthwhile.

Optimizing for correctness is worthwhile.

Having the process of adding a language be as streamlined as possible (especially when the proposed step is inherently political) is worthwhile.

The CLDR-based approach in my message above likely won't be feasible, given that tgl is mapped to Filipino.

- Use the Android System as a 'base' for language names (as we currently do)
  - Removes the political nature of having one person select a language name. Deferring to Android/Google whenever possible seems sensible, since they've put in the work of determining the 'standard' name to use
- Fix the casing of all languages
  - Adds a minor task whenever a new language is added
- Add a unit test to ensure CLDR updates [in new Android versions] are handled and mirrored in AnkiDroid
  - Sardinian and Santali were both revealed to be buggy by my test above
  - Any manual changes (besides capitalization) would need to be flagged and ignored
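
The comparison at the heart of such a unit test could look roughly like the sketch below. Everything in it is hypothetical: `findMismatches`, its parameters, and the sample data are made up for illustration, and where the reference names come from (the Android system, CLDR, or something else) is deliberately left open.

```kotlin
// Hypothetical sketch, not AnkiDroid code.
// appNames:        language tag -> name currently shown in the app
// referenceNames:  language tag -> name reported by the chosen reference source
// manualOverrides: tags whose app name was changed on purpose and must be skipped
fun findMismatches(
    appNames: Map<String, String>,
    referenceNames: Map<String, String>,
    manualOverrides: Set<String>,
): List<String> =
    appNames
        .filterKeys { it !in manualOverrides }
        .mapNotNull { (tag, appName) ->
            // A tag with no reference name is reported rather than silently ignored.
            val reference = referenceNames[tag] ?: return@mapNotNull "$tag: no reference name"
            if (appName == reference) null else "$tag: $appName -> $reference"
        }

fun main() {
    // Illustrative sample data only.
    val app = mapOf("en" to "English", "fr" to "français", "tgl" to "Tagalog")
    val reference = mapOf("en" to "English", "fr" to "Français", "tgl" to "Filipino")

    // "tgl" is flagged as a deliberate manual change, so only "fr" is reported.
    println(findMismatches(app, reference, manualOverrides = setOf("tgl")))
    // prints [fr: français -> Français]
}
```

A test built on this would fail whenever the returned list is non-empty, so a reference-data update that changes a name shows up as a test failure instead of going unnoticed.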

Arthur-Milchior commented 1 month ago

"français", which is the way the string appears in AnkiDroid today, is correct in the sense that it's totally understandable. But, as it's alone on its line, I'd expect it to be treated as a full sentence and to start with an upper-case letter.

It could also be considered to be part of a list and keep a lower case, as one element of a sentence which is a list of 93 languages.

The main problem I have is that this is not consistent. If everything were lower case, it'd be strange but not shocking. But having an upper case on some languages and not on others would be insulting. Upper case can be used in French as a sign of respect or politeness. I admit I never considered until today that this may not be universal, and it may be specific to France. The way the list is written, it seems like French and Esperanto are not worth as much respect as Filipino and English. In the same way, if I correctly recall my high-school lectures, some activists wrote "king" and "President" to signal that while they despised the monarchy, they respected the elected role.

I am quite certain that this is not what anyone had in mind when creating the document, and I'm not accusing anyone of insulting the French language. And to be honest, while it was clear to me that what I was seeing was wrong, I had not taken the time to consider why. But I hope it explains at least why I believe that, if other languages have an upper case, Français should have one too in the list.

I tried to look at what other lists do. Discord, Google's "my account" page and Pixel phones give an upper case to Français and to all languages. Google Search and Google Chrome use lower case.

Arthur-Milchior commented 1 month ago

To be clear, we are only requesting them to write the name of their language. If it were done, I'd just have to read the comment and enter "Français" in a text field. This is something they can fill in even if they don't know AnkiDroid well.

Would Chinese (China) and Chinese (Taiwan) both map to 中文 under this proposed system?

Assuming that the current LanguageUtil code is acceptable, it would be "中文 (中国)" and "中文 (台灣)" respectively. This is the very reason why I mentioned the disambiguation part in the example I wrote.

I guess this means that they are two distinct languages, so I'd assume they both get a disambiguating parenthesis, the same way I'd expect to add a disambiguation to French (France) if one day we end up with a French Canadian version (I doubt it'll happen).

david-allison commented 1 month ago

We're in for a world of pain here if we need translators to understand the subtleties of Locale.

If you pushed for disambiguation, I would suspect that most would [incorrectly] describe the scripts:

中文（简体）: zh-Hans
中文（繁體）: zh-Hant

rather than the regional variants:

中文（中国）: zh-CN
中文（台灣）: zh-TW
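
The difference between the two kinds of tag is visible in `java.util.Locale` itself: the first pair carries a script subtag and no region, the second pair a region and no script. A small sketch, where `scriptAndRegion` is a hypothetical helper name used only for illustration:

```kotlin
import java.util.Locale

// Hypothetical helper: returns the (script, region) subtags parsed from a BCP 47 language tag.
// Either part is an empty string when the tag does not specify it.
fun scriptAndRegion(tag: String): Pair<String, String> {
    val locale = Locale.forLanguageTag(tag)
    return locale.script to locale.country
}

fun main() {
    for (tag in listOf("zh-Hans", "zh-Hant", "zh-CN", "zh-TW")) {
        val (script, region) = scriptAndRegion(tag)
        println("$tag: script='$script' region='$region'")
    }
    // zh-Hans: script='Hans' region=''
    // zh-Hant: script='Hant' region=''
    // zh-CN: script='' region='CN'
    // zh-TW: script='' region='TW'
}
```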

We're much better off using Android's names as a base for endonyms (as we currently do), getting the casing data from CLDR/ULocale (to fix français), and reaching out to translators on a case-by-case basis for potential mismatches between our understanding and CLDR (isiXhosa, íslenska).

BrayanDSO commented 1 month ago

A lot of long comments for me to read right now, but my opinion in general is:

- Keep the strings as constants. Don't send them to Crowdin.
- Get the language names from a reputable source and change them if you want.
- Leave any future changes to whoever has a specific point to raise.

ankidroid / Anki-Android

All languages whose name uses the Latin alphabet should probably be upper case. #17118