Mishandled Languages: Tagalog, Hebrew, Indonesian

david-allison commented 4 years ago

Catch-all:

I'm at 8788991100b268e9e55875f95f705516ae14e61b

Language is entirely in English
- ~~Gaeilge (Eire)~~
- Indonesia (80%)
- ~~isiZulu (Zulu on Crowdin)~~
- ~~Kiswahili~~ (Swahili)
- Tagalog (66%)
- ~~Swati~~
- ~~o'zbek~~ (Uzbek)
- ~~Tsonga~~
- ~~Tswana~~
- ~~Venda~~
- ~~Wolof~~
- ~~Xhosa~~
- ~~islenska~~
- ~~кыргызча~~ (Kyrgyz on Crowdin)
- ~~монгол~~ (Mongolian on Crowdin)
- ~~татар~~ (Tatar)
- ~~тоҷикӣ~~ (Tajik)
- ~~қазақ тілі~~
- עִבְרִית (Hebrew)
- ~~Punjabi~~
- ~~ગુજરાતી~~ (Gujarati)
- ~~አማርኛ~~ (Amharic)

Tagalog newline issue:

https://github.com/ankidroid/Anki-Android/blob/9595b84f70dd56bb539ff9cf9a6a960d7314a49f/AnkiDroid/src/main/res/values-ta/01-core.xml#L121

eginhard commented 4 years ago

There just aren't any translations for those languages yet on CrowdIn: https://crowdin.com/project/ankidroid

david-allison commented 4 years ago

Thanks, will cross those off the list. There are still some discrepancies.

david-allison commented 4 years ago

On my Android 9 phone:

- Hebrew works if I rename it to iw
- Indonesian works if I rename it to id

mikehardy commented 4 years ago

Do you have access to emulators? You may not if you're on the old Surface.

If this reproduces from API 16 to API 30, then we just need to add two more special-case maps into ./tools/update-localizations.py, similar to the one for 'yu'.

If you're unable to test it on the other APIs, just give it a shot in a PR and I'll pull it and check on the end points.

Or I suppose there could be a way to unit test it. The LanguageUtil unit test already has the skew in there, but how to have a stable value from the resources to test against? :thinking:

Basically, just let me know if you need me to check it.

david-allison commented 4 years ago

I can run API 16 and 23 currently, both are pretty snappy (and 16 was invaluable for fixing the cert error on the weekend)!

I'll see if I can get an Instrumented Test up for this. It seems like the best way to go, ideally testing loading of the resource file itself.

mikehardy commented 4 years ago

I was under the impression that robolectric would actually load the app resources correctly if you specified different configs (there are examples in the current test)

Arthus commented 4 years ago

Just a very small comment (ok, it became larger than I wanted it to be)

The best language codes would probably be:

- Hebrew: heb
- Indonesian: ind
- Tagalog: tgl

Have a look at https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

Android's Locale() functions only want ISO 639-2 and 639-3 (and IETF language codes), so I would ignore the first column of codes (ISO 639-1). The list also tells you that Hebrew changed from "iw" to "heb" (ISO 639-1: "he") in 1989. The same is true for Indonesian (until 1989 "in", now "id"/"ind").

Now, the two-character codes are ISO 639-1, so they shouldn't even be used anymore (ISO 639-2 and 639-3 only use three-character codes). When Android works with the two-character codes, I guess it always interprets them as IETF language codes, because those still use the ISO 639-1 standard. (So basically we are back to all three ISO 639 standards being available.)

But Android handles the Hebrew/Indonesian codes specially, because ISO 639-1 changed them back in 1989 (when Android didn't even exist as an idea). Why do they use the old codes at all? In the SDK code they actually write:

> This constructor accepts both the old codes ("iw", "ji", and "in") and the new codes ("he", "yi", and "id"), but all other API on Locale will return only the OLD codes.

(the comment from the Locale(String language) constructor)
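
To make the behaviour described in that SDK comment concrete, here is a minimal Python sketch. It is not the Android implementation; `legacy_language_code` is a hypothetical helper, and the table contains only the three code pairs named in the quoted comment.

```python
# New code -> old code, exactly the three pairs named in the quoted SDK comment.
NEW_TO_OLD = {"he": "iw", "yi": "ji", "id": "in"}


def legacy_language_code(code: str) -> str:
    """Return the old code that, per the quoted comment, Locale reports back.

    Hypothetical helper for illustration only. Codes that are not in the
    table (including the old codes themselves) pass through unchanged.
    """
    normalized = code.strip().lower()
    return NEW_TO_OLD.get(normalized, normalized)


if __name__ == "__main__":
    for code in ("he", "iw", "id", "in", "de"):
        print(code, "->", legacy_language_code(code))
```

Both the old and the new spelling end up as the old one, which would explain why a resource folder named after the new code might not be picked up.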

The most universal way to deal with all of this would probably be: convert all language tags for natural languages to ISO 639-3 tags, as they are the most universal ones and cover more languages than the two-character ISO 639-1 codes. All codes would be three characters wide, and there would be no way of misinterpreting them any more.

If we want to support some accent or local variety of a language, we have to use IETF language tags, and those would contain a hyphen to further specify it. The only question is whether Android would handle por-BR / por-PT correctly (ISO 639-3 language tag + country tag), or whether this would have to be pt-BR / pt-PT (IETF/ISO 639-1 language tag + country tag).

But at least it would be unambiguous:

Priority 1: 3 character ISO 639-3 code
Priority 2: IETF language tag with hyphen for everything else that can't be covered by ISO 639-3 alone
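
As a rough illustration of that two-level rule, the following Python sketch classifies a tag by shape only. `tag_priority` is a hypothetical name, and the check is an assumption-laden simplification: it does not look the code up in the real ISO 639-3 registry or validate the IETF subtags.

```python
import re

# Exactly three ASCII lowercase letters. This is a shape check only; the sketch
# does not verify that the code actually exists in ISO 639-3.
ISO_639_3_SHAPE = re.compile(r"[a-z]{3}")


def tag_priority(tag: str) -> int:
    """Return 1 for a bare three-letter code, 2 for a hyphenated tag.

    Anything else (for example a bare two-letter code) is rejected, because
    under the proposed scheme it would be ambiguous.
    """
    if ISO_639_3_SHAPE.fullmatch(tag):
        return 1
    if "-" in tag:
        return 2
    raise ValueError(f"ambiguous language tag: {tag!r}")
```

With this rule `heb` and `yue` fall under priority 1, `pt-BR` under priority 2, and a bare `iw` or `he` is rejected.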

mikehardy commented 4 years ago

Don't forget we have to maintain compatibility with whatever we can get crowdin to do as well. I had no success getting the 'yu' language in there to come out as 'values-yue', so I had to add a manual mapping for that code in tools/update-localizations.py

I'm happy to delegate admin power up there to someone else that wants to play but note my earlier failure, and also that they only let you build new bundles for download using the API every 30 minutes (I think?) but manually you can rebuild them more frequently I believe.

I'd be fine personally with just manually mapping these additional 3 languages during import to AnkiDroid source / post export from crowdin similar to yu -> yue - it would get the job done

Final note on variants: Android does require an 'r' after the hyphen, so for example it wouldn't be values-por-BR; it would be values-por-rBR in the actual filesystem.
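
The folder naming rule can be sketched in a few lines of Python. `android_values_dir` is a hypothetical helper, not something from tools/update-localizations.py, and it only builds the name; whether a given API level resolves the folder is a separate question.

```python
from typing import Optional


def android_values_dir(language: str, region: Optional[str] = None) -> str:
    """Build a resource directory name such as 'values-por-rBR'.

    Hypothetical helper for illustration. The region, if given, is prefixed
    with a literal lowercase 'r', as noted above.
    """
    if region is None:
        return f"values-{language.lower()}"
    return f"values-{language.lower()}-r{region.upper()}"


if __name__ == "__main__":
    print(android_values_dir("yue"))
    print(android_values_dir("por", "BR"))
```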

mikehardy commented 4 years ago

good find on these three languages either way - that's worth saying.

Arthus commented 4 years ago

> I'm happy to delegate admin power up there to someone else that wants to play but note my earlier failure, and also that they only let you build new bundles for download using the API every 30 minutes (I think?) but manually you can rebuild them more frequently I believe.

Feel free to add me as an admin. What do you need? My name there is the same as here: Arthus

/edit: In my test project on CrowdIn I'm able to directly export all languages to the correct Android structure (including the additional r). So if everything works as I expect it to, we might be able to remove quite a few workarounds. I can also set different codes for languages, so overwriting the current codes for Hebrew etc. will be easy.

When you give me admin rights on crowdin, the only thing we have to do is to define the way to move forward.

By default crowdin exports the strings for Android as

values-en-rUS
values-de-rDE
values-ja-rJP

etc. Because AnkiDroid's crowdin project doesn't translate/export an English version (the default /res/values/ folder is crowdin's "master"/upstream), we don't have to worry about that. But how do we want to deal with other languages? Have all of them as "values-XY-rZA"? Or default to values-XY most of the time? (This would actually be 5 minutes more work on crowdin, so no big deal, but it would still differ from crowdin's default behaviour. ;) )

I think Android could also read folders like values-XYZ (e.g. "values-yue"), but this would have to be confirmed. The advantage would be that we can correctly define the actual language (yue = Cantonese).

Most codes work like this: <language code>-<country tag>, e.g.

- pt-rBR
- pt-rPT
- zh-rHK (HK = Hong Kong, where Cantonese seems to be common)

zh-yue would be possible (<language code>-<extended language code>), but in that case we could also just use "yue". In theory something like zh-yue-HK would be possible (Chinese, Cantonese dialect, as spoken in Hong Kong, in contrast to zh-yue-MO: Cantonese as spoken in Macao).

My personal preferences:

1. Export German etc. as "values-deu", Cantonese as "values-yue", Portuguese as "values-por-rBR" and "values-por-rPT" etc. (Meaning: use ISO 639-3 as much as possible and only use country tags when it makes sense.)
2. Alternatively, leave the crowdin defaults as they are: German = values-de-rDE, Cantonese = values-zh-ryue etc. (Even though Austrian and Swiss AnkiDroid users would in that case get the "German German language", so it might not be 100% correct and future-proof.)

mikehardy commented 4 years ago

@Arthus - just as a heads up, if you edit a comment, I get no notification, and I am 100% reactive to notifications, I may never see it if you don't make a new comment, which is why this sat for 7 days :scream: - I am only just seeing it now.

I know 'values-yue' works as I map our crowdin-custom 'yu' to 'yue' as the way to make it work:

https://github.com/ankidroid/Anki-Android/pull/6313/commits/aeb0686f713cb0139d8975cf0519b51a49029b62#diff-b63586ec743f8739f099b19bf688b381R187-R188

So I would opt for choice 1 above, ('values-yue', 'values-deu', 'values-por-rBR' etc).

If you could make this change on crowdin so it exported correctly, in concert with the necessary changes to update-localizations.py / LanguageUtil.java / LanguageUtilTest.java, that would be truly, amazingly helpful. You are listed as a "Manager" there already, so I think you can do what you need to do? If not, let me know.

I will make a stopgap change for master/2.11.2 as indicated above for the 3 languages in question now, mapping them to

- Hebrew: heb
- Indonesian: ind
- Tagalog: tgl

...in the same way as I am mapping 'yu' to 'yue'
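
A stopgap mapping of this kind might look roughly like the Python sketch below. It is not the actual code in tools/update-localizations.py: `SPECIAL_CASES` and `map_values_dir` are hypothetical names, and the left-hand codes for the three new languages (`he`, `id`, `tl`) are assumptions about what the Crowdin export produces. Only 'yu' -> 'yue' and the three target codes come from the discussion above.

```python
# Export code -> folder code. Only 'yu' -> 'yue' and the targets heb/ind/tgl
# are taken from the thread; the keys 'he', 'id' and 'tl' are assumed here.
SPECIAL_CASES = {
    "yu": "yue",
    "he": "heb",
    "id": "ind",
    "tl": "tgl",
}


def map_values_dir(dirname: str) -> str:
    """Rewrite 'values-<code>[-r<REGION>]' using SPECIAL_CASES.

    Directory names that do not start with 'values-' or whose language code
    is not in the table are returned unchanged.
    """
    prefix = "values-"
    if not dirname.startswith(prefix):
        return dirname
    # Split off the language code; keep any region qualifier untouched.
    code, sep, rest = dirname[len(prefix):].partition("-")
    return prefix + SPECIAL_CASES.get(code, code) + sep + rest


if __name__ == "__main__":
    for name in ("values-yu", "values-he", "values-id", "values-tl", "values-pt-rBR"):
        print(name, "->", map_values_dir(name))
```

Keeping the special cases in one table means a later fix on the Crowdin side only requires deleting entries rather than touching the renaming logic.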

Mishandled Languages: Tagalog, Hebrew, Indonesian #6338