asterics / AsTeRICS-Grid

Free and simple to use app for augmentative and alternative communication (AAC) with offline support, flexible input methods and media access
https://grid.asterics.eu/
GNU Affero General Public License v3.0

Translation to additional languages #369

Closed ms-mialingvo closed 1 month ago

ms-mialingvo commented 9 months ago

The current options for language translations are missing some languages that I need.

Possible solution (A): Make it possible for people to add languages as they want. Those new languages would need to be available for everyone to use afterwards.
Pro: It wouldn't result in the huge list that we'd get if we imported all ISO 639-3 codes. Languages that aren't included in ISO 639-3, like very specific dialects (I need one), could still be added.
Contra: People would have to follow the instruction to use ISO 639-3 codes, otherwise it could become a mess of codes.

Possible solution (B): Add all options from ISO 639-3.
Pro: That list contains pretty much all possible languages and dialects anyone would ever want to translate to. (See e.g. https://iso639-3.sil.org/code_tables/639/data?title=&field_iso639_cd_st_mmbrshp_639_1_tid=All&name_3=arabic&field_iso639_element_scope_tid=All&field_iso639_language_type_tid=All&items_per_page=200 for Arabic.)
Contra: It would result in a huge drop-down list. I'd still need you, Benjamin, to specifically add the one dialect that I need that isn't there, but I'm pretty sure that would be a one-time exception.

ms-mialingvo commented 5 months ago

Re-thinking this, solution A is better because of the flexibility.

klues commented 2 months ago

Proposal: what about adding a second select box next to "grid content language", which allows optionally choosing from a list of all countries? Without a country selected, translations will internally still be saved only by language code (e.g. de); with a country selected, they will be saved by language code and country code (e.g. de-ch). With this method we have nearly all possibilities covered (but not the case of several dialects within a single country).

Additionally, I would only add this option for the grid content language and stay with the two-letter code for the application language for now.
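
To make the idea concrete, here is a minimal sketch of how such a key could be built. The function names `contentLangKey` and `lookupTranslation` are hypothetical and not taken from the AG code base, and the fallback from a localized key to the plain language code is an assumption of this sketch, not something decided in this issue.

```typescript
// Hypothetical helper: builds the key under which a translation is saved.
// Without a country the key is the language code only (e.g. "de"),
// with a country it is language code plus country code (e.g. "de-ch").
function contentLangKey(lang: string, country?: string): string {
    const base = lang.trim().toLowerCase();
    const region = (country ?? "").trim().toLowerCase();
    return region === "" ? base : `${base}-${region}`;
}

// Hypothetical lookup: tries the exact key first, then falls back to the
// plain language code (assumption: "de-ch" may reuse texts saved for "de").
function lookupTranslation(
    translations: Record<string, string>,
    key: string
): string | undefined {
    const has = (k: string) =>
        Object.prototype.hasOwnProperty.call(translations, k);
    if (has(key)) {
        return translations[key];
    }
    const base = key.split("-")[0] ?? "";
    return has(base) ? translations[base] : undefined;
}
```

With this, `contentLangKey("de", "ch")` gives the localized key, while `contentLangKey("de")` keeps the current behaviour of saving by language code only.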

klues commented 2 months ago

New idea: just add a checkbox "enable localized languages" and, when it is checked, show localized languages within the select box for choosing the content language. For now, add these localized languages: https://www.andiamo.co.uk/resources/iso-language-codes/
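
Roughly, the checkbox would only change which entries the select box offers. A small sketch of that filtering, where the `LangOption` type, the function name `selectableLanguages` and the sample entries are all made up for illustration:

```typescript
// Hypothetical shape of one entry in the content language select box.
interface LangOption {
    code: string;  // e.g. "de" or "de-ch"
    label: string; // e.g. "German" or "German, Switzerland"
}

// Returns the options to show: all of them if localized languages are
// enabled, otherwise only plain language codes (codes without a "-").
function selectableLanguages(all: LangOption[], enableLocalized: boolean): LangOption[] {
    return enableLocalized ? all : all.filter((option) => !option.code.includes("-"));
}

// Illustrative sample data only.
const sampleOptions: LangOption[] = [
    { code: "de", label: "German" },
    { code: "de-ch", label: "German, Switzerland" },
    { code: "en", label: "English" },
    { code: "en-gb", label: "English, United Kingdom" },
];
```

So with the checkbox unchecked the list stays as short as it is today, and the localized entries only show up for people who need them.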

klues commented 1 month ago

At first I thought this would be quite some work, then I thought it's very easy, finally it was quite some work - see the commits above 😆

I've implemented this and released it to "latest":

Open questions:

Please try to test everything well, in order to make sure I haven't introduced any bugs.

ms-mialingvo commented 1 month ago

Oooh, awesome!! :) Thank you so much! I will test it on Friday or the weekend.

for German there is no de-de for Germany. Do we need it?

I guess you mean de-loc for dialects within Germany? Because de-de wouldn't make sense to me, the same way there is no pt-pt for Portugal etc, de is already "German in Germany".

(Anyway, the answer to both open questions would be 'no' as far as I'm concerned.)

klues commented 1 month ago

I'm not sure what the common understanding is regarding these locales, but de-de definitely exists. The same goes for es-es (which we also don't have in AG now). See e.g. this list of locales, where both are listed: https://simplelocalize.io/data/locales/ I assume it comes down to a discussion of whether German from Germany or Spanish from Spain is the base language or not. For English it seems like the majority has agreed that both exist, en-us and en-gb.

However, if you don't need them, let's keep it as it is.

ms-mialingvo commented 1 month ago

Oh, okay. I'm used to seeing "de" for Germany and "de-xx" for outside-Germany-German in ISO codes but there are so many variations of these ISO lists...

[Screenshot: windows edge]

On a Windows tablet with the Edge browser, the dropdown doesn't show e.g. "Arabic, Egypt" but "Arabic, country.eg". On Apple and Android (Firefox browser) it names the countries. It's just a tiny blemish, but if there's an easy way to correct that, that would be perfect.

On Android (Firefox browser), languages that are bought through Acapela look like this:

[Screenshot: android firefox]

...but I guess the naming is up to the TTS app and there's nothing AG can do about it?

Indian English is missing, can you add that? There are also TTS voices for Mandarin, Cantonese and Sichuanese but ISO mostly just lists "Chinese" as language. Not quite sure how to solve that but as long as no one plans to make gridsets for those languages/dialects it's not an issue, I guess.

ms-mialingvo commented 1 month ago

I've been checking all currently available TTS voices that could therefore be potentially used in AG:

ms-mialingvo commented 1 month ago

Also, for Apple, the dropdown shows "lang.ji" and "lang.sb" instead of the name of these two languages.

klues commented 1 month ago

On Android (Firefox browser), languages that are bought through Acapela look like this:

It seems like they're using sp as the language code for Spanish, which is wrong, because it should be es. However, I've now also translated lang.sp to "Spanish", so we're fixing the errors of others ;) As for the voice name espanol (ESP,Ines), it's simply the name they gave to the voice. The info online seems also to be wrong in this case, but it's the info I get via the API...

Indian English is missing

I've added it.

There are also TTS voices for Mandarin, Cantonese and Sichuanese but ISO mostly just lists "Chinese" as language.

We can add any language and country codes and translate them. However, it probably only makes sense if the creators of TTS voices also use the correct/same language codes.

Montenegrin; Spanish, USA; English, India; Filipino and Bhojpuri are available as TTS but not in the list.

Which OS were you using? iOS? For iOS it's a game of chance which TTS voices they allow to be used in web apps. However, with the new release I've added a new System device voice, which can make some more iOS voices available for AG, see https://github.com/asterics/AsTeRICS-Grid/issues/223#issuecomment-2371090444

Also, for Apple, the dropdown shows "lang.ji" and "lang.sb" instead of the name of these two languages.

According to this table on Wikipedia about language codes ji and sb are not valid language codes. I can translate them like sp, but I need to know which languages they are ;)

I've released this state to the main version with https://github.com/asterics/AsTeRICS-Grid/releases/tag/release-2024-10-10-09.52%2F%2B0200. I'm closing the issue. For minor changes like translating ji and sb, just comment again; for bigger things, please open a new issue.

ms-mialingvo commented 1 month ago

The info online seems also to be wrong in this case, but it's the info I get via the API...

Yes, luckily it is wrong, all languages downloaded from Acapela on Android are available offline :)

Which OS were you using?

I've literally been checking all available options and combinations. I want to create a list so people can check which system with which browser works best for the combination of languages they need. Currently, speech synthesis that can be used in AG is available for 107 languages.
Montenegrin: online male voice available on both Apple (iOS 17-something, will need to check again with iOS 18) and Windows tablet
Spanish, USA: available through Acapela (so, offline) on Android, and online on Windows
Filipino: available online on both Windows and Android
Bhojpuri: available on iPad (again, that might have changed with the iOS update)

According to this table on Wikipedia about language codes ji and sb are not valid language codes. I can translate them like sp, but I need to know which languages they are ;)

From what I can find out, "ji" is for Yiddish, although the correct current form should be "yi", and "sb" is seemingly for Solomon Islands. No clue if that just means English or Pijin, the other language used there.
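
In case it helps, here is a rough sketch of how such non-standard codes could be mapped before the language name is looked up. The names `CODE_ALIASES` and `normalizeLangCode` are hypothetical, I don't know how AG handles this internally (as far as this thread says, lang.sp was simply translated directly), and "sb" is deliberately left out because it's unclear which language it stands for.

```typescript
// Hypothetical alias table with only the two mappings discussed in this thread.
const CODE_ALIASES = new Map<string, string>([
    ["sp", "es"], // some voices report "sp" for Spanish
    ["ji", "yi"], // older code for Yiddish
]);

// Replaces a known non-standard language part with the standard one and
// keeps any country part untouched, e.g. "sp-us" becomes "es-us".
// Unknown codes such as "sb" are returned unchanged (lowercased).
function normalizeLangCode(code: string): string {
    const parts = code.trim().toLowerCase().split("-");
    const lang = parts[0] ?? "";
    const fixed = CODE_ALIASES.get(lang) ?? lang;
    return [fixed, ...parts.slice(1)].join("-");
}
```

That way a voice reporting "ji" would show up under the same name as one reporting "yi", and anything not in the table stays as it is.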