Standardization of all the sound packs about String IDs and names

While browsing the sound files to assign sounds to certain switches, I realised that we don't have the same sound files in every language directory. I also noticed that the String IDs for the sounds aren't the same in every language. For a better implementation, I think it would be useful to standardise each String ID to one sound / word / phrase. Then we can start making the sound files for each language from the same set of files, instead of having only half of them available in some language packs. The name of each wav file should also be standardised; see the examples below. The best starting point would be to use the english_GB csv file as the main file, check whether other languages have additional sounds, and add the missing ones there. We can discuss here how to name the files and then apply the result to every other csv file.

Two examples of what I mean by different IDs and different naming of the wav files:

The number 100

English 100 = 0100.wav -> ID 101
German hundert = 0101.wav -> ID 103

The word AND

English and = 0110.wav -> ID 111
German und = 0105.wav -> ID 106
French et = 0120.wav -> ID 121

I can also understand that different languages have things that may not translate directly into each other because of differences in grammar, usage, etc. So, as a suggestion, we could use String IDs 0-999 for the general things implemented for every language, and separate ranges for language-specific things, like this: 1000-1999 Czech, 2000-2999 German, ... Every language implemented in the future can be added after that.
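To make the range idea concrete, here is a minimal sketch of how such a scheme could be checked. The boundaries, the language codes and the `scope_of` helper are assumptions for illustration only; nothing like this exists in the repo as far as this thread shows.

```python
# Hypothetical ID ranges following the suggestion above; the exact
# boundaries and the language order are assumptions for illustration.
ID_RANGES = [
    (0, 999, "common"),
    (1000, 1999, "cs"),  # Czech-specific entries
    (2000, 2999, "de"),  # German-specific entries
]


def scope_of(string_id: int) -> str:
    """Return the scope a String ID belongs to, or raise if it is unassigned."""
    for low, high, scope in ID_RANGES:
        if low <= string_id <= high:
            return scope
    raise ValueError(f"String ID {string_id} is outside every assigned range")


print(scope_of(101))   # common
print(scope_of(2005))  # de
```

A new language would only need one more tuple in `ID_RANGES`, which matches the idea that future languages are appended after the existing ranges.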

Yes, it would be useful, and I've already started on some of that ;) However, one thing that cannot and will not change at the moment is the filenames, and whether files are system or non-system files. Everything else is up for grabs. Ideally, there would be a single translation file with all the languages in it, so they must all have the same words/phrases/sounds. But to do this, I think English is the wrong starting place, as it doesn't have three variations of the same word depending on the gender of the speaker (which I think is what is happening with some of the other languages). Hm... unless, instead of it being

1000-1999 Czech 2000-2999 German

it were

1000 109-female 1001 109-male

or whatever... i.e. break the "variations" out to a separate number range... 🤔

A couple of thoughts:

the filenames the way they are now (e.g. telemok.wav) are more readable than numeric names (like 0110.wav) when you're selecting a file to use from a Global function (or another custom use)
the combination filename + path is what prevents one CSV input row from overwriting a file that was created by another CSV input row. The first column ID is meaningless, and not actually used as an ID (unless I'm missing something?).
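As a rough illustration of the second point, the sketch below flags CSV rows that would write to the same path + filename. The four-column layout (ID, text, path, filename), the sample rows and the `find_collisions` helper are all assumptions, not the actual format of the repo's CSV files.

```python
import csv
import io
from collections import defaultdict

# Assumed layout for illustration only: ID, source text, path, filename.
SAMPLE = """\
101,one hundred,SOUNDS/en/SYSTEM,0100.wav
250,telemetry ok,SOUNDS/en,telemok.wav
251,telemetry okay,SOUNDS/en,telemok.wav
"""


def find_collisions(csv_text: str) -> dict[str, list[str]]:
    """Map each output file written by more than one row to the IDs of those rows."""
    rows_by_target = defaultdict(list)
    for row_id, _text, path, filename in csv.reader(io.StringIO(csv_text)):
        rows_by_target[f"{path}/{filename}"].append(row_id)
    return {target: ids for target, ids in rows_by_target.items() if len(ids) > 1}


print(find_collisions(SAMPLE))
```

Here rows 250 and 251 have different IDs but the same target file, so the second would overwrite the first; the ID column does nothing to prevent that.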

they must all have the same words/phrases/sounds

Yes. This is particularly important for sounds that are referred to by EdgeTX code.

[...] we can start to make the sound files for each language with the same files and not only half of them available in some language packs.

Are you suggesting to start each new language / locale from a template? I like that idea @Schnuppi12.

But to do this, I think English is the wrong starting place, as it doesn't have three variations of the same word depending on the gender of the speaker (which I think is what is happening with some of the other languages).

From personal experience working on a number of localized systems: using English as the reference language for this is fine. The strings that are used in the context of EdgeTX will be translatable in any combination of language, locale and voice without issue (e.g. en_US Guy).

The speaker is often relevant when talking to or about people, but none of the EdgeTX strings is likely to be about people, not even the speaker themselves. In other words: it's common for languages to include rules or customs that modify speech when the speaker talks about themselves or to other people (e.g. if the other people are older, younger...). But that's not going to be relevant for the subset of text that's used in EdgeTX, where the sentences are essentially impersonal. Of course all voices will sound different, but that doesn't affect what they can or cannot say; they'll say the same thing, only differently. Does that make sense?

An important part is recognizing that the voice is a part of the identifier, but this is already implicit in how the releases are structured. Example: there isn't one en_US archive with multiple voices inside, rather we've got three distinct archives:

edgetx-sdcard-sounds-en_us-guy-2.9.0.zip
edgetx-sdcard-sounds-en_us-michelle-2.9.0.zip
edgetx-sdcard-sounds-en_us-sara-2.9.0.zip

The pattern language (en), locale (US), voice (guy) is clear. As long as each trio gets its own source file, that's sufficient. (It may be tempting to note that some voices could share a source file, but that feels like premature optimization to me.)
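For illustration, the trio can be pulled out of an archive name with a small regular expression. The pattern below is inferred only from the three file names listed above; it is not an official naming specification, and `parse_archive_name` is a hypothetical helper.

```python
import re

# Pattern inferred from the three archive names above; not an official spec.
ARCHIVE_RE = re.compile(
    r"^edgetx-sdcard-sounds-"
    r"(?P<language>[a-z]{2})_(?P<locale>[a-z]{2})-"
    r"(?P<voice>[a-z]+)-"
    r"(?P<version>\d+\.\d+\.\d+)\.zip$"
)


def parse_archive_name(name: str) -> dict[str, str]:
    """Split an archive name into language, locale, voice and version."""
    match = ARCHIVE_RE.match(name)
    if match is None:
        raise ValueError(f"Unrecognised archive name: {name}")
    return match.groupdict()


print(parse_archive_name("edgetx-sdcard-sounds-en_us-guy-2.9.0.zip"))
```

Each distinct (language, locale, voice) result would then map to exactly one source file.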

The characteristics of the speaker: gender, age, whatever are already encapsulated in the voice itself (Azure uses names, like Guy or Michelle). So none of that is a concern for the purpose of creating language packs.

The first column is the unique ID number for an entry... the goal was to be able to go through all the translation files and ensure matching lines had the same ID, so that they could then be automatically merged into a single file by ID. If memory serves me correctly, this was also a requirement for being able to use crowdin.
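A minimal sketch of what merging by ID could look like, assuming the IDs have already been aligned across languages (which is the goal, not the current state). The two-column "ID,text" layout, the sample rows and both helper functions are made up for illustration; the real CSV files have more columns.

```python
import csv
import io

# Made-up rows in an assumed "ID,text" layout with already-aligned IDs.
TRANSLATIONS = {
    "en": "101,one hundred\n111,and\n",
    "de": "101,hundert\n111,und\n",
    "fr": "111,et\n",
}


def merge_by_id(files: dict[str, str]) -> dict[int, dict[str, str]]:
    """Merge per-language rows into {id: {language: text}}."""
    merged: dict[int, dict[str, str]] = {}
    for language, text in files.items():
        for row_id, phrase in csv.reader(io.StringIO(text)):
            merged.setdefault(int(row_id), {})[language] = phrase
    return merged


def missing_entries(merged: dict[int, dict[str, str]], languages: list[str]) -> list[tuple[int, str]]:
    """List (id, language) pairs that have no entry yet."""
    return [(i, lang) for i in sorted(merged) for lang in languages if lang not in merged[i]]


merged = merge_by_id(TRANSLATIONS)
print(merged[111])
print(missing_entries(merged, ["en", "de", "fr"]))
```

The second call reports the gaps, which is the information needed to bring every language pack up to the same set of sounds.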

From personal experience working on a number of localized systems: using English as the reference language for this is fine. The strings that are used in the context of EdgeTX will be translatable in any combination of language, locale and voice without issue (e.g. en_US Guy).

That isn't the issue. Some of the languages have system files that are referenced by number only, and that number changes depending on the language. Which then collides with the key/identifier requirement (or recommendation?) for crowdin CSV importation.

Only the language and locale are unique at present, as no issue has been raised about voice/gender needing to be separated out. What is important to note here is that this started out with a single voice for each language, and then more were added on, with some kludge workarounds done as needed to avoid having to move to an Azure-specific syntax.

Some of the languages have system files that are referenced by number only, and that number changes depending on the language. Which then collides with the key/identifier requirement (or recommendation?) for crowdin CSV importation.

I think I've seen some of this, around sounds for numbers in particular. (example)

I agree then that it would make sense for the first column (ID) to be the same across all languages/locales. I'm making a mental note that, if the example code I linked above is indeed what you had in mind, then modifying existing files would require some coordination to avoid breaking compatibility between sound pack and firmware.

Keeping the file paths and names stable seems desirable as well, as you mentioned above. I see no conflict between that and standardizing the IDs at some point.

Thanks for taking the time to clarify @pfeerick! This makes sense to me.

Ideally, a merger would have no breakage, as there would simply be gaps for languages that don't need a given phrase. CZ is probably the worst offender here... that language file has male, female and neutral phrases in it... so things like the suffix for radian are there three times... which is probably not the case in any other language. Plus, I have no idea how the firmware was ever actually supposed to distinguish between them... it seems to arbitrarily jump from one to the other depending on the phrase.

Plus, I have no idea how the firmware was ever actually supposed to distinguish between them... it seems to arbitrarily jump from one to the other depending on the phrase.

Short version: Indeed. If the localization library in the firmware doesn't support variations, adding keys in the translation files is pointless (see long version for more nuance and what that support may look like). The number of keys / strings to translate should be the same in all languages either way (see example in the long version, notice how the variants are handled within the string / message / single key).

Long version: This is a common localization difficulty. Trying to compose sentences on the fly is a difficult trade-off as soon as multiple languages are involved.

In the cases where it's possible, though, the localization library that the firmware uses needs to support those variations; there is no way around that. The most typical example is probably the handling of plurals.

In a single-language English application it may be sufficient to write something like (pseudo-code):

switch count:
  0: print("Updated no items.")
  1: print("Updated one item.")
  default: print("Updated {count} items.")

English has two forms for item, one singular (item) and one plural (items). But that varies across languages, and some have distinct forms for zero, one, two, a few, many... The way most localization libraries handle that is by retrieving the strings for a given key, locale and count (in this case).

The translated strings would look like this (to take the ICU message format as an example):

# English:
itemUpdated: {count, plural, =0 {Updated no items.} one {Updated one item.} other {Updated # items.}}

# Some language that has more variants (this is what the translator writes to match the English string, they're different and that's OK):
itemUpdated: {count, plural, =0 {Something.} few {Something # else.} other {Yet another variant # items.}}

The localization library then is used like this (pseudo code):

import some-localization-function as _

translated_string = _(itemUpdated, { count: 3 })

This example would produce "Updated 3 items." or "Something 3 else." depending on the selected language. But it's the localization function that needs to be aware of the formatting details (e.g. know how to retrieve the correct variant from the ICU message in this example).
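To show roughly what "being aware of the formatting details" means, here is a small sketch of variant selection. It is not an ICU implementation: the variants are stored in plain dicts instead of being parsed from an ICU message, the plural rules only handle whole numbers, and the locale "xx", its rule and the `translate` function are all hypothetical.

```python
# Simplified plural rules for whole numbers only. Real libraries follow the
# CLDR plural rules, which also cover fractions and many more languages.
def english_rule(count: int) -> str:
    return "one" if count == 1 else "other"


def few_language_rule(count: int) -> str:
    # Hypothetical language "xx" with a separate "few" form for 2 to 4.
    if count == 1:
        return "one"
    if 2 <= count <= 4:
        return "few"
    return "other"


PLURAL_RULES = {"en": english_rule, "xx": few_language_rule}

# Placeholder strings taken from the example above; not real translations.
MESSAGES = {
    "en": {"itemUpdated": {"=0": "Updated no items.", "one": "Updated one item.", "other": "Updated # items."}},
    "xx": {"itemUpdated": {"=0": "Something.", "few": "Something # else.", "other": "Yet another variant # items."}},
}


def translate(key: str, locale: str, count: int) -> str:
    """Pick the variant for this locale and count, then fill in the number."""
    variants = MESSAGES[locale][key]
    exact = f"={count}"
    if exact in variants:  # exact matches such as "=0" win over categories
        chosen = variants[exact]
    else:
        category = PLURAL_RULES[locale](count)
        chosen = variants.get(category, variants["other"])
    return chosen.replace("#", str(count))


print(translate("itemUpdated", "en", 3))
print(translate("itemUpdated", "xx", 3))
```

The calling code passes the same key and count for every language; only the rule and the stored variants differ, which is why the number of keys stays identical across languages.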

If the localization tooling in the firmware doesn't support that, then I'd argue that there is little point in trying to compensate in the translation files by adding extra keys. (Too little, too late.)

Now, I said that composing sentences is a difficult trade-off. Even the ICU message format has limitations; namely, only one variable can realistically vary in any given string. In French, for example, you'd count items differently depending on their gender (because items do have a gender in French), so a string like "This is your {count} shirt." can be translated fine, but "This is your 1st {item}" and "This is your {count} {item}" would give the wrong result roughly half the time, and there is nothing a developer could do to avoid that (except changing the strings, of course, which is the right thing to do).

To make the French example clearer, this is what the strings could render to.

message: `"This is your {count} shirt."` (en)
message: `"C'est ta {count} chemise."` (fr)

code: _(message, 1)
output (fr): "C'est ta 1ère chemise."  GOOD
code: _(message, 2)
output (fr): "C'est ta 2ème chemise."  GOOD

message: `"This is your 1st {item}."` (en)
message: `"C'est ta 1ère {item}."` (fr)  WRONG as soon as item is not feminine, so a table is fine, but not a hat.

code: _(message, "table")
output (fr): "C'est ta 1ère table."  GOOD

code: _(message, "chapeau")
output (fr): "C'est ta 1ère chapeau."  WRONG

should be: "C'est ton 1er chapeau." (a.k.a. not even close)

And to expand on it: the counting words themselves are different depending on the nature of what is being counted in Japanese and Chinese for example (so you don't use the same number words to count 2 cats and 2 computers).

The way to get translations exactly right is to only translate full sentences. (So the entire context is known to the translator, e.g. what is being counted and how many pieces there are.) That's often impractical.
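The French example can be reproduced in a few lines. This is only a sketch contrasting the two approaches; the dictionary keys are hypothetical and the strings are the ones from the example above.

```python
# Naive composition: one French template with a slot for the item.
TEMPLATE_FR = "C'est ta 1ère {item}."

# Full sentences: each one is translated as a whole, so the translator can
# make the possessive and the ordinal agree with the noun. Keys are made up.
SENTENCES_FR = {
    "first_table": "C'est ta 1ère table.",
    "first_hat": "C'est ton 1er chapeau.",
}

print(TEMPLATE_FR.format(item="table"))    # fine, because "table" is feminine
print(TEMPLATE_FR.format(item="chapeau"))  # wrong, because "chapeau" is masculine
print(SENTENCES_FR["first_hat"])           # correct full-sentence translation
```

The cost is visible too: the full-sentence approach needs one key per item, which is why it is often impractical.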

Alternatively, some variations, like plurals, can be handled okay in a narrow set of cases, which is usually sufficient as long as the strings are composed carefully. Now, if there is no support from the localization library, even that is unlikely to yield satisfactory results. I think it's a trade-off to consider carefully (complexity of the code vs how much better the translation really gets). A proper localization library does remove most of the complexity, and makes the trade-off much easier to decide - but that's a conversation for the EdgeTX and Buddy repos, not for this one, I think?

For the sake of mentioning it: translating sentence fragments and piecing them together is a no-no. (Fortunately, I don't think that's happening in the voice packs, because the sounds are very short phrases anyway.)

Ideally, a merger would have no breakage [...]

I think this is totally manageable. The worst case scenario (which I don't find that bad personally) would be to keep the old IDs, add the new ones, and set up a deprecation period for the code to evolve towards using the new IDs, with a fallback to the old IDs. Then remove the old IDs only once we're comfortable telling people that version X of EdgeTX requires at least version Y of the voice packs (where the new IDs were introduced) and clean up the corresponding fallback code.

And if we expect people to update their voice packs when they update EdgeTX, then maybe all the fallback dance is not needed?

EdgeTX / edgetx-sdcard-sounds

Standardization of all the sound packs about String IDs and names #48