[bug: i18n] Serbian translation of VISION file doesn't specify script

DerekNonGeneric commented 2 years ago

Serbian is practically the only European standard language whose speakers are fully functionally digraphic, using both Cyrillic and Latin alphabets. — https://en.wikipedia.org/wiki/Serbian_language#cite_ref-18

This seems to be a unique case in natural languages, but the Serbian language can be expressed in either Latin (sr_Latn) or Cyrillic script (sr_Cyrl), so what i said about “two-letter codes, one per language” in https://github.com/openinf/.github/pull/34#issuecomment-1230322175 is a little bit incorrect due to this edge case. I think we will need to specify which one of the alphabets we are using when making these translations to prevent mixing them.

/cc @emmitrovic as i am unsure which one we should use (is one preferable or maybe we need both?)

I am not exactly sure, but the only other time this may come up elsewhere is when distinguishing between Simplified Chinese (zh_Hans) and Traditional Chinese (zh_Hant)…

/cc @septs as this file naming scheme seems to be gaining widespread adoption (guideline may be needed)

To be clear, this is not about localization in the sense of wanting to distinguish between Portuguese from Portugal (pt_PT) and Portuguese from Brazil (pt_BR), which use the same alphabet. On the topic of Portuguese: I was hoping to be able to use only pt and leave the translation style up to whoever does the translation. For example, some things sound better in pt_BR and others in pt_PT, so if we can avoid making the distinction, i would prefer that.
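
To illustrate the difference between a script subtag (sr_Latn, zh_Hant) and a region subtag (pt_BR), here is a minimal sketch that splits the underscore-style tags used in this thread into their parts. The `LocaleParts` type and `parseLocaleTag` function are hypothetical names made up for this example, and the sketch only assumes the usual subtag shapes: a 2-3 letter language, a 4-letter title-case script, and a 2-letter uppercase or 3-digit region.

```typescript
// Hypothetical helper (not part of any existing tooling in this repo).
interface LocaleParts {
  language: string;
  script?: string;
  region?: string;
}

// Splits tags such as "sr_Latn", "pt_BR", or "zh_Hant" by subtag shape.
// Returns null when the tag does not match the assumed shapes.
function parseLocaleTag(tag: string): LocaleParts | null {
  const match = /^([a-z]{2,3})(?:[_-]([A-Z][a-z]{3}))?(?:[_-]([A-Z]{2}|\d{3}))?$/.exec(tag);
  if (match === null) {
    return null;
  }
  const language = match[1];
  const script = match[2];
  const region = match[3];
  if (language === undefined) {
    return null;
  }
  const parts: LocaleParts = { language: language };
  if (script !== undefined) {
    parts.script = script; // four letters, e.g. "Latn" or "Cyrl"
  }
  if (region !== undefined) {
    parts.region = region; // two letters or three digits, e.g. "BR"
  }
  return parts;
}
```

With this, `parseLocaleTag("sr_Latn")` yields a script but no region, while `parseLocaleTag("pt_BR")` yields a region but no script, which is the distinction this issue is about.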

/cc @erickwendel as am unsure if this would be practical (am still learning and unsure of edge cases)

Any additional insight you all can provide would be appreciated!

emmitrovic commented 2 years ago

We'll use (sr_Latn) for this! 👍🏻

ErickWendel commented 2 years ago

This is completely out of what I know :(

DerekNonGeneric commented 2 years ago

> We'll use (sr_Latn) for this! 👍🏻

Cool, @emmitrovic. If you could handle the sr_Latn translations, my friend who initially did the sr_Cyrl version could continue doing the rest of the files in sr_Cyrl. You would then only need to convert from sr_Cyrl to sr_Latn in the future, which should make it a lot easier (and you'd probably catch any mistakes in the process too 😉). Hopefully, he agrees to it, since he seemed upset about having the same content in different alphabets.

So for now, here is what should be done to solve the issue w/ this particular file:

[x] rename our current VISION.sr.md to VISION.sr_Cyrl.md
[ ] create a new file named VISION.sr_Latn.md w/ the contents of VISION.sr_Cyrl.md converted to use the Latin alphabet

It sounds simple to me, but am unsure whether changing the alphabet can be more challenging than it sounds.
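
For what it's worth, the Cyrillic-to-Latin direction can be sketched as a per-letter lookup, since each of the 30 Serbian Cyrillic letters has a fixed Latin counterpart (three of them, љ, њ, and џ, become the digraphs lj, nj, and dž). The function name `cyrillicToLatin` and the capitalization rule for digraphs below are assumptions made for this example, not an existing tool, and a human should still review the output.

```typescript
// Lowercase Serbian Cyrillic letters and their Latin counterparts.
const CYRL_TO_LATN: Record<string, string> = {
  "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
  "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
  "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
  "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
  "џ": "dž", "ш": "š",
};

// True when ch is an uppercase letter that appears in the table above.
function isUpperCyrillic(ch: string | undefined): boolean {
  if (ch === undefined) {
    return false;
  }
  const lower = ch.toLowerCase();
  return ch !== lower && CYRL_TO_LATN[lower] !== undefined;
}

// Hypothetical helper: converts Serbian Cyrillic text to Latin script.
// Characters outside the table (Latin text, digits, punctuation) pass through.
function cyrillicToLatin(text: string): string {
  const chars = text.split("");
  return chars
    .map((ch, i) => {
      const lower = ch.toLowerCase();
      const latin = CYRL_TO_LATN[lower];
      if (latin === undefined) {
        return ch;
      }
      if (ch === lower) {
        return latin;
      }
      if (latin.length === 1) {
        return latin.toUpperCase();
      }
      // Assumed rule for uppercase digraphs: "LJ" when a neighboring letter
      // is also uppercase (all-caps word), otherwise title case "Lj".
      const allCaps = isUpperCyrillic(chars[i + 1]) || isUpperCyrillic(chars[i - 1]);
      return allCaps ? latin.toUpperCase() : latin.charAt(0).toUpperCase() + latin.slice(1);
    })
    .join("");
}
```

The reverse direction (Latin to Cyrillic) is likely the harder one, because a Latin letter pair such as nj can occasionally stand for two separate letters rather than one, so a plain lookup would not be enough there.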

Does this sound reasonable?

My friend told me that from his experience, it doesn't matter which alphabet we present to Serbian users, but i think it would be cool to allow Serbians to choose whichever alphabet they prefer reading by using a language dropdown on our site or something like that. (The content of these files will eventually end up on our site.) You can see what we did w/ https://tc39.es for some prior art (the language toggle is in the top right).
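
As a rough sketch of how a site could choose which file to show, the function below walks a reader's preferred tags in order and, for each one, drops trailing subtags until it finds an available translation. The name `pickTranslation` and the underscore-separated tags are assumptions made for this example; a real selector would probably lean on an existing locale-matching library instead.

```typescript
// Hypothetical helper: returns the first available tag that matches one of
// the preferred tags, trying the full tag first and then shorter prefixes
// (e.g. "sr_Latn_RS" -> "sr_Latn" -> "sr").
function pickTranslation(preferred: string[], available: string[]): string | undefined {
  for (const tag of preferred) {
    let candidate = tag;
    while (candidate.length > 0) {
      if (available.indexOf(candidate) !== -1) {
        return candidate;
      }
      const cut = candidate.lastIndexOf("_");
      if (cut === -1) {
        break; // no more subtags to drop for this preference
      }
      candidate = candidate.slice(0, cut);
    }
  }
  return undefined; // caller decides on a default, e.g. the English file
}
```

Note that a bare `sr` preference matches neither `sr_Latn` nor `sr_Cyrl` in this sketch, so the site would still need to decide which script to treat as the default for Serbian.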

DerekNonGeneric commented 2 years ago

Relevant to this discussion: how a language selector component could work for this is discussed at https://github.com/unicode-org/icu4x/issues/2686.

OpenINF/.github issue #84