brave / brave-browser

Brave browser for Android, iOS, Linux, macOS, Windows.
https://brave.com
Mozilla Public License 2.0

Serbian Language Script (sr) #39792

Open AndyAnds265 opened 1 month ago

AndyAnds265 commented 1 month ago

Description

At the moment, Brave is not optimized for the Serbian language. Serbian-language users currently get an odd, inconsistent UI.

There are about 50k current Serbian users on Android, so this is not an insignificant group. The problem has also been raised elsewhere (https://github.com/brave/brave-browser/issues/35417#issuecomment-2151009766).

Basically here is the background and issue explained:

Screenshot_20240715-212754

Steps to reproduce

  1. Change Android device language (in settings) to Serbian (Latin) or Serbian (Cyrillic)
  2. Delete the Brave app
  3. Re-install the Brave app
  4. Launch app

Actual result

The UI mixes Latin and Cyrillic script (see Screenshot_20240715-212140).

Expected result

The UI should use one consistent script.
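One way to check for this kind of inconsistency is to scan the UI strings and report which scripts they use. The following is only a sketch: the function names `scripts_used` and `is_mixed` are hypothetical, and it assumes the strings are available as plain Python strings rather than read from Brave's actual resource files.

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Return which of 'LATIN' / 'CYRILLIC' appear among the letters of text."""
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, punctuation
        name = unicodedata.name(ch, "")
        for script in ("LATIN", "CYRILLIC"):
            if name.startswith(script):
                found.add(script)
    return found


def is_mixed(strings: list[str]) -> bool:
    """True if the given strings, taken together, use more than one script."""
    used = set()
    for s in strings:
        used |= scripts_used(s)
    return len(used) > 1


print(is_mixed(["Подешавања", "Obeleživači"]))  # one Cyrillic and one Latin string
```

Brand names such as "Brave" would be reported as Latin even in a Cyrillic UI, so a real check would likely need an allow-list.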

Reproduces how often

Easily reproduced

Brave version

1.68 and earlier (apparently this has been an issue for years)

Device

Channel information

Reproducibility

Miscellaneous information

No response

AndyAnds265 commented 1 month ago

Serbian (Cyrillic) device setting

https://github.com/user-attachments/assets/907d51cb-5998-4741-8a8b-ff34a270d8ee

AndyAnds265 commented 1 month ago

Serbian (Latin) device setting

https://github.com/user-attachments/assets/5f72d427-0692-408b-bac7-61d42ecff748

AndyAnds265 commented 1 month ago

In order for us to fix this issue, we need to:

AndyAnds265 commented 1 month ago

Here is a dictionary of the Serbian alphabets. We can likely leverage the 4th dictionary to convert characters: https://docs.google.com/document/d/16RSD7ut05e4vtjZFQogKMB8_31xeUslZ0VBkn_iCbQ4/edit

Susexe commented 3 weeks ago
  • For the sr locale, we'll need to convert the entire string base using an alphabet swapping script. I've created one before and should be able to do this relatively easily working with a dev. Changing the alphabet is literally just a swap of characters. The words themselves do not change.

Sadly, this is not as easy as you think. Just by going on Brave Search and setting the display language to Serbian (Cyrillic), I can already see plenty of errors. There are simply too many exceptions for a transliteration script to cover. For instance, your script currently transliterates the digraphs LJ and NJ to ЉЈ and ЊЈ instead of Љ and Њ. It's probably the same for ЏЈ instead of Џ. Sometimes these letters appear one after another but are used separately, which your script wouldn't know. There are some efforts to overcome this with exception dictionaries, but they're not reliable. Once we include English words and acronyms in the picture, it's an even bigger mess (e.g., AI should be left as is, not АИ).
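To illustrate the problem, here is a minimal Python sketch of a Latin-to-Cyrillic converter with an exception dictionary. It is not the script discussed in this thread: all names (`naive_lat_to_cyr`, `lat_to_cyr`, `EXCEPTIONS`, `KEEP_AS_IS`) are hypothetical, and the exception list is a tiny illustrative sample, not a real dictionary.

```python
import re

DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLE = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "đ": "ђ", "e": "е",
    "ž": "ж", "z": "з", "i": "и", "j": "ј", "k": "к", "l": "л", "m": "м",
    "n": "н", "o": "о", "p": "п", "r": "р", "s": "с", "t": "т", "ć": "ћ",
    "u": "у", "f": "ф", "h": "х", "c": "ц", "č": "ч", "š": "ш",
}

# Illustrative samples only: words where n+j or d+ž are separate letters,
# and tokens that must stay in Latin script.
EXCEPTIONS = {"injekcija": "инјекција", "nadživeti": "надживети"}
KEEP_AS_IS = {"AI"}


def naive_lat_to_cyr(text: str) -> str:
    """Greedy character swap: always treats lj/nj/dž as a single letter."""
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in DIGRAPHS:
            cyr = DIGRAPHS[pair.lower()]
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        cyr = SINGLE.get(ch.lower(), ch)  # unknown characters pass through
        out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)


def lat_to_cyr(text: str) -> str:
    """Word-by-word conversion that consults the exception lists first."""
    def convert(match: re.Match) -> str:
        word = match.group(0)
        if word in KEEP_AS_IS:
            return word
        fixed = EXCEPTIONS.get(word.lower())
        if fixed is not None:
            return fixed.capitalize() if word[0].isupper() else fixed
        return naive_lat_to_cyr(word)
    return re.sub(r"\w+", convert, text)


print(naive_lat_to_cyr("AI injekcija"))  # greedy swap, no exceptions applied
print(lat_to_cyr("AI injekcija"))        # exceptions applied
print(lat_to_cyr("injekcije"))           # inflected form misses the dictionary
```

The last call shows why exception dictionaries are fragile: an inflected form that is not listed still goes through the greedy swap.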

This is exactly the reason why I offered to help: Serbian Latin-to-Cyrillic transliteration is never fully correct, while Cyrillic-to-Latin is 100% correct. The real solution is to switch to Serbian Cyrillic as the main script. To accomplish this, human review is necessary after the machine transliteration. After that, everything could be fully automated.
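The reverse direction needs no dictionary, because every Serbian Cyrillic letter maps to exactly one Latin letter or digraph. A minimal Python sketch, assuming a hypothetical `cyr_to_lat` helper (not part of Brave's tooling); the only judgment call is the casing of digraphs, handled here by looking at the neighbouring letters:

```python
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def cyr_to_lat(text: str) -> str:
    """Convert Serbian Cyrillic to Latin; other characters pass through."""
    out = []
    for i, ch in enumerate(text):
        lat = CYR_TO_LAT.get(ch.lower())
        if lat is None:
            out.append(ch)  # Latin text, digits, punctuation: keep as is
        elif ch.islower():
            out.append(lat)
        else:
            # Uppercase letter: write "LJ" inside an all-caps word, "Lj" otherwise.
            nxt = text[i + 1:i + 2]
            prev = text[i - 1:i] if i > 0 else ""
            in_caps_word = (nxt.isalpha() and nxt.isupper()) or (
                not nxt.isalpha() and prev.isupper()
            )
            out.append(lat.upper() if in_caps_word else lat.capitalize())
    return "".join(out)


print(cyr_to_lat("AI претрага"))
print(cyr_to_lat("инјекција"), cyr_to_lat("његов"))
```

Because н+ј and њ are distinct in Cyrillic, the ambiguity that breaks the Latin-to-Cyrillic direction does not arise here.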

AndyAnds265 commented 3 weeks ago

@Susexe thanks for the clarification. We're going to be switching to Serbian Cyrillic as our base language.

How could you help in this situation, for example?

Susexe commented 2 weeks ago

@AndyAnds265 To start, I could review and correct the transliterated strings you processed with your script (including Brave Search and possibly the Android app, once it's completed). After that, I can assist in improving the overall translation quality, which currently needs some work. For now, however, the priority is to fix the strings and establish a solid base language for Cyrillic-to-Latin conversion.