go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

Incorrect handling of plural forms for translation #23797

Open mikhirev opened 1 year ago

mikhirev commented 1 year ago

Description

Hi!

The current handling of strings with multiple plural forms, which maps them to key-value pairs in the ini file, is incorrect. It allows only two forms, as in English (a singular used only for one, and a plural for everything else). Languages that have a single form for all counts are not a problem either. But many languages have three or more plural forms (you may find a review here for example). Correct translation into such languages is currently impossible.
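To illustrate the difference, here is a minimal sketch of plural category selection for integer counts, written from the CLDR plural rules for English and Russian as I understand them. The function names `pluralEn` and `pluralRu` are hypothetical and are not part of Gitea.

```go
package main

import "fmt"

// pluralEn returns the CLDR plural category for an integer count in English.
// English only distinguishes "one" (exactly 1) from "other".
func pluralEn(n int) string {
	if n == 1 {
		return "one"
	}
	return "other"
}

// pluralRu returns the CLDR plural category for a non-negative integer count
// in Russian. Fractions (the CLDR "other" category) are out of scope here.
func pluralRu(n int) string {
	mod10, mod100 := n%10, n%100
	switch {
	case mod10 == 1 && mod100 != 11:
		return "one"
	case mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14):
		return "few"
	default:
		return "many"
	}
}

func main() {
	// Print the category each language assigns to a few sample counts.
	for _, n := range []int{1, 2, 5, 11, 21, 22} {
		fmt.Printf("%d: en=%s ru=%s\n", n, pluralEn(n), pluralRu(n))
	}
}
```

A scheme with only a singular key and a plural key cannot express the "few" and "many" branches, and it also gets 21 wrong, since in Russian 21 takes the same form as 1.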

Please consider changing the translation framework to handle pluralization properly.

Gitea Version

1.19.0

Can you reproduce the bug on the Gitea demo site?

Yes

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

n/a

Database

None

delvh commented 1 year ago

See also https://github.com/go-gitea/gitea/pull/19916.

wxiaoguang commented 1 year ago

What do you think about the approach in #23933?

(I don't understand lv or ar, so I used English words for the demo.)

Because each language has standard, well-defined plural forms ( https://github.com/unicode-org/cldr/blob/main/common/supplemental/plurals.xml ), we can just put an array-like list of candidate words in the string.

With this approach, translators just need to fill the words for these forms.

I think we need a syntax that is compatible with both Crowdin and ini, because Gitea uses both.

And we need a translator-friendly syntax, otherwise the strings could get broken frequently (I have found a lot of broken translation strings recently ... )


I haven't tried how Crowdin handles pluralization, so I don't know whether it has a better approach or whether there is a better translation system.

mikhirev commented 1 year ago

@wxiaoguang, this approach will not work across languages. E.g. in Russian we usually don't use the verb (is/are) in sentences like that. In other languages some additional words may be needed. The common practice is to let translators deal with the whole string, not separate words; good examples are ngettext and gotext plurals.

Probably the simplest way is to implement the ICU message format support. It is supported by Crowdin and there is a Go module for parsing it.
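As a rough sketch of what this means for translators: with an ICU-style plural message, each plural category carries a complete translated string, and `#` stands for the number. The code below does not parse ICU syntax and does not use any existing module. The map stands in for an already parsed message, and the names `openIssuesRu`, `categoryRu` and `formatPlural` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// openIssuesRu stands in for the parsed form of an ICU message such as:
//   {n, plural, one {# открытая задача} few {# открытые задачи}
//               many {# открытых задач} other {# открытой задачи}}
// Every branch is a whole phrase, so the translator controls word order and endings.
var openIssuesRu = map[string]string{
	"one":   "# открытая задача",
	"few":   "# открытые задачи",
	"many":  "# открытых задач",
	"other": "# открытой задачи",
}

// categoryRu picks the plural category for a non-negative integer count in
// Russian (written from the CLDR rules; fractions are not handled).
func categoryRu(n int) string {
	mod10, mod100 := n%10, n%100
	switch {
	case mod10 == 1 && mod100 != 11:
		return "one"
	case mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14):
		return "few"
	default:
		return "many"
	}
}

// formatPlural selects the branch for n and substitutes the number for "#".
// It falls back to the "other" branch when a category is missing.
func formatPlural(forms map[string]string, n int) string {
	tmpl, ok := forms[categoryRu(n)]
	if !ok {
		tmpl = forms["other"]
	}
	return strings.ReplaceAll(tmpl, "#", strconv.Itoa(n))
}

func main() {
	for _, n := range []int{1, 3, 5, 21} {
		fmt.Println(formatPlural(openIssuesRu, n))
	}
}
```

A real implementation would take the category rules from CLDR data and the branches from the translation file, instead of hard-coding either.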

wxiaoguang commented 1 year ago

Thank you, then I think ICU message format is the answer.

techknowlogick commented 1 year ago

@mikhirev thank you for those details and your research into libraries. ICU format looks good. We could probably force ICU format into ini, and normally I'd be against changing the format of config files, but maybe this is an opportunity to look into getting away from ini (even if only for translations).