Automattic / jetpack

Security, performance, marketing, and design tools — Jetpack is made by WordPress experts to make WP sites safer and faster, and help you grow your traffic.
https://jetpack.com/
Other
1.59k stars 799 forks source link

Jetpack-boost: Don't use _n() to distinguish between singular and plural #27672

Open tobifjellner opened 1 year ago

tobifjellner commented 1 year ago

Impacted plugin

Boost

Steps to Reproduce

When I was checking some translations for -boost, I noted that the plugin is using _n() incorrectly.

ref: https://plugins.trac.wordpress.org/browser/jetpack-boost/tags/1.5.4/app/assets/dist/jetpack-boost.js#L9 "Please follow the troubleshooting steps below for the page."//"Please follow the troubleshooting steps below for each of the pages." or "This page timed out:"//"These pages timed out:".

The function _n() should ONLY be used to correctly handle word forms etc for strings that include a number, and NOT to generically distinguish between singular and plural.

You can correct this in two different ways. Either just make a logical check in your code, like (pseudo-code): if 1==n then echo "This page timed out:" else echo "These pages timed out:"

OR include the value of n in the string (if it adds any value): "The following &d page timed out"//"The following &d pages timed out"

If you want more background, then I've got this article: https://gsm.fjellner.com/hints-about-i18n-let-your-code-to-shine-all-over-the-world/

A clear and concise description of what you expected to happen.

No response

What actually happened

No response

Browser

Google Chrome/Chromium, Mozilla Firefox, Microsoft Edge, Apple Safari, iOS Safari, Android Chrome

Other information

No response

Platform (Simple, Atomic, or both?)

No response

Reproducibility

Intermittent

Severity

No response

Available workarounds?

No response

Workaround details

No response

anomiex commented 1 year ago

ref: https://plugins.trac.wordpress.org/browser/jetpack-boost/tags/1.5.4/app/assets/dist/jetpack-boost.js#L9

That link is not particularly helpful; unfortunately the JS i18n has to work based on minified code, so the correct link is hard to find. These seem more useful:

Either just make a logical check in your code, like (pseudo-code): if 1==n then echo "This page timed out:" else echo "These pages timed out:"

The article you linked says not to do this.

OR include the value of n in the string (if it adds any value): "The following &d page timed out"//"The following &d pages timed out"

Why should this be necessary? Especially in the case you're complaining about here, where the rendering appears to be something along the lines of

These pages timed out: <list of 10 pages> Please follow the troubleshooting steps below for each of the pages.

Why should that have to be like this?

These 10 pages timed out: <list of 10 pages> Please follow the troubleshooting steps below for each of the 10 pages.

I could see an argument for always making it possible to include the number in the string by always wrapping in a printf, but I see no good argument for the blanket assertion that the number always needs to be used in the English string.

tobifjellner commented 1 year ago

That link is not particularly helpful

Well. That's the link I'm presented with as reference in the translation interface. It's not a good reference, but together with a couple of sample strings, you'll at least know roughly where to look for what. (But where I have totally no clue how you build your distribution package...)

So here is the reasoning: The _n() function is specifically built to allow correct translation of things like "John has 3 apples"; "There's 1 minute left", etc despite that various values of n in many languages will call for (steer) various forms of nouns, pronouns, verbs, adjectives, etc. In many languages it's way more complicated than in English.

Arabic is one of the most complicated cases: depending on the specific value of n, this function will pick the right alternative to build a sentence that includes a number among five or six variants in the translation!

In Russian, every instance of _n() needs to be translated into three versions. And if happens to have the value n=21, then the translation will be wrong, since _n() in this case has to pick the "singular".

Then there are a few languages where they may be able to somehow express a generic plural, but where sentences with numbers don't change, and every occurrence of _n() has only ONE translation.

So: Using _n() as a programming shortcut to select the singular or plural form simply doesn't work in many languages.

anomiex commented 1 year ago

You appear to have answered the question "Why shouldn't I do n == 1 ? __( 'singular' ) : __( 'plural' )?", or possibly "Why shouldn't I do _n( 'singular', 'plural', isPlural ? 2 : 1 )?", rather than the question I actually asked.

tobifjellner commented 1 year ago

Currently in your code, you want to say: "These pages timed out:" and then have a list. But if only one page timed out, then you change to "This page timed out:". This is a typical case where _n() is doing_it_wrong, since _n() should only be used with strings that contain the steering number value in the string itself.

What you suggested in this conversation, n == 1 ? __( 'singular' ) : __( 'plural' ) is perfectly fine. What I'm saying is that you can't use _n() as some kind of shorthand, – it's for a different use case.

anomiex commented 1 year ago

Says who? Why couldn't Russian translate that with the dual form, and so on for other languages, since the necessary number is being provided?

tobifjellner commented 1 year ago

Because the function _n() doesn't work that way. Let's look at the phrase pair: The page timed out during load. There could be various reasons but most likely a plugin is causing this issue. Please complete the following steps.//The pages timed out during load. There could be various reasons but most likely a plugin is causing this issue. Please complete the following steps for each of the pages.

Swedish handles plurals relatively similar to English, so no big surprises there, see https://translate.wordpress.org/projects/wp-plugins/jetpack-boost/stable/sv/default/?filters%5Boriginal_id%5D=12699742 Here GlotPress exects two strings, "singular" and "plural".

But the same string in Arabic needs to be translated SIX times, see https://translate.wordpress.org/projects/wp-plugins/jetpack-boost/stable/ar/default/?filters[original_id]=12699742 You need to make one translation for n=0, another translation for n=1, a third variant for n=2... and three more cases. How many translations are needed for each time _n() is used is a global parameter per locale.

So you should not use _n() for strings where the steering number is not a part of the translated string.

anomiex commented 1 year ago

As I said originally, that policy can wind up with some pretty redundant text.

These 10 pages timed out: <list of 10 pages> Please follow the troubleshooting steps below for each of the 10 pages.

Why should we have to repeat the number "10" twice? Or even explicitly mention it at all when someone could count the rows if they care exactly how many there are?

But if we follow your suggestion then a Russian might find it a little jarring to see "These pages timed out" using the plural form instead of the dual form when there are clearly only two pages listed, or vice versa, since in n == 1 ? __( 'singular' ) : __( 'plural' ) you have to pick one or the other for the __( 'plural' ) bit.

tobifjellner commented 1 year ago

I'm not saying that you should repeat that number all the time. I'm just saying that you should NOT use _n() for these strings when you don't include the number itself in the string.

Again: If you don't need to say how many items there are of something, then use plain translation calls, like __() and _e() (Just don't try to cut corners by calling _n() in a way that doesn't work for many target languages.)

anomiex commented 1 year ago

There seems little point in continuing this back-and-forth. I'll leave it to the Boost team to decide which of us they want to listen to.

It seems to me that you're unable to or are refusing to see that multiple messages are being combined such that a specific number is presented, just not in every individual message, and I give up on trying to convince you.

IMO the specific request here is still not justified, and at most the code should be changed along the lines of

sprintf(
    /* translators: %1$d has the count of pages. This is not used in the English message as the count is already displayed elsewhere on the page. */
    _n(
        'The page timed out during load. There could be various reasons but most likely a plugin is causing this issue. Please complete the following steps.',
        'The pages timed out during load. There could be various reasons but most likely a plugin is causing this issue. Please complete the following steps for each of the pages.',
        urlCount( set ),
        'jetpack-boost'
    ),
    urlCount( set ),
) 

i.e. make the number available in case some language needs it but without redundantly displaying it in the English.

tobifjellner commented 1 year ago

Just one last time: If you don't need/want the number to be in the string, then _n() is simply not the correct construction to use for this. Using printf() without consuming a variable will create additional issues in various places.

anomiex commented 1 year ago

Using printf() without consuming a variable will create additional issues in various places.

Other than the @wordpress/valid-sprintf eslint rule complaining about it, what exactly would those "additional issues" be?

tobifjellner commented 1 year ago

Warnings in the translation UI, for instance

akirk commented 1 year ago

@tobifjellner's point here is that the English string doesn't contain a numeric placeholder. In that case _n() should not be used when it's just for differentiating between singular and plural.

The concept of singular and plural is a human one, it is independent of the language. In some languages the numeral and the object need to be grammatically aligned differently than in English. (There is also a concept of grammatical gender for which unfortunately gettext doesn't have a representation and this also regularly creates different problems.)

Some languages like written Chinese don't have the syntax to differentiate between singular and plural, so when translating they only need to provide one translation. See https://translate.wordpress.org/projects/wp-plugins/jetpack-boost/stable/zh-tw/default/?filters[status]=either&filters[original_id]=12326913. For them, this discussion is irrelevant.

Other languages such as Russian, Ukranian, or Arabic have multiple singular+plurals depending on the specific number. For example, Ukranian has three: https://translate.wordpress.org/projects/wp-plugins/jetpack-boost/stable/uk/default/?filters[status]=either&filters[original_id]=12326913

ukranian-plurals

In this GIF you can see that the Ukrainian translator has specified the same translation for plural 2 and 3. But also note how the singular version applies not only to value 1 but also to values 21 and 31.

Let's look at a couple of scenarios: Number English Ukranian (written in English )
0 These pages have timed out: These pages have timed out:
1 This page has timed out: This page has timed out:
2 These pages have timed out: These pages have timed out:
5 These pages have timed out: These pages have timed out:
21 These pages have timed out: This page has timed out:

So in this scenario, if 21 pages timed out, it would show the translation for 1 page timed out. When it'd say (deliberately using an English singular here) "21 page timed out," it'd be clear. But not if it says "page timed out."

This particular string is probably a bad example because the severity of showing a singular instead of a plural is not as bad but there might be other scenarios where it matters more.

So, for this reason, if there is not a number to display inside the message, then it should be done like @tobifjellner said: check for 1 == n, and use __( 'singular message' ) resp. __( 'plural message' ).

For the other issue, it is acceptable to split translatable strings into sentences and concat them with an interspersed space character ' ', so you can avoid to have the same partial translations for different plurals.

We'll look at updating https://developer.wordpress.org/plugins/internationalization/how-to-internationalize-your-plugin/ and I wonder if we could also get this baked into the WordPress coding standards.

anomiex commented 1 year ago

Hi @akirk. Can you address the actual situation I've described here, instead of writing a generic reply that merely reiterates why _n() exists in the first place (I already understand that) and then immediately recommending doing something that goes against that reason because you're considering only a single message in isolation?

anomiex commented 1 year ago

@akirk and I had a discussion via private messages. It seems that (using non-grammatical English to represent grammatical Ukrainian) someone would indeed write sentences like these:

OTOH, we weren't clear on whether any other language might indeed say "This 21 page. Fix this page."

I had been assuming that the gettext infrastructure would be able to handle this sort of thing, but at least for Ukrainian it can't handle choosing between "Fix this page" for the 1 case and "Fix these pages" for the 21 case. I'm not entirely convinced that doing a n == 1 check would be correct for every other language, but it may be the best we can do with the existing infrastructure.

@akirk said he'd look into all this in different languages in more detail before making a final recommendation.

tobifjellner commented 1 year ago

If you in "Ukrainian" would like to say "There is a problem with 21 page. Fix these pages." then there is only one way to do it. You would need to split this into two parts, where the first sentence is handled by _n() and in the second part a test on 1 == n to split between to simple strings "Fix this page."//"Fix these pages."

thingalon commented 1 year ago

As someone who is not a linguist, and only speaks one language - do we have a decision on this? Is __n() ok for singular/plural?

anomiex commented 1 year ago

From the above discussion, at least for Ukranian _n() doesn't do the right thing when you don't include the number in the text. Specifically, when the number is included then cases for 1 and 21 are the same, while they're different when he number is not included and gettext does not allow for separating them in that situation. So _n() should not be used for messages that don't include the number.

It's not clear that the recommended n === 1 ? __( 'generic singular' ) : __( 'generic plural' ) is going to do the right thing in every language either, but we don't have a ready counterexample. It may be the best we can do within the limitations of gettext.