GoogleChrome / lighthouse

Automated auditing, performance metrics, and best practices for the web.
https://developer.chrome.com/docs/lighthouse/overview/
Apache License 2.0
28.25k stars 9.35k forks source link

i18n: use locale for all numeric formatting in the report #10786

Open exterkamp opened 4 years ago

exterkamp commented 4 years ago

The titles for the new diagnostics (although I predict any ICU plural we use with an explicit "1") are not rendered properly with non-standard numeric representations. image

This is because the ICU decleration for the "1" case is explicitly an en "1".

  displayValue: `{nodeCount, plural,
    =0 {No elements found}
    =1 {1 element found} // this uses the literal "1"
    other {# elements found} // this uses the number, like the character "١"
    }`,
patrickhulce commented 4 years ago

I'm not sure I understand this one @exterkamp, could you explain a tad more? Isn't it the role of the translators to author the appropriate translated plurals and we're only writing the english ones?

Based on my understanding, 1 element found would never use a number other than 1...

EDIT: I'm also not clear what were we supposed to write instead :)

exterkamp commented 4 years ago

ar uses the eastern arabic numeral set: Eastern Arabic ٠ | ١ | ٢ | ٣ | ٤ | ٥ | ٦ | ٧ | ٨ | ٩ | ١٠ Notice this is "4 elements found", but uses a "٤" instead of "4": image Now look at how it handles "1 element found": image

It should say "١ element found"; however the ICU plural for =1 is the literal "1 element found". And I don't think the translators will use eastern arabic numerals consistently with what we do (we always use them). Additionally I doubt they will use thai or traditional Chinese numerics as well if we ever choose to enforce those characters: https://github.com/GoogleChrome/lighthouse/tree/i18n-numbers image image image

I think we will need to use the replacement for =1 as well. Like this:

  displayValue: `{nodeCount, plural,
    =0 {No elements found}
    =1 {# element found}
    other {# elements found}
    }`,

That said, I don't think this is high priority, since our numeric i18n story is all over the place... image

patrickhulce commented 4 years ago

Notice this is "4 elements found", but uses a "٤" instead of "4": Now look at how it handles "1 element found":

I got that part, what I was asking is why does it matter? For ar we won't be using the message 1 element found we'll be using something else written by the translator that's written in that locale and that message should use the correct numeral or # in place of it, right?

I don't think the translators will use eastern arabic numerals consistently

I think this answers my real question, but let me double check. I am understanding this correctly that the concern is translators will use whatever numeral we write in the english string and not use whatever numeral is appropriate for the locale they are translating to and/or appropriating substitute # for it?

exterkamp commented 4 years ago

translators will use whatever numeral we write in the english string and not use whatever numeral is appropriate for the locale they are translating to and/or appropriating substitute # for it

Correct. In the same way that our markdown was getting ruined by some translators replacing " ` " with " ' ". Translating is not as consistent as we want it to be. I think if we try to become more consistent with numerals of other languages, then we should probably look into this as an issue.

Is it an issue now? meh Is it something that we should handle in the future? probably

connorjclark commented 4 years ago

can we add a test asserting there is no:

maybe not an assert, since we couldn't easily fix it (unless you can just edit it in TC?) maybe just autofix and warn

exterkamp commented 4 years ago

can we add a test asserting there is no:

  • `
  • [
  • ]
  • (
  • )

maybe not an assert, since we couldn't easily fix it (unless you can just edit it in TC?) maybe just autofix and warn

This seems unrelated. Is there a problem with markdown now? I was talking prior to the ICU escaping & CTC files.

connorjclark commented 2 years ago

this is what we have today

image

in general we need to fix numeral i18n

connorjclark commented 2 years ago

ok well the numerics are translated properly for other languages in the metrics table.

image

arabic isn't getting localized numbers because intl-messageformat doesn't support it i guess.

We are on an old version of that package, perhaps upgrading is in order.

anyway if we want arabic numerals we should use a different locale code. ar-u-nu-arab see this

not here tho

image
connorjclark commented 2 years ago

can get arabic numerals from CLI with no code changes:

yarn start https://www.example.com/ -A --view --locale=ar-u-nu-arab
image

(but not the gauge..)

brendankenny commented 2 years ago

arabic isn't getting localized numbers because intl-messageformat doesn't support it i guess.

FWIW, this comes from the browser's formatting which intl-messageformat uses. e.g. (50).toLocaleString('ar') gives '50' while (50).toLocaleString('ar-EG') gives '٥٠'.

Reading about this today, I believe this may be intentional. Basically there are multiple types of digits depending on the region (ar-eg, ar-tn, etc), and the decision was made that ar (aka ar-001 or world-wide ar) would use ascii/latin digits as it would be most widely understood, even if it was likely not the user's usual choice of digit.

This CLDR thread gives some background, though I haven't found the follow up yet that moved it from testing to default.

connorjclark commented 2 years ago

I agree, it seems the expectation is that "ar" alone is latin numerals. LH is doing things correctly here already.

connorjclark commented 2 years ago

Some of this code was recently modified via #13830

We should make sure we are using the number formatter for all numbers in the report. Some places we're (probably) missing today: the gauge, the score legend. stuff in the meta block / tooltips