Open exterkamp opened 4 years ago
I'm not sure I understand this one @exterkamp, could you explain a tad more? Isn't it the role of the translators to author the appropriate translated plurals and we're only writing the english ones?
Based on my understanding, 1 element found
would never use a number other than 1...
EDIT: I'm also not clear what were we supposed to write instead :)
ar
uses the eastern arabic numeral set: Eastern Arabic ٠ | ١ | ٢ | ٣ | ٤ | ٥ | ٦ | ٧ | ٨ | ٩ | ١٠
Notice this is "4 elements found", but uses a "٤" instead of "4":
Now look at how it handles "1 element found":
It should say "١ element found"; however the ICU plural for =1
is the literal "1 element found". And I don't think the translators will use eastern arabic numerals consistently with what we do (we always use them). Additionally I doubt they will use thai or traditional Chinese numerics as well if we ever choose to enforce those characters: https://github.com/GoogleChrome/lighthouse/tree/i18n-numbers
I think we will need to use the replacement for =1
as well. Like this:
displayValue: `{nodeCount, plural,
=0 {No elements found}
=1 {# element found}
other {# elements found}
}`,
That said, I don't think this is high priority, since our numeric i18n story is all over the place...
Notice this is "4 elements found", but uses a "٤" instead of "4": Now look at how it handles "1 element found":
I got that part, what I was asking is why does it matter? For ar
we won't be using the message 1 element found
we'll be using something else written by the translator that's written in that locale and that message should use the correct numeral or #
in place of it, right?
I don't think the translators will use eastern arabic numerals consistently
I think this answers my real question, but let me double check. I am understanding this correctly that the concern is translators will use whatever numeral we write in the english string and not use whatever numeral is appropriate for the locale they are translating to and/or appropriating substitute #
for it?
translators will use whatever numeral we write in the english string and not use whatever numeral is appropriate for the locale they are translating to and/or appropriating substitute # for it
Correct. In the same way that our markdown was getting ruined by some translators replacing " ` " with " ' ". Translating is not as consistent as we want it to be. I think if we try to become more consistent with numerals of other languages, then we should probably look into this as an issue.
Is it an issue now? meh Is it something that we should handle in the future? probably
can we add a test asserting there is no:
maybe not an assert, since we couldn't easily fix it (unless you can just edit it in TC?) maybe just autofix and warn
can we add a test asserting there is no:
- `
- [
- ]
- (
- )
maybe not an assert, since we couldn't easily fix it (unless you can just edit it in TC?) maybe just autofix and warn
This seems unrelated. Is there a problem with markdown now? I was talking prior to the ICU escaping & CTC files.
this is what we have today
in general we need to fix numeral i18n
ok well the numerics are translated properly for other languages in the metrics table.
arabic isn't getting localized numbers because intl-messageformat
doesn't support it i guess.
We are on an old version of that package, perhaps upgrading is in order.
anyway if we want arabic numerals we should use a different locale code. ar-u-nu-arab
see this
not here tho
can get arabic numerals from CLI with no code changes:
yarn start https://www.example.com/ -A --view --locale=ar-u-nu-arab
(but not the gauge..)
arabic isn't getting localized numbers because intl-messageformat doesn't support it i guess.
FWIW, this comes from the browser's formatting which intl-messageformat
uses. e.g. (50).toLocaleString('ar')
gives '50'
while (50).toLocaleString('ar-EG')
gives '٥٠'.
Reading about this today, I believe this may be intentional. Basically there are multiple types of digits depending on the region (ar-eg, ar-tn, etc), and the decision was made that ar
(aka ar-001
or world-wide ar
) would use ascii/latin digits as it would be most widely understood, even if it was likely not the user's usual choice of digit.
This CLDR thread gives some background, though I haven't found the follow up yet that moved it from testing to default.
I agree, it seems the expectation is that "ar" alone is latin numerals. LH is doing things correctly here already.
Some of this code was recently modified via #13830
We should make sure we are using the number formatter for all numbers in the report. Some places we're (probably) missing today: the gauge, the score legend. stuff in the meta block / tooltips
The titles for the new diagnostics (although I predict any ICU plural we use with an explicit "1") are not rendered properly with non-standard numeric representations.
This is because the ICU decleration for the "1" case is explicitly an
en
"1".