Closed nicolasblanco closed 3 years ago
@nicolasblanco, not wrong - perhaps just surprising. The number formats defined in CLDR deliberately format this way for what is described in CLDR as decimal_long (and which I call just :long format).
Happy to help find a way to format the numbers in the way you're after if you can describe it for me.
In case you're interested, the underlying data that drives these formats is:
iex> MyApp.Cldr.Number.Format.formats_for!("en") |> Map.get(:decimal_long)
[
[1000, %{one: ["0 thousand", 1], other: ["0 thousand", 1]}],
[10000, %{one: ["00 thousand", 2], other: ["00 thousand", 2]}],
[100000, %{one: ["000 thousand", 3], other: ["000 thousand", 3]}],
[1000000, %{one: ["0 million", 1], other: ["0 million", 1]}],
[10000000, %{one: ["00 million", 2], other: ["00 million", 2]}],
[100000000, %{one: ["000 million", 3], other: ["000 million", 3]}],
[1000000000, %{one: ["0 billion", 1], other: ["0 billion", 1]}],
[10000000000, %{one: ["00 billion", 2], other: ["00 billion", 2]}],
[100000000000, %{one: ["000 billion", 3], other: ["000 billion", 3]}],
[1000000000000, %{one: ["0 trillion", 1], other: ["0 trillion", 1]}],
[10000000000000, %{one: ["00 trillion", 2], other: ["00 trillion", 2]}],
[100000000000000, %{one: ["000 trillion", 3], other: ["000 trillion", 3]}]
]
iex> MyApp.Cldr.Number.Format.formats_for!("fr") |> Map.get(:decimal_long)
[
[
1000,
%{:one => ["0 millier", 1], :other => ["0 mille", 1], "1" => ["mille", 0]}
],
[10000, %{one: ["00 mille", 2], other: ["00 mille", 2]}],
[100000, %{one: ["000 mille", 3], other: ["000 mille", 3]}],
[1000000, %{one: ["0 million", 1], other: ["0 millions", 1]}],
[10000000, %{one: ["00 million", 2], other: ["00 millions", 2]}],
[100000000, %{one: ["000 million", 3], other: ["000 millions", 3]}],
[1000000000, %{one: ["0 milliard", 1], other: ["0 milliards", 1]}],
[10000000000, %{one: ["00 milliard", 2], other: ["00 milliards", 2]}],
[100000000000, %{one: ["000 milliard", 3], other: ["000 milliards", 3]}],
[1000000000000, %{one: ["0 billion", 1], other: ["0 billions", 1]}],
[10000000000000, %{one: ["00 billion", 2], other: ["00 billions", 2]}],
[100000000000000, %{one: ["000 billion", 3], other: ["000 billions", 3]}]
]
Using the example [10000, %{one: ["00 mille", 2], other: ["00 mille", 2]}], it means that if the number is at least 10_000 (and below the next threshold, 100_000) then it is formatted as a number with 2 significant digits, with the suffix depending on the plural rule for the number and locale.
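To make the lookup concrete, here is a minimal sketch of how a rule could be selected from data shaped like the above. It is not the ex_cldr implementation: the module LongFormatSketch and its functions are hypothetical, and it assumes a rule applies from its threshold up to (but not including) the next threshold.

```elixir
defmodule LongFormatSketch do
  # A hand-copied subset of the "en" :decimal_long data shown above.
  @rules [
    [1_000, %{one: ["0 thousand", 1], other: ["0 thousand", 1]}],
    [10_000, %{one: ["00 thousand", 2], other: ["00 thousand", 2]}],
    [100_000, %{one: ["000 thousand", 3], other: ["000 thousand", 3]}],
    [1_000_000, %{one: ["0 million", 1], other: ["0 million", 1]}]
  ]

  # Returns the rule with the largest threshold that does not exceed `number`,
  # or nil when the number is below the first threshold.
  # (Assumption: thresholds are lower bounds, listed in ascending order.)
  def rule_for(number) when is_integer(number) do
    @rules
    |> Enum.filter(fn [threshold, _patterns] -> number >= threshold end)
    |> List.last()
  end

  # The value the pattern would be applied to: the number scaled down so that
  # it keeps `digits` integer digits.
  def scaled(number) do
    case rule_for(number) do
      nil ->
        number

      [threshold, %{other: [_pattern, digits]}] ->
        number / div(threshold, Integer.pow(10, digits - 1))
    end
  end
end
```

With this sketch, LongFormatSketch.scaled(20_000) picks the "00 thousand" rule and returns 20.0, which is the value that would then be rounded and pluralised.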
Let me know what you're trying to format?
Thanks very much @kipcole9 for your fast answer!
I'm not English-native... Let's just look at this rule in "fr" locale:
[
1000,
%{:one => ["0 millier", 1], :other => ["0 mille", 1], "1" => ["mille", 0]}
]
Then I'm doing:
Moning.Cldr.Number.to_string 2000, format: :long, locale: "fr"
# => {:ok, "2 millier"}
What I find strange is that 2 millier makes no sense at all for a French writer, it's plural, so it should be 2 milliers, but also most people would write 2 milles. millier is reserved for unit 1:
1 millier, 2 milles, 42 milles, etc.
That's very interesting. I'm not a French speaker but I do recognise it would be 2 milles and yet the data would not appear to support that.
I do most definitely have a bug for the case of 1000 which by the rule above should produce mille (not 1 millier nor 1 mille). I will fix that this weekend.
It looks like the source data is incorrect. The original xml has:
<pattern type="1000" count="1">mille</pattern>
<pattern type="1000" count="one">0 millier</pattern>
<pattern type="1000" count="other">0 mille</pattern>
Before I file a bug on the CLDR project, may I confirm your expectations? Because even with my bug fixed and the CLDR data fixed I think the results would be:
mille, 2 milles, 42 milles
Since the rule for 1 would override the rule for :one (the plural category).
Hello again @kipcole9 and thank you very much for the support!
I'm trying to understand the rules in the source data XML and the lines you highlighted.
Basically, the rule should be:
millier is used with the number 1 in front and strictly only for 1000 (1 millier).
mille is used for anything between 1001 (Mille 1) and 1999 (Mille 999), with no number before the word mille.
milles is used as the plural form: 3 milles, 4 milles, etc.
Now about the en locale...
Moning.Cldr.Number.to_string 994400088000, format: :long, locale: "en"
{:ok, "994 billion"}
Shouldn't it be the plural form : 994 billions ?
Moning.Cldr.Number.to_string 9900000000, format: :long, locale: "en"
{:ok, "0 billion"}
0 billion for 9900000000 also seems a bit strange to display to the end user.
Thanks for the work and this library again.
Thanks for your patience while I have been digging into this.
For locale "en" it's normal not to pluralise the label, so "994 billion" is normal usage (at least as I understand it as a native English speaker).
The issue with 1_000 in locale "fr" is a combination of two errors. The first is a bug in the pluralisation code which has been fixed in ex_cldr and will require a new version of that library in the next few days. The second issue is the data issue where the pluralisation isn't correct (i.e. mille instead of milles).
There is also an issue, as you pointed out, with Cldr.Number.to_string 9900000000, format: :long, locale: "en". This resolves ultimately to Cldr.Number.to_string 9.9, format: "0 billion" which is returning {:ok, "0 billion"} when it should return {:ok, "10 billion"} (in the locale "en").
I will get the bugs in my code squashed this weekend (one down, one to go) and I will file an issue with CLDR regarding the data issues.
Commit b18f9da fixes formatting when the format string is only digits. This corrects the error from point 3 above. The examples now format correctly:
iex> Cldr.Number.to_string 9_900_000_000, format: :long, locale: "fr"
{:ok, "10 milliards"}
iex> Cldr.Number.to_string 9_900_000_000, format: :long, locale: "en"
{:ok, "10 billion"}
Thank you @kipcole9 for the changes!
I'm trying on master and it has improved the situation 🙌🏻.
I still see some weird behaviour: when the library rounds the number, it sometimes doesn't display the same result where it normally should 🤔 ...
▶ Moning.Cldr.Number.to_string 499_999_000, format: :long, locale: "fr"
{:ok, "500 millions"} # => this is the correct form
▶ Moning.Cldr.Number.to_string 500_000_000, format: :long, locale: "fr"
{:ok, "500 million"} # => this is not correct, should be the same as before
▶ Moning.Cldr.Number.to_string 9_900_000_000, format: :long, locale: "fr"
{:ok, "10 milliards"} # => this is the correct form
▶ Moning.Cldr.Number.to_string 10_000_000_000, format: :long, locale: "fr"
{:ok, "10 milliard"} # => this is not correct, should be the same as before
Thanks for the additional test cases. I am seeing the same issue and it's related to the order in which number modulo, pluralisation and rounding are done - meaning I need more coffee before I tackle it. Not too far away now (except for the data error in CLDR).
The issue is also exhibited for 1_000 and 1_001, for example.
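As a hypothetical illustration of why the order of rounding and pluralisation matters (this is not the library's code, and it may not be the exact cause of the cases reported above), the sketch below pluralises either the unrounded or the rounded value. It assumes a simplified French cardinal rule: :one when the integer part is 0 or 1, otherwise :other. The module and function names are made up for the example.

```elixir
defmodule PluralOrderSketch do
  # Suffixes taken from the "fr" rule for 1000 shown earlier in the thread.
  @suffixes %{one: "millier", other: "mille"}

  # Simplified French cardinal rule (an assumption for this sketch):
  # :one when the integer part is 0 or 1, otherwise :other.
  def plural_fr(n) when trunc(n) in [0, 1], do: :one
  def plural_fr(_n), do: :other

  # Picks the plural category from the unrounded value, then rounds for display.
  def pluralise_then_round(number) do
    scaled = number / 1_000
    "#{round(scaled)} #{@suffixes[plural_fr(scaled)]}"
  end

  # Rounds first, then picks the plural category of the value actually shown.
  def round_then_pluralise(number) do
    rounded = round(number / 1_000)
    "#{rounded} #{@suffixes[plural_fr(rounded)]}"
  end
end
```

For 1_500 the first function returns "2 millier" (category chosen from 1.5) while the second returns "2 mille" (category chosen from the displayed 2), so the two orders disagree exactly where rounding crosses a plural boundary.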
Apologies for the delay. Commit 798494 fixes most of the outstanding issues I think. There remains the issue of mille and millier that I will work on next. This is related to "exact match of 1_000" which I am not currently handling correctly.
As of commit 190a0b on ex_cldr I believe that the long formats are being correctly executed according to the rules. The outstanding rule was for numbers such as 1_000 and 1_001.
The rule that drives these formats in locale fr is:
[1000, %{1 => ["mille", 0], :one => ["0 millier", 1], :other => ["0 mille", 1]}]
Which says that for exactly 1 thousand (the number 1_000) the result should be mille. For the number 1_001 it should be 1 millier and for 2_000 it should be 2 mille (this last appears to be in error at the CLDR data level as you have pointed out - it should be 2 milles).
Per the below I think the rule is now being correctly executed:
iex> Cldr.Number.to_string 1_000, format: :long, locale: "fr"
{:ok, "mille"}
iex> Cldr.Number.to_string 1_001, format: :long, locale: "fr"
{:ok, "1 millier"}
iex> Cldr.Number.to_string 2_000, format: :long, locale: "fr"
{:ok, "2 mille"}
iex> Cldr.Number.to_string 2_001, format: :long, locale: "fr"
{:ok, "2 mille"}
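The precedence described by the rule above (an exact-match key such as 1 wins over the plural category) can be sketched as follows. This is a hedged illustration only, not the ex_cldr_numbers implementation: ExactMatchSketch is a hypothetical module, it only handles 1_000..9_999, and it assumes a simplified French plural rule where 0 and 1 are :one and everything else is :other.

```elixir
defmodule ExactMatchSketch do
  # The "fr" rule for the 1000 threshold, copied from the thread.
  @rule %{1 => ["mille", 0], :one => ["0 millier", 1], :other => ["0 mille", 1]}

  # Simplified French plural rule for integers (an assumption for this sketch).
  def plural_fr(n) when n in [0, 1], do: :one
  def plural_fr(_n), do: :other

  def format(number) when is_integer(number) and number >= 1_000 and number < 10_000 do
    # Only an exact multiple of 1_000 can hit an exact-match key such as 1.
    exact = if rem(number, 1_000) == 0, do: div(number, 1_000)
    rounded = round(number / 1_000)

    # An exact-match entry takes precedence; otherwise fall back to the
    # plural category of the rounded value.
    [pattern, _digits] =
      Map.get(@rule, exact) || Map.fetch!(@rule, plural_fr(rounded))

    # Substitute the rounded value for the "0" placeholder, if there is one.
    String.replace(pattern, "0", Integer.to_string(rounded))
  end
end
```

Under these assumptions ExactMatchSketch.format(1_000) returns "mille", format(1_001) returns "1 millier" and format(2_000) returns "2 mille".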
Comments on (a) rule compliance and (b) language correctness are definitely welcome. If this is now rule compliant I will release new versions of ex_cldr and ex_cldr_numbers.
I googled "2 mille ou 2 milles" and the following links seemed relevant but as a non-French-speaker I'm not comfortable interpreting them. Do they add anything to this conversation?
Hello @kipcole9!
That's great, I've tried on master, and all the previous cases look good now:
iex(11) ▶ Moning.Cldr.Number.to_string 499_999_000, format: :long, locale: "fr"
{:ok, "500 millions"}
iex(12) ▶ Moning.Cldr.Number.to_string 500_000_000, format: :long, locale: "fr"
{:ok, "500 millions"}
iex(13) ▶ Moning.Cldr.Number.to_string 9_900_000_000, format: :long, locale: "fr"
{:ok, "10 milliards"}
iex(14) ▶ Moning.Cldr.Number.to_string 10_000_000_000, format: :long, locale: "fr"
{:ok, "10 milliards"}
iex(15) ▶ Moning.Cldr.Number.to_string 3000, format: :long, locale: "fr"
{:ok, "3 mille"}
iex(16) ▶ Moning.Cldr.Number.to_string 3999, format: :long, locale: "fr"
{:ok, "4 mille"}
🙌🏻
About the cases for mille (thousand), I have checked multiple sources and dictionaries and can confirm: the word is strictly invariable.
So it was a small mistake on my side for this one 😅: 3 mille and 4 mille are perfectly fine and correct.
There's a final comment from me... about the rule with millier. I personally think this word should be strictly used for 1000 and not for anything else. It makes no sense to use it just for 1001; 1001 is not a special case at all, I'm sure. But 1000 can be written millier or mille, I'm sure of that.
So probably the rule is to use millier for strictly 1000 and to use mille when rounding.
@nicolasblanco thanks for your patience and collaboration. I'm at a point now where your understanding as a native French speaker is different to the CLDR data definitions. The data provided seems pretty clear that for exactly 1_000 the localisation should be mille (no number). For the numbers 1_001 to 1_499 the rules say 1 millier and after that it's 2 mille etc. This means I still have a bug I need to resolve for 1_500..1_999 which I think by the rules should be 2 mille.
Clearly I'm not a linguist and the implementation is intended to follow the CLDR specification. At this stage, as best I can tell, the implementation does now follow the spec (one bug to squash first) so I will soon release a new version of ex_cldr and a new version of ex_cldr_numbers.
In English, the nearest understanding I can come to of the definition of millier is "about a thousand". But the examples quoted say un millier not 1 millier and, as you note, there is no reference to 2 millier. I will send a message to the CLDR mailing list and see what comes back.
As of commit a6a7c5 the pluralization appears correct (according to CLDR) for numbers in the range 1_001..1_499 and the range 1_500..1_999. I need to do some additional testing of some other work in ex_cldr before release but expect to do so before the end of the weekend.
Current pluralisation of the numbers around 1_000:
iex> Cldr.Number.to_string 1_000, format: :long, locale: "fr"
{:ok, "mille"}
iex> Cldr.Number.to_string 1_001, format: :long, locale: "fr"
{:ok, "1 millier"}
iex> Cldr.Number.to_string 1_499, format: :long, locale: "fr"
{:ok, "1 millier"}
iex> Cldr.Number.to_string 1_500, format: :long, locale: "fr"
{:ok, "2 mille"}
iex> Cldr.Number.to_string 3_000, format: :long, locale: "fr"
{:ok, "3 mille"}
I have published ex_cldr_numbers version 2.18.0 with the following changelog entry:
Fixes short and long number formatting.
Fixes formatting when the format string consists only of digits. Previously this would erroneously set both the maximum and minimum integer digits. Now it only sets the minimum integer digits.
Adds :maximum_integer_digits as an option to Cldr.Number.to_string/2
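To illustrate the second changelog entry, here is a small sketch of how minimum and maximum integer digit counts affect output. It is an assumption-laden illustration, not the library's code: IntegerDigitsSketch.integer_digits/3 is a hypothetical helper, and it assumes that a maximum integer digit count keeps only the least-significant digits.

```elixir
defmodule IntegerDigitsSketch do
  # Left-pads with zeros up to `min_digits`. When `max_digits` is given, keeps
  # only the `max_digits` least-significant digits (assumed truncation rule).
  def integer_digits(integer, min_digits, max_digits \\ nil)
      when is_integer(integer) and integer >= 0 do
    digits = integer |> Integer.to_string() |> String.pad_leading(min_digits, "0")

    case max_digits do
      nil ->
        digits

      max_digits ->
        # split_at/2 with a negative offset counts from the end of the string.
        {_dropped, kept} = String.split_at(digits, -max_digits)
        kept
    end
  end
end
```

If a format of "0" is read as both a minimum and a maximum of one integer digit, then 9.9 rounded to 10 comes out as integer_digits(10, 1, 1), which is "0". Read as a minimum only, integer_digits(10, 1) gives "10", which matches the before and after behaviour described in this thread.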
I will keep this issue open for a little bit longer in case your testing throws up another corner case. However I think that with your support and collaboration, the short and long number formats are being correctly processed (at least correct as far as the CLDR rules indicate).
@kipcole9 : thanks for your work on this issue!
@nicolasblanco Thanks for your patience - and apologies for the inconvenience. This took far too long to fix. Please do keep the issues coming if you spot any more.
@nicolasblanco You may be interested in Pluralization on compact numbers that is, in part, about this very topic and promises some improvements in future CLDR data releases.
Hello,
I'm simply an end-developer and I'm not really familiar with the internals of the library.
I've just noticed that most plural/singular forms are mistaken, both in "en" and "fr" locale.
Do you have any ideas why all those results are wrong?