elixir-cldr / cldr_numbers

CLDR Number localisation and formatting

Singular / Plural mistakes on Cldr.Number format: :long #16

Closed nicolasblanco closed 3 years ago

nicolasblanco commented 3 years ago

Hello,

I'm simply an end developer and I'm not really familiar with the internals of the library.

I've just noticed that most plural/singular forms are mistaken, both in "en" and "fr" locale.

iex(14) ▶ Moning.Cldr.Number.to_string 124400088000, format: :long, locale: "en"
{:ok, "124 billion"}
iex(15) ▶ Moning.Cldr.Number.to_string 194400088000, format: :long, locale: "en"
{:ok, "194 billion"}
iex(16) ▶ Moning.Cldr.Number.to_string 2000, format: :long, locale: "en"
{:ok, "2 thousand"}
iex(17) ▶ Moning.Cldr.Number.to_string 2000, format: :long, locale: "fr"
{:ok, "2 millier"}

Do you have any ideas why all those results are wrong?

kipcole9 commented 3 years ago

@nicolasblanco, not wrong - perhaps just surprising. The number formats defined in CLDR deliberately format this way for what is described in CLDR as decimal_long (and which I call just :long format).

Happy to help find a way to format the numbers in the way you're after if you can describe it for me.

In case you're interested, the underlying data that drives these formats is:

iex> MyApp.Cldr.Number.Format.formats_for!("en") |> Map.get(:decimal_long)
[
  [1000, %{one: ["0 thousand", 1], other: ["0 thousand", 1]}],
  [10000, %{one: ["00 thousand", 2], other: ["00 thousand", 2]}],
  [100000, %{one: ["000 thousand", 3], other: ["000 thousand", 3]}],
  [1000000, %{one: ["0 million", 1], other: ["0 million", 1]}], 
  [10000000, %{one: ["00 million", 2], other: ["00 million", 2]}],
  [100000000, %{one: ["000 million", 3], other: ["000 million", 3]}],
  [1000000000, %{one: ["0 billion", 1], other: ["0 billion", 1]}],
  [10000000000, %{one: ["00 billion", 2], other: ["00 billion", 2]}],
  [100000000000, %{one: ["000 billion", 3], other: ["000 billion", 3]}],
  [1000000000000, %{one: ["0 trillion", 1], other: ["0 trillion", 1]}],
  [10000000000000, %{one: ["00 trillion", 2], other: ["00 trillion", 2]}],
  [100000000000000, %{one: ["000 trillion", 3], other: ["000 trillion", 3]}]
]
iex> MyApp.Cldr.Number.Format.formats_for!("fr") |> Map.get(:decimal_long)
[
  [
    1000,
    %{:one => ["0 millier", 1], :other => ["0 mille", 1], "1" => ["mille", 0]}
  ],
  [10000, %{one: ["00 mille", 2], other: ["00 mille", 2]}],
  [100000, %{one: ["000 mille", 3], other: ["000 mille", 3]}],
  [1000000, %{one: ["0 million", 1], other: ["0 millions", 1]}],
  [10000000, %{one: ["00 million", 2], other: ["00 millions", 2]}],
  [100000000, %{one: ["000 million", 3], other: ["000 millions", 3]}],
  [1000000000, %{one: ["0 milliard", 1], other: ["0 milliards", 1]}],
  [10000000000, %{one: ["00 milliard", 2], other: ["00 milliards", 2]}],
  [100000000000, %{one: ["000 milliard", 3], other: ["000 milliards", 3]}],
  [1000000000000, %{one: ["0 billion", 1], other: ["0 billions", 1]}],
  [10000000000000, %{one: ["00 billion", 2], other: ["00 billions", 2]}],
  [100000000000000, %{one: ["000 billion", 3], other: ["000 billions", 3]}]
]

Using the example [10000, %{one: ["00 mille", 2], other: ["00 mille", 2]}], it means that if the number is at least 10_000 (and below the next threshold of 100_000) then it is formatted as a number with 2 significant digits, with the suffix depending on the plural rule for the number and locale.
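A minimal sketch of that threshold lookup, assuming a simplified rule table. The module `LongFormatSketch` and its functions are hypothetical and are not part of ex_cldr; the sketch only illustrates how the `[threshold, ...]` entries can be read.

```elixir
defmodule LongFormatSketch do
  # Hypothetical, simplified rule table: {threshold, pattern, integer_digits}.
  # Only the :other pattern of each "fr" rule is kept, for brevity.
  @rules [
    {1_000, "0 mille", 1},
    {10_000, "00 mille", 2},
    {100_000, "000 mille", 3},
    {1_000_000, "0 millions", 1}
  ]

  # The applicable rule is the one with the largest threshold
  # that does not exceed the number. Returns nil below 1_000.
  def rule_for(number) do
    @rules
    |> Enum.filter(fn {threshold, _pattern, _digits} -> number >= threshold end)
    |> List.last()
  end

  # Scale the number so that it keeps as many integer digits
  # as the pattern has zeros.
  def scaled(number) do
    case rule_for(number) do
      nil ->
        number

      {threshold, _pattern, digits} ->
        divisor = div(threshold, Integer.pow(10, digits - 1))
        round(number / divisor)
    end
  end
end
```

For example, `LongFormatSketch.rule_for(42_000)` selects the `10_000` entry, and `LongFormatSketch.scaled(42_000)` gives the `42` that would be placed in front of the suffix.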

Let me know what you're trying to format?

nicolasblanco commented 3 years ago

Thanks very much @kipcole9 for your fast answer!

I'm not English-native... Let's just look at this rule in "fr" locale:

  [
    1000,
    %{:one => ["0 millier", 1], :other => ["0 mille", 1], "1" => ["mille", 0]}
  ]

Then I'm doing:

Moning.Cldr.Number.to_string 2000, format: :long, locale: "fr"
# => {:ok, "2 millier"}

What I find strange is that 2 millier makes no sense at all for a French writer, it's plural, so it should be 2 milliers but also most people would write 2 milles. millier is reserved for unit 1:

1 millier, 2 milles, 42 milles, etc.

kipcole9 commented 3 years ago

That's very interesting. I'm not a French speaker but I do recognise it would be 2 milles and yet the data would not appear to support that.

I do most definitely have a bug for the case of 1000, which by the rule above should produce mille (not 1 millier nor 1 mille). I will fix that this weekend.

It looks like the source data is incorrect. The original xml has:

<pattern type="1000" count="1">mille</pattern>
<pattern type="1000" count="one">0 millier</pattern>
<pattern type="1000" count="other">0 mille</pattern>

Before I file a bug on the CLDR project, may I confirm your expectations? Because even with my bug fixed and the CLDR data fixed I think the results would be:

mille, 2 milles, 42 milles

Since the rule for 1 would override the rule for :one (the plural category).
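That precedence can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `PluralPatternSketch.pick/3` is a hypothetical helper, and the exact-match key is assumed to be the integer `1`.

```elixir
defmodule PluralPatternSketch do
  # Hypothetical selection order:
  #   1. an exact numeric key (e.g. 1) wins,
  #   2. otherwise the plural category (:one, :other, ...),
  #   3. otherwise the :other pattern as a fallback.
  def pick(patterns, value, category) do
    Map.get(patterns, value) ||
      Map.get(patterns, category) ||
      Map.fetch!(patterns, :other)
  end
end

patterns = %{1 => "mille", :one => "0 millier", :other => "0 mille"}

# Exactly 1: the exact rule overrides the :one category.
PluralPatternSketch.pick(patterns, 1, :one)

# 2 falls in the :other category and has no exact rule.
PluralPatternSketch.pick(patterns, 2, :other)
```

Because Elixir map keys are compared strictly, a float such as `1.001` does not match the integer key `1`, so it falls through to the plural category.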

nicolasblanco commented 3 years ago

Hello again @kipcole9 and thank you very much for the support!

I'm trying to understand the rules in the source data XML and the lines you highlighted.

Basically, the rule should be:

millier is used with the number 1 in front and strictly only for 1000 (1 millier). mille is used for anything between 1001 (Mille 1) and 1999 (Mille 999), with no number before the word mille. milles is used for many, as the plural form: 3 milles, 4 milles, etc.

Now about the en locale...

Moning.Cldr.Number.to_string 994400088000, format: :long, locale: "en"
{:ok, "994 billion"}

Shouldn't it be the plural form : 994 billions ?

Moning.Cldr.Number.to_string 9900000000, format: :long, locale: "en"
{:ok, "0 billion"}

0 billion for 9900000000 seems a bit strange to display to the end user also.

Thanks for the work and this library again.

kipcole9 commented 3 years ago

Thanks for your patience while I have been digging into this.

I will get the bugs in my code squashed this weekend (one down, one to go) and I will file an issue with CLDR regarding the data issues.

kipcole9 commented 3 years ago

Commit b18f9da fixes formatting when the format string is only digits. This corrects the error from point 3 above. The examples now format correctly:

iex> Cldr.Number.to_string 9_900_000_000, format: :long, locale: "fr"      
{:ok, "10 milliards"}
iex> Cldr.Number.to_string 9_900_000_000, format: :long, locale: "en"
{:ok, "10 billion"}

nicolasblanco commented 3 years ago

Thank you @kipcole9 for the changes !

I'm trying on master and it has improved the situation 🙌🏻 .

I still have this weird behaviour. When the library is rounding the number, it's sometimes not displaying the same result where it should normally 🤔 ...

▶ Moning.Cldr.Number.to_string 499_999_000, format: :long, locale: "fr"
{:ok, "500 millions"}  # => this is the correct form

▶ Moning.Cldr.Number.to_string 500_000_000, format: :long, locale: "fr"
{:ok, "500 million"} # => this is not correct, should be the same as before

▶ Moning.Cldr.Number.to_string 9_900_000_000, format: :long, locale: "fr"
{:ok, "10 milliards"} # => this is the correct form

▶ Moning.Cldr.Number.to_string 10_000_000_000, format: :long, locale: "fr"
{:ok, "10 milliard"} # => this is not correct, should be the same as before

kipcole9 commented 3 years ago

Thanks for the additional test cases. I am seeing the same issue and it's related to the order in which number modulo, pluralisation and rounding are done - meaning I need more coffee before I tackle it. Not too far away now (except for the data error in CLDR).

The issue is also exhibited for 1_000 and 1_001 for example.
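One way to picture the ordering problem is a sketch that rounds first and only then picks the plural category from the displayed value. This is not the library's actual fix. `RoundThenPluralizeSketch` is hypothetical, the French plural rule is reduced to "0 and 1 are :one, everything else is :other", and patterns are assumed to use a single "0" placeholder.

```elixir
defmodule RoundThenPluralizeSketch do
  # Simplified French cardinal rule (assumption): 0 and 1 are :one.
  def category(n) when trunc(n) in [0, 1], do: :one
  def category(_n), do: :other

  # Round to the displayed value first, then choose the plural
  # category from that displayed value, then fill in the pattern.
  def format(number, divisor, patterns) do
    displayed = round(number / divisor)
    pattern = Map.fetch!(patterns, category(displayed))
    String.replace(pattern, "0", Integer.to_string(displayed), global: false)
  end
end

patterns = %{one: "0 million", other: "0 millions"}

# Both inputs display as 500, so both take the same plural form.
RoundThenPluralizeSketch.format(499_999_000, 1_000_000, patterns)
RoundThenPluralizeSketch.format(500_000_000, 1_000_000, patterns)
```

With the category derived from the value that is actually shown, two inputs that display as the same number cannot end up with different suffixes.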

kipcole9 commented 3 years ago

Apologies for the delay. Commit 798494 fixes most of the outstanding issues I think. There remains the issue of mille and millier that I will work on next. This is related to "exact match of 1_000" which I am not currently handling correctly.

kipcole9 commented 3 years ago

As of commit 190a0b on ex_cldr I believe that the long formats are being correctly executed according to the rules. The outstanding rule was for the numbers such as 1_000 and 1_001.

The rule that drives these formats in locale fr is:

  [1000, %{1 => ["mille", 0], :one => ["0 millier", 1], :other => ["0 mille", 1]}]

Which says that for exactly 1_000 (a count of exactly 1) the result should be mille. For the number 1_001 it should be 1 millier and for 2_000 it should be 2 mille (this last appears to be an error at the CLDR data level, as you have pointed out: it should be 2 milles).

Per the below I think the rule is now being correctly executed:

iex> Cldr.Number.to_string 1_000, format: :long, locale: "fr"                          
{:ok, "mille"}
iex> Cldr.Number.to_string 1_001, format: :long, locale: "fr"                          
{:ok, "1 millier"}
iex> Cldr.Number.to_string 2_000, format: :long, locale: "fr"
{:ok, "2 mille"}
iex> Cldr.Number.to_string 2_001, format: :long, locale: "fr"
{:ok, "2 mille"}

Comments on (a) rule compliance and (b) language correctness are definitely welcome. If this is now rule compliant I will release new versions of ex_cldr and ex_cldr_numbers.

kipcole9 commented 3 years ago

I googled "2 mille ou 2 milles" and the following links seemed relevant but as a non-French-speaker I'm not comfortable interpreting them. Do they add anything to this conversation?

nicolasblanco commented 3 years ago

Hello @kipcole9 !

That's great, I've tried on master, and all the previous cases look good now:

iex(11) ▶ Moning.Cldr.Number.to_string 499_999_000, format: :long, locale: "fr"
{:ok, "500 millions"}

iex(12) ▶ Moning.Cldr.Number.to_string 500_000_000, format: :long, locale: "fr"
{:ok, "500 millions"}

iex(13) ▶ Moning.Cldr.Number.to_string 9_900_000_000, format: :long, locale: "fr"
{:ok, "10 milliards"}

iex(14) ▶ Moning.Cldr.Number.to_string 10_000_000_000, format: :long, locale: "fr"
{:ok, "10 milliards"}

iex(15) ▶ Moning.Cldr.Number.to_string 3000, format: :long, locale: "fr"
{:ok, "3 mille"}

iex(16) ▶ Moning.Cldr.Number.to_string 3999, format: :long, locale: "fr"
{:ok, "4 mille"}

🙌🏻

About the cases for mille (thousand), I have checked multiple sources and dictionaries and can confirm: the word is strictly invariable. So it was a small mistake on my side for this one 😅: 3 mille and 4 mille are perfectly fine and correct.

There's a final comment from me... about the rule with millier. I personally think this word should be strictly used for 1000 and not for anything else. It makes no sense to use it just for 1001, 1001 is not a special case at all, I'm sure. But 1000 can be written millier or mille, I'm sure of that.

So probably the rule is to use millier for strictly 1000 and when rounding to use mille.

kipcole9 commented 3 years ago

@nicolasblanco thanks for your patience and collaboration. I'm at a point now where your understanding as a native French speaker is different to the CLDR data definitions. The data provided seems pretty clear that for exactly 1_000 the localisation should be mille (no number). For the numbers 1_001 to 1_499 the rules say 1 millier and after that it's 2 mille etc. This means I still have a bug I need to resolve for 1_500..1_999 which I think by the rules should be 2 mille.

Clearly I'm not a linguist and the implementation is intended to follow the CLDR specification. At this stage, as best I can tell, the implementation does now follow the spec (one bug to squash first) so I will release soon a new version of ex_cldr and a new version of ex_cldr_numbers.

In English, the nearest understanding I can come to of the definition of millier is "about a thousand". But the examples quoted say un millier, not 1 millier, and as you note, there is no reference to 2 millier. I will send a message to the CLDR mailing list and see what comes back.

kipcole9 commented 3 years ago

As of commit a6a7c5 the pluralization appears correct (according to CLDR) for numbers in the range 1_001..1_499 and the range 1_500..1_999. I need to do some additional testing of some other work in ex_cldr before release but expect to do so before the end of the weekend.

Current pluralisation of the numbers around 1_000:

iex> Cldr.Number.to_string 1_000, format: :long, locale: "fr"
{:ok, "mille"}
iex> Cldr.Number.to_string 1_001, format: :long, locale: "fr"
{:ok, "1 millier"}
iex(3)> Cldr.Number.to_string 1_499, format: :long, locale: "fr"
{:ok, "1 millier"}
iex> Cldr.Number.to_string 1_500, format: :long, locale: "fr"
{:ok, "2 mille"}
iex> Cldr.Number.to_string 3_000, format: :long, locale: "fr"
{:ok, "3 mille"}

kipcole9 commented 3 years ago

I have published ex_cldr_numbers version 2.18.0 with the following changelog entry:

Bug Fixes

Enhancements

I will keep this issue open for a little bit longer in case your testing throws up another corner case. However I think that with your support and collaboration, the short and long number formats are being correctly processed (at least correct as far as the CLDR rules indicate).

nicolasblanco commented 3 years ago

@kipcole9 : thanks for your work on this issue!

kipcole9 commented 3 years ago

@nicolasblanco Thanks for your patience - and apologies for the inconvenience. This took far too long to fix. Please do keep the issues coming if you spot any more.

kipcole9 commented 3 years ago

@nicolasblanco You may be interested in Pluralization on compact numbers that is, in part, about this very topic and promises some improvements in future CLDR data releases.