elixir-cldr / cldr_numbers

CLDR Number localisation and formatting

Adding support for spelling out larger numbers? #46

Closed mayel closed 8 months ago

mayel commented 8 months ago

I understand the data comes from elsewhere, so how would one go about adding some of these? (Currently it doesn't go higher than quadrillion, which my 4-year-old considers a pretty small number 😉)

kipcole9 commented 8 months ago

Spelling out numbers is driven by rule-based number formatting (RBNF). In ex_cldr_numbers these rules are converted to Elixir code, which is then compiled. So we end up with a CLDR-compliant implementation with good performance. But it does mean that if the rules don't support certain number ranges or types, then I don't think there is a really good solution.
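To make the idea concrete, here is a minimal sketch of how a scale rule is applied: divide by the rule's base value, spell the quotient, append the scale name, then spell the remainder. This is only an illustration, not the code that ex_cldr_numbers generates; the module name `SpelloutSketch`, the `@scales` table and the `spell/1` function are hypothetical, and the table stops at quadrillion to mirror the current English rules.

```elixir
defmodule SpelloutSketch do
  # Hypothetical, simplified model of English cardinal spellout rules.
  @ones ~w(zero one two three four five six seven eight nine ten eleven twelve
           thirteen fourteen fifteen sixteen seventeen eighteen nineteen)
  @tens ~w(twenty thirty forty fifty sixty seventy eighty ninety)

  # {base_value, name}, largest first. Each entry plays the role of one rule.
  @scales [
    {1_000_000_000_000_000, "quadrillion"},
    {1_000_000_000_000, "trillion"},
    {1_000_000_000, "billion"},
    {1_000_000, "million"},
    {1_000, "thousand"},
    {100, "hundred"}
  ]

  def spell(n) when n in 0..19, do: Enum.at(@ones, n)

  def spell(n) when n in 20..99 do
    tens = Enum.at(@tens, div(n, 10) - 2)
    if rem(n, 10) == 0, do: tens, else: tens <> "-" <> spell(rem(n, 10))
  end

  def spell(n) when is_integer(n) and n >= 100 do
    # Pick the largest rule whose base value does not exceed n.
    {base, name} = Enum.find(@scales, fn {base, _name} -> n >= base end)
    head = spell(div(n, base)) <> " " <> name

    # Only spell the remainder when it is non-zero.
    case rem(n, base) do
      0 -> head
      r -> head <> " " <> spell(r)
    end
  end
end

IO.puts(SpelloutSketch.spell(1_234))
# one thousand two hundred thirty-four
```

In this sketch, adding a larger scale only means adding one more entry at the top of the table, which is why the order of the rules matters.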

I don't even know how you spell numbers beyond billions, so your 4-year-old is well ahead of me!

Anyway, it wouldn't actually be too hard to add additional scales, at least at an experimental level for a single locale. Remember CLDR is about localisation, so I would be uncomfortable with an official release unless the CLDR data itself was updated. And apart from getting agreement from the CLDR team (which isn't by any means easy), you'd then need contributors to provide localised names for over 600 locales.

How to add rules manually

If you're doing this for an experiment, then I would manually edit the locale you care about the most and add some rules to the right rule group and then force recompilation with mix deps.compile ex_cldr_numbers --force.

Where to add the rule

What you would be doing is adding a new rule to the "spellout_cardinal" ruleset of the en.json locale, which is found in the deps/ex_cldr/priv/cldr/locales/en.json file. You'll need to format the JSON in order to make sense of it. Let me know if you have issues finding the data.

Quintillion rule format

The following is a copy of the rule for "quadrillion" with three more zeros added to each of the range, base_value and divisor fields, and a new name in the definition. Note that the order of the rules is important, so quintillion must come immediately after quadrillion.

          {
            "radix": 10,
            "range": 1000000000000000000000,
            "base_value": 1000000000000000000,
            "definition": "←← quintillion[ →→]",
            "divisor": 1000000000000000000
          },
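If you want more than one extra scale, the same transformation can be scripted. The following is a rough sketch, assuming the quadrillion rule has the values implied above (the quintillion rule with three zeros removed from each field); check them against your en.json before relying on it. The module `ScaleRule` and the function `next_scale/2` are hypothetical names used only for this illustration.

```elixir
defmodule ScaleRule do
  # Hypothetical helper: derive the next scale's rule from the previous one by
  # multiplying range, base_value and divisor by 1000 and swapping the name.
  def next_scale(rule, name) do
    %{
      rule
      | "range" => rule["range"] * 1000,
        "base_value" => rule["base_value"] * 1000,
        "divisor" => rule["divisor"] * 1000,
        "definition" => "←← #{name}[ →→]"
    }
  end
end

# Assumed shape of the existing quadrillion rule (inferred, not copied from en.json).
quadrillion = %{
  "radix" => 10,
  "range" => 1_000_000_000_000_000_000,
  "base_value" => 1_000_000_000_000_000,
  "definition" => "←← quadrillion[ →→]",
  "divisor" => 1_000_000_000_000_000
}

names = ["quintillion", "sextillion", "septillion"]

# Each new rule is built from the one before it, so the list stays in order.
rules =
  Enum.scan(names, quadrillion, fn name, previous ->
    ScaleRule.next_scale(previous, name)
  end)

Enum.each(rules, &IO.inspect/1)
```

The resulting maps would still need to be encoded as JSON and pasted into the ruleset in the same order.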

Let me know if you have any other questions.

mayel commented 8 months ago

Thanks once again for your legendary speed and attention to detail 😃 Will give that a try!

mayel commented 8 months ago

Thanks again, I was able to do as you suggested and format some pretty large numbers: https://gist.github.com/mayel/acca08724f4f91c9ed2a6834e174e10d (with this code)

I used this code to generate the necessary JSON, but interestingly ran into a Jason parsing error (I removed the entry with that number and the few larger ones to get it to work for now):

== Compilation error in file lib/cldr.ex ==
** (Jason.DecodeError) unexpected sequence at position 973280: "1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
    (jason 1.4.1) lib/jason.ex:92: Jason.decode!/2
    (ex_cldr 2.37.5) lib/cldr/install.ex:190: Cldr.Install.locale_stale?/2
    (ex_cldr 2.37.5) lib/cldr/install.ex:68: Cldr.Install.install_locale_name/3
    (elixir 1.16.0) lib/enum.ex:987: Enum."-each/2-lists^foreach/1-0-"/2
    (ex_cldr 2.37.5) lib/cldr/install.ex:29: Cldr.Install.install_known_locale_names/1
    (ex_cldr 2.37.5) lib/cldr.ex:102: Cldr.install_locales/1
    (ex_cldr 2.37.5) expanding macro: Cldr.Backend.Compiler.__before_compile__/1
    lib/cldr.ex:1: M.Cldr (module)

kipcole9 commented 8 months ago

one hundred septillion nonagintillion ducentillion (902 zeros)

Wow! I hope your son agrees that's a "fairly" big number! For some reason I really enjoy these kinds of experiments. I'm happy that the CLDR data scales the algorithm well (that team does some amazing work with "human" data, even if it makes my head hurt sometimes).

Looks like Jason.decode/1 won't work for tokens > 1024 bytes (or chars - not sure which) in length:

# 1024 in total length
iex> Jason.decode "1" <> String.duplicate("0", 1023)
{:ok,
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000}

# 1025 in total length
iex> Jason.decode "1" <> String.duplicate("0", 1024)
{:error,
 %Jason.DecodeError{
   position: 0,
   token: "10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
   data: "10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
 }}

I'll file an upstream bug report.

kipcole9 commented 8 months ago

Ah, this commit limits the number of digits in an integer. But it can be configured with the decoding_integer_digit_limit app env setting. So adding something like:

config :jason, 
  decoding_integer_digit_limit: 2048

to config.exs will allow you to parse larger numbers. I believe it needs to be a compile-time configuration, so Jason itself has to be recompiled after changing it.
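To pick a limit, it helps to count the digits of the largest integer in the rules. The following is a small sketch in plain Elixir that does not call Jason at all; the module `DigitLimit` and its functions are hypothetical, and the default of 1024 is taken from the decode behaviour shown earlier in this thread.

```elixir
defmodule DigitLimit do
  # Default digit limit observed in this thread (1024 digits decode, 1025 fail).
  @default_limit 1024

  # Number of decimal digits in a non-negative integer.
  def digits(n) when is_integer(n) and n >= 0 do
    n |> Integer.to_string() |> String.length()
  end

  # True when the integer literal would stay within the given digit limit.
  def fits?(n, limit \\ @default_limit), do: digits(n) <= limit
end

largest = Integer.pow(10, 1024)

IO.inspect(DigitLimit.digits(largest))
IO.inspect(DigitLimit.fits?(largest))
IO.inspect(DigitLimit.fits?(largest, 2048))
```

Remember that the range field of a rule is three digits longer than its base_value, so size the limit against the largest range in the file.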

mayel commented 8 months ago

That did it :) I now have a calculator able to display numbers Erlang doesn't:

iex(22)> calc "890 ^ 70"

2.8661602272158866e206

two hundred eighty-six sextillion six hundred sixteen quintillion twenty-two quadrillion seven hundred twenty-one trillion five hundred eighty-eight billion six hundred fifty-eight million twenty-five thousand three hundred forty-three sexagintillion four hundred seventy-one octillion four hundred fifty-two septillion five hundred ninety-five sextillion one hundred seventy-seven quintillion eight hundred twenty-two quadrillion seven hundred eight trillion eight hundred ninety billion six hundred twenty-eight million one hundred twenty-five thousand nine hundred ninety-five quinquagintillion seven hundred eighty-three octillion three hundred four septillion nine hundred ninety-one sextillion six hundred sixty-eight quintillion eighty-one quadrillion six hundred seven trillion four hundred forty-six billion seven hundred eighty-three million eighty-seven thousand nine hundred seven quadragintillion nine hundred forty-one noventrigintillion eight hundred twenty-two octotrigintillion one hundred forty-three septentrigintillion seven hundred seventy-two sestrigintillion three hundred sixty-four quintrigintillion three hundred fifty-five quattuortrigintillion seven hundred forty-five trestrigintillion fifty-seven googol two duotrigintillion nine hundred sixty-one untrigintillion nine hundred thirty-five trigintillion five hundred seventy-eight novemvigintillion five hundred thirty-five octovigintillion four hundred seventy-eight septemvigintillion ten sesvigintillion six hundred twenty-six quinvigintillion nine hundred fifty-six quattuorvigintillion eight tresvigintillion two hundred fifty-five duovigintillion one hundred twenty-nine unvigintillion seven hundred ten vigintillion seven hundred fifty-eight novendecillion eighteen octodecillion nine hundred sixty-seven septendecillion six hundred ninety-four sedecillion seven hundred forty-two quindecillion six quattuordecillion seventy-nine tredecillion nine hundred forty-seven duodecillion two hundred eighty undecillion 
three hundred seventy-one decillion four hundred twenty-one nonillion one hundred eighty-six octillion one hundred fifty-one septillion four hundred thirty-nine sextillion three hundred eighty-eight quintillion six hundred thirty-nine quadrillion four hundred fifty-two trillion fifty-five billion eight hundred seventy million six hundred thirty-six thousand thirty-two

Sample repo for the curious: https://github.com/mayel/large_numbers_ex

kipcole9 commented 8 months ago

That's .... quite large. And a fun exercise. Thanks for sharing.