elixir-cldr / cldr_units

Unit formatting (volume, area, length, ...) functions for the Common Locale Data Repository (CLDR)

Number Spellout Wrong #23

Closed maennchen closed 2 years ago

maennchen commented 2 years ago

Steps to reproduce

AcmeCldr.Unit.to_string!(Cldr.Unit.new!(1, :year), format: :spellout, locale: "de")

Expected

ein Jahr

Actual

eins Jahr

Versions

$ mix deps | grep cldr
* cldr_utils 2.16.0 (Hex package) (mix)
  locked at 2.16.0 (cldr_utils) 3ef5dc0f
* ex_cldr 2.23.1 (Hex package) (mix)
  locked at 2.23.1 (ex_cldr) f7b42cf2
* ex_cldr_calendars 1.16.0 (Hex package) (mix)
  locked at 1.16.0 (ex_cldr_calendars) 483d91a0
* ex_cldr_currencies 2.11.1 (Hex package) (mix)
  locked at 2.11.1 (ex_cldr_currencies) 99e8eb3f
* ex_cldr_dates_times 2.9.2 (Hex package) (mix)
  locked at 2.9.2 (ex_cldr_dates_times) dabd8e6f
* ex_cldr_languages 0.2.2 (Hex package) (mix)
  locked at 0.2.2 (ex_cldr_languages) d9cbf4bf
* ex_cldr_lists 2.8.0 (Hex package) (mix)
  locked at 2.8.0 (ex_cldr_lists) 455406d4
* ex_cldr_numbers 2.22.0 (Hex package) (mix)
  locked at 2.22.0 (ex_cldr_numbers) af8e7267
* ex_cldr_units 3.7.1 (Hex package) (mix)
  locked at 3.7.1 (ex_cldr_units) b9595bea
* hygeia_cldr 0.1.0 (apps/hygeia_cldr) (mix)

kipcole9 commented 2 years ago

Unfortunately there is no data in CLDR that identifies the gender of nouns (and I'm sure it's out of scope given the magnitude of that task!). However, it does provide gender-specific rules in rule-based number formatting (RBNF), which is what is used to format spelled-out numbers, amongst other things.

Spellout formatting for nouns of different gender

If you know in advance the gender of the noun, you can apply it during the formatting process. For example:

iex> MyApp.Cldr.Unit.to_string!(Cldr.Unit.new!(1, :year), format: :spellout_cardinal_neuter, locale: "de")                
"ein Jahr"
iex> MyApp.Cldr.Unit.to_string!(Cldr.Unit.new!(1, :year), format: :spellout_cardinal_feminine, locale: "de")
"eine Jahr"
iex> MyApp.Cldr.Unit.to_string!(Cldr.Unit.new!(1, :year), format: :spellout_cardinal_masculine, locale: "de")
"ein Jahr"
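
Since CLDR does not supply noun genders, one way to apply the idea above is to keep a small hand-maintained table of unit genders and derive the spellout format from it. The following is only a sketch under that assumption: the module name UnitGender, its functions, and the gender table are hypothetical and not part of ex_cldr_units; only the three :spellout_cardinal_* format names come from the examples above.

```elixir
defmodule UnitGender do
  # Hypothetical, hand-maintained table of German noun genders for a few units.
  # CLDR does not provide this data, so it has to be supplied by the application.
  @genders %{year: :neuter, month: :masculine, week: :feminine, day: :masculine}

  # Gender-specific spellout rules, as used in the examples above.
  @formats %{
    masculine: :spellout_cardinal_masculine,
    feminine: :spellout_cardinal_feminine,
    neuter: :spellout_cardinal_neuter
  }

  # Returns the spellout format for a unit; raises KeyError for units
  # that are missing from the table rather than guessing a gender.
  def format_for(unit) do
    gender = Map.fetch!(@genders, unit)
    Map.fetch!(@formats, gender)
  end

  # Builds the options keyword list to pass to Unit.to_string!/2.
  def options(unit, locale) do
    [format: format_for(unit), locale: locale]
  end
end

# Intended usage with a backend (requires ex_cldr_units, not run here):
#   MyApp.Cldr.Unit.to_string!(Cldr.Unit.new!(1, :year), UnitGender.options(:year, "de"))
```

The table is specific to one language, so an application supporting several locales would need one such table per locale.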

Default :spellout rule

The format :spellout actually invokes the rule :spellout_numbering, which is intended for formatting standalone numbers. That accounts for the result you are seeing:

iex> MyApp.Cldr.Unit.to_string!(Cldr.Unit.new!(1, :year), format: :spellout_numbering, locale: "de")
"eins Jahr"

CLDR grammatical case and grammatical gender

CLDR is adding grammatical case and grammatical gender to the data for units. Some limited work has already been done in the current CLDR39, specifically for the de locale and a few others, and the upcoming CLDR40 adds support for an additional 29 locales.

ex_cldr does support this data through the :grammatical_case and :grammatical_gender options, but it's not specifically aimed at this use case. In any case, currently only :masculine is supported:

iex> MyApp.Cldr.Unit.to_string!(Cldr.Unit.new!(1, :year), format: :spellout_numbering, locale: "de", grammatical_gender: :neuter)
** (Cldr.UnknownGrammaticalGenderError) The locale "de" does not define a grammatical gender :neuter. The valid genders are [:masculine]
    (ex_cldr_units 3.7.1) lib/cldr/unit/format.ex:283: Cldr.Unit.Format.to_string!/3

What rules are available in a locale?

Since all rules get compiled as functions on a backend module, they can be found using tab-completion in iex. For example:

# These are the rule categories
iex> MyApp.Cldr.Rbnf.
NumberSystem    Ordinal         Spellout        

# These are the rules for Spellout. Not all of them
# may be available for all locales.
iex> MyApp.Cldr.Rbnf.Spellout.all_rule_sets
[:spellout_cardinal, :spellout_cardinal_feminine,
 :spellout_cardinal_feminine_standalone, :spellout_cardinal_m,
 :spellout_cardinal_masculine, :spellout_cardinal_masculine_standalone,
 :spellout_cardinal_n, :spellout_cardinal_neuter, :spellout_cardinal_r,
 :spellout_cardinal_s, :spellout_cardinal_verbose, :spellout_construct_feminine,
 :spellout_construct_masculine, :spellout_numbering,
 :spellout_numbering_verbose, :spellout_numbering_year, :spellout_ordinal,
 :spellout_ordinal_feminine, :spellout_ordinal_feminine_plural,
 :spellout_ordinal_m, :spellout_ordinal_masculine,
 :spellout_ordinal_masculine_plural, :spellout_ordinal_n, :spellout_ordinal_r,
 :spellout_ordinal_s, :spellout_ordinal_verbose]

maennchen commented 2 years ago

@kipcole9 I assumed that gender would be a problem. This is only a nice-to-have for me and not really a requirement.

Thanks for looking into it :heart:

Should I close the issue or would you like to keep it open until CLDR40 is around?

kipcole9 commented 2 years ago

I'll close the issue, since I will implement whatever the CLDR40 data allows. Hopefully you've got enough tooling in ex_cldr to at least produce grammatically correct results in this situation.