elixir-cldr / cldr_dates_times

Date & times formatting functions for the Common Locale Data Repository (CLDR) package https://github.com/elixir-cldr/cldr

Regional variations ignored #37

Closed petrus-jvrensburg closed 1 year ago

petrus-jvrensburg commented 1 year ago

I'm seeing this in livebook:

> Cldr.Date.to_string!(Date.utc_today, format: :short, locale: "en")
"8/14/23"

> Cldr.Date.to_string!(Date.utc_today, format: :short, locale: "en-UK")
"8/14/23"

It seems like it's not picking up the regional variation for the second date, which should be "14/08/2023" as far as I know.

For reference, the livebook's setup cell looks like this:

# Install dependencies
Mix.install([
  :ex_cldr,
  :ex_cldr_dates_times,
  :jason,
])

# Define a backend module
defmodule DemoApp.Backend do
  use Cldr,
    locales: ["en", "es", "pt", "hi"],
    default_locale: "en",
    providers: [Cldr.Number, Cldr.Calendar, Cldr.DateTime],
    json_library: Jason
end

# Set an app-wide default backend
Application.put_env(:ex_cldr, :default_backend, DemoApp.Backend)

DemoApp.Backend.put_locale("en-US")

kipcole9 commented 1 year ago

This may be because en-UK is English as spoken in Ukraine. Maybe you mean en-GB, which is English as spoken in the UK? I will definitely check, of course; I just wanted to get back to you while I'm online.

kipcole9 commented 1 year ago

Also, with that backend configuration, only en is configured, which is English as spoken in the US. The default language is nearly always the language as spoken in the country with the largest number of native speakers. So en means English as spoken in the US, and pt means Portuguese as spoken in Brazil.

You would need to configure en-GB in the backend to be able to use the regional formats. It's still OK to use the en-GB locale when only en is configured. It just means you'll get the US formats, but the territory will still be recognised as GB. Make sense?

You can check the :cldr_locale_name field of a locale to see which locale's data is actually being used.
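
To illustrate the two points above, here is a minimal sketch that assumes the dependencies from the original setup cell are installed. It redefines the thread's DemoApp.Backend with "en-GB" added to the locale list, then inspects the resolved language tag. The choice of IO.inspect labels is only for illustration.

```elixir
# Assumes :ex_cldr, :ex_cldr_dates_times and :jason are already installed,
# as in the setup cell earlier in this thread.
defmodule DemoApp.Backend do
  use Cldr,
    # "en-GB" is added so that British formats are compiled into the backend
    locales: ["en", "en-GB", "es", "pt", "hi"],
    default_locale: "en",
    providers: [Cldr.Number, Cldr.Calendar, Cldr.DateTime],
    json_library: Jason
end

# validate_locale/1 returns {:ok, %Cldr.LanguageTag{}} for a known locale
{:ok, tag} = DemoApp.Backend.validate_locale("en-GB")

# :cldr_locale_name names the configured locale whose data will be used
IO.inspect(tag.cldr_locale_name, label: "locale data in use")

# :territory is resolved independently of which locale data is configured
IO.inspect(tag.territory, label: "territory")
```

With only "en" configured, the same check would be expected to report the en locale as the data in use while still reporting GB as the territory, which matches the behaviour described above.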

kipcole9 commented 1 year ago

That prompts a couple of comments:

  1. The canonical locale is a Cldr.LanguageTag. The use of an atom or a string is just a convenience. I would propose code examples use that format. Something like:

    with {:ok, locale} <- MyApp.Cldr.validate_locale(locale) do
    ....
    end
  2. I now think that calling functions on the backend module is the "right" approach. So MyApp.Cldr.Number.to_string(...) rather than Cldr.Number.to_string(....., MyApp.Cldr).
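
Both suggestions can be combined in one short sketch. It assumes a backend named MyApp.Cldr (the placeholder name used in the comment above) that is configured with the Cldr.DateTime provider, so that the backend exposes MyApp.Cldr.Date.to_string/2. The locale list is an assumption made for the example.

```elixir
# Assumes :ex_cldr, :ex_cldr_dates_times and :jason are installed.
# MyApp.Cldr is a placeholder backend name; the locale list is illustrative.
defmodule MyApp.Cldr do
  use Cldr,
    locales: ["en", "en-GB"],
    default_locale: "en",
    providers: [Cldr.Number, Cldr.Calendar, Cldr.DateTime],
    json_library: Jason
end

# Validate first so that the canonical Cldr.LanguageTag is passed around,
# then call the formatting function on the backend module itself.
with {:ok, locale} <- MyApp.Cldr.validate_locale("en-GB") do
  MyApp.Cldr.Date.to_string(Date.utc_today(), format: :short, locale: locale)
end
```

If validation fails, the `with` expression returns the `{:error, reason}` tuple unchanged, so the caller can decide how to fall back.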

petrus-jvrensburg commented 1 year ago

Okay, thanks for the feedback. After configuring "en-GB" explicitly the date formatting is working as expected:

> Cldr.Date.to_string!(Date.utc_today, format: :short, locale: "en-GB")

"15/08/2023"

Honestly, I was expecting the regional variations to be included automatically after configuring just "en", so that's what I found confusing. And since we don't want to throw errors on locale misses, but rather fall back gracefully, it wasn't immediately obvious what I was doing wrong.

I'm wondering whether, in dev, it would make sense to do some 'validate locale' checks internally and log warnings to the console, to guide the user away from this type of configuration miss.
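
As a rough idea of what such a check could look like in application code, here is a hypothetical helper. It is not part of ex_cldr; the module name, function name and warning text are all made up for this sketch. It compares the requested locale string with the :cldr_locale_name that the backend resolves it to and logs a warning when they differ.

```elixir
defmodule DemoApp.LocaleCheck do
  # Hypothetical helper, not part of ex_cldr.
  require Logger

  # `backend` is any ex_cldr backend module, e.g. DemoApp.Backend.
  def resolve(requested, backend) when is_binary(requested) do
    with {:ok, tag} <- backend.validate_locale(requested) do
      # Crude heuristic: a plain string comparison. It will also warn for
      # legitimate resolutions such as "en-US" resolving to en, and for
      # differences in letter case.
      if to_string(tag.cldr_locale_name) != requested do
        Logger.warning(
          "locale #{inspect(requested)} resolved to " <>
            "#{inspect(tag.cldr_locale_name)}; is it configured in #{inspect(backend)}?"
        )
      end

      {:ok, tag}
    end
  end
end
```

Calling `DemoApp.LocaleCheck.resolve("en-GB", DemoApp.Backend)` against a backend that only configures "en" should then surface the mismatch in the console instead of silently falling back.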

kipcole9 commented 1 year ago

The thing is that they are all regional variants. en is the US variant, etc etc etc. Well, there is en-001 (001 means "the world") but it's not really a usable locale because it's not regionalised.

I have been thinking about some kind of logging around locale resolution but I haven't yet worked out what that should really look like.

When you configure a locale, its parent locale(s) are also configured. So maybe starting with the advice to configure the most precise locales that are supported would help. That is, if you configure en-GB then en is also configured. That's because some limited data falls back to the parent locale, but that's an implementation detail.

There are more than 20 variants of en, and configuring them all would consume a lot of memory, which is why they aren't configured by default. It is possible to use wildcards when configuring locales, so you could configure `locales: ["en", "en-*"]` and get them all. Just not sure that's fabulous either.
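
For reference, a backend using the wildcard form mentioned above might look like the following configuration fragment. The backend name is a placeholder, and the memory and compile-time cost noted above applies.

```elixir
# Assumes :ex_cldr, :ex_cldr_dates_times and :jason are installed.
# DemoApp.WideBackend is a placeholder name for this sketch.
defmodule DemoApp.WideBackend do
  use Cldr,
    # "en-*" expands to every regional variant of English,
    # at the cost of more memory and longer compile times
    locales: ["en", "en-*"],
    default_locale: "en",
    providers: [Cldr.Number, Cldr.Calendar, Cldr.DateTime],
    json_library: Jason
end
```

Listing only the regional variants an application actually serves, such as `["en", "en-GB"]`, keeps the backend smaller.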

Keep the thoughts coming.