elixir-lang / elixir

Elixir is a dynamic, functional language for building scalable and maintainable applications
https://elixir-lang.org/
Apache License 2.0
24.26k stars 3.34k forks source link

Additions to the Calendar module #8014

Closed josevalim closed 5 years ago

josevalim commented 6 years ago

Hi everyone,

@michalmuskala and I recently discovered that we are missing some of the Calendar information that standards like ICU expect to be available. This proposal aims to rectify that, allowing libraries to provide more complete formatting/parsing.

Therefore we propose to add 6 functions the calendar behaviour.

The first one converts an extended year into its year plus era:

@callback year_of_era(year) :: {year, era :: non_neg_integer()}
@callback day_of_era(year) :: {day, era :: non_neg_integer()}

For example, the date ~D[0001-01-01] has year of era {1, 1} and day of era {1, 1}. The date ~D[0000-12-31] has year of era {1, 0} and day of era {1, 0}. The date ~D[-0001-01-01] has year of era {2, 0}.

The next two functions are more straight-forward:

@callback quarter_of_year(year, month, day) :: Calendar.quarter
@callback day_of_year(year, month, day) :: day

Calendar.quarter is a positive integer.

The next functions are about week-based dates:

@callback week_in_year(year, month, day, first_day \\ 1, min_days \\ 1) ::
            {week_based_year, week, day_in_week}
@callback week_in_month(year, month, day, first_day \\ 1, min_days \\ 1) ::
            {week_based_month, week, day_in_week}

An example of a week date can be found on ISO8601.

Those functions will also have their variants exposed in the relevant modules. This means that Date will get year_of_era/1, day_of_era/1, quarter_of_year/1, day_of_year/1, week_in_year/1, and week_in_month/1.

TODO

josevalim commented 6 years ago

/cc @kipcole9 @Qqwy @lau @bitwalker

michalmuskala commented 6 years ago

I don't think we need clock_hour - according to ICU, it has a fixed set of 4 periods defined:

There are 4 dayPeriods that are fixed; am/pm are always defined, and always have the same meaning and definition for every locale. Midnight and noon are optional, however if they are defined, they have the same meaning and definition as in all other locales where they are defined.

michalmuskala commented 6 years ago

Additionally, week_date might be problematic - is the notion of "weeks" relevant to all calendars?

FredrikAugust commented 6 years ago

@callback week_date(year, month, day) :: {year, week, day_in_week} Thought I'd just pitch in that; For this erlang has calendar:day_of_the_week and calendar:iso_week_number, making the implementation rather easy.

kipcole9 commented 6 years ago

I like the proposal and it would simplify some parts of cldr lib I maintain. A couple of thoughts:

  1. I think using ICU as the reference is a good idea. Although it doesn't codify ancient calendars, it does cover most of the calendar types in modern use.

  2. The Gregorian, Chinese, Japanese, Islamic, Hebrew, Persian and Indian calendars all have the notion of a week. So I think the notion of a week is reasonable to support. The ISOWeek calendar doesn't have months but thats a different problem.

  3. if ICU Date Fields is to be the reference then perhaps week data would include week_in_month as well? Maybe something like:

    @callback week_in_year(year, month, day) :: {year, week, day_in_week}
    @callback week_in_month(year, month, day) :: {month, week, day_in_week}
  4. I wonder if era/1 would be better to return the era number starting from a base of 0. Whilst the gregorian calendar has only 2 eras, the Japanese calendar has 236. Although to be fair, of the 17 calendars defined in ICU, its the only one with more than 2 eras. If you do choose to continue with atoms for the era, I would suggest that :bce and :ce would be a better canonical choice in the gregorian calendar.

  5. Calculating weeks relies upon knowledge of when a week starts and the minimum number of days in a week so perhaps additional functions are required?

    @callback first_day_of_week :: day
    @callback min_days_in_first_week :: non_neg_integer
    # And optionally
    @callback weekend_starts :: non_neg_integer
kipcole9 commented 6 years ago

Ah, re point 5 above, first_day_of_week and min_days_in_first_week are locale specific. Which means that either the Elixir implementation is fixed to a locale or its a bit problematic to calculate weeks.

kipcole9 commented 6 years ago

@FredrikAugust Based upon the changes made for Elixir 1.7 I believe the intent it to avoid using :calendar, in part because it does not support negative dates. I thing Date and DateTime no longer use :calendar any more.

michalmuskala commented 6 years ago

Looking at libicu itself, it defines a Chinese calendar that has a new era every 60 years: https://github.com/svn2github/libicu/blob/c43ec130ea0ee6cd565d87d70088e1d70d892f32/i18n/chnsecal.h

josevalim commented 6 years ago

I don't think we need clock_hour - according to ICU, it has a fixed set of 4 periods defined:

Good catch. clock_hour may have other uses but, given the goal here is ICU, I will remove it. Thanks.

josevalim commented 6 years ago

I wonder if era/1 would be better to return the era number starting from a base of 0. Whilst the gregorian calendar has only 2 eras, the Japanese calendar has 236.

Good catch. A number makes sense then. I wonder though how does the intersection of eras with locales work? If a locale sees the number 20 for era, the correct translation may depend on the calendar (for example, the chinese or japanese). I guess the locale "mechanism" also has to receive the calendar as argument in this case?

Ah, re point 5 above, first_day_of_week and min_days_in_first_week are locale specific. Which means that either the Elixir implementation is fixed to a locale or its a bit problematic to calculate weeks.

Is the general notion of week_date locale specific? I am aware that day_of_week is locale specific but the normalization to/from ISO is straight-forward. However, if different locales would return different week_dates for Calendar.ISO, then I would remove this from the proposal, because ultimately it won't help for ICU.

kipcole9 commented 6 years ago

Looking at libicu itself, it defines a Chinese calendar that has a new era every 60 years: https://github.com/svn2github/libicu/blob/c43ec130ea0ee6cd565d87d70088e1d70d892f32/i18n/chnsecal.h

Thats curious because the data in CLDR declares only a single era (I recognise the chinese calendar repeats every 60 years). From CLDR data:

%{
  buddhist: %{calendar_system: "solar", eras: %{0 => %{start: -198326}}},
  chinese: %{calendar_system: "lunisolar", eras: %{0 => %{start: -963144}}},
  coptic: %{
    calendar_system: "other",
    eras: %{0 => %{end: 103604}, 1 => %{start: 103605}} 
  },
  ...
}
lau commented 6 years ago

In ISO a week always starts with Monday. Erlang already has :calendar.iso_week_number

josevalim commented 6 years ago

@lau thanks, I am aware of the ISO behaviour. I want to understand though, if in the context of formatting/parsing, if the week number is also locale specific or if it just depends on the calendar.

lau commented 6 years ago

@josevalim In the separate Calendar library you can format a Date as an ISO 8601 week date or week number like this: https://hexdocs.pm/calendar/Calendar.Date.Format.html#iso_week_date/1 https://hexdocs.pm/calendar/Calendar.Date.Format.html#week_number/1 and parse it like this: https://hexdocs.pm/calendar/Calendar.Date.Parse.html#iso_week_date/1

without locale specifics.

josevalim commented 6 years ago

@lau I am aware it can be done without locale, the question is what is the ICU/CLDR recommendation on the topic. :)

kipcole9 commented 6 years ago

I believe the relevant section in TR35 indicates that the start of a week is guided by two factors: which day of the week is considered the first day and how many days there must be in a week for it to be considered the first week.

Rather than being calendar specific, these appear to be territory (country) specific and therefore related to a locale.

For example, in CLDR the relevant data is:

    iex> Cldr.Calendar.week_info
    %{first_day: %{IN: "sun", SM: "mon", MN: "mon", MZ: "sun", CR: "mon", AT: "mon",
        LA: "sun", EE: "mon", NL: "mon", PT: "mon", PH: "sun", BG: "mon", LT: "mon",
        ES: "mon", OM: "sat", SY: "sat", US: "sun", EC: "mon", SG: "sun", DM: "sun",
        AR: "sun", MK: "mon", YE: "sun", KW: "sat", GB: "mon",
        "GB-alt-variant": "sun", AD: "mon", UZ: "mon", KG: "mon", CZ: "mon",
        FI: "mon", RO: "mon", TR: "mon", AI: "mon", MM: "sun", AS: "sun", BS: "sun",
        IT: "mon", MX: "sun", BR: "sun", ID: "sun", NZ: "sun", GP: "mon", BE: "mon",
        CO: "sun", GR: "mon", NP: "sun", ME: "mon", MO: "sun", ...},
      min_days: %{SM: 4, SJ: 4, AT: 4, EE: 4, NL: 4, PT: 4, BG: 4, LT: 4, ES: 4,
        US: 1, GI: 4, GB: 4, AD: 4, CZ: 4, FI: 4, IT: 4, GP: 4, JE: 4, BE: 4, GR: 4,
        "001": 1, VI: 1, RE: 4, SE: 4, GU: 1, IS: 4, AN: 4, IM: 4, GG: 4, CH: 4,
        FO: 4, UM: 1, SK: 4, AX: 4, LU: 4, FR: 4, IE: 4, HU: 4, FJ: 4, MC: 4, GF: 4,
        NO: 4, DK: 4, DE: 4, LI: 4, PL: 4, VA: 4, MQ: 4},   
      weekend_end: %{
        "001": "sun",
        AE: "sat",
        AF: "fri",
        ...
      },
      weekend_start: %{
        "001": "sat",
        AE: "fri",
        ...
        IN: "sun",
        ....
      }
    }

Where the territory "001" is the UN M.49 code for "world" and therefore the default.

One way for to approach this would be:

@callback week_in_year(year, month, day, first_day, min_days) :: {year, week, day_in_week}
@callback week_in_month(year, month, day, first_day, min_days) :: {month, week, day_in_week}

In Date defined as:

def week_in_year(year, month, day, first_day \\ 1, min_days \\ 1)
def week_in_month(year, month, day, first_day \\ 1, min_days \\ 1)

which would, i think, be consistent with most expectations - although I hesitate to say that Monday would be universally accepted even in this community as the unambiguous first day of the week.

This approach would decouple from the formal notion of a locale while at the same time allowing library writers (including me) to make stronger use of the standard behaviour.

kipcole9 commented 6 years ago

BTW, if you accept this approach I can almost remove a library-specific behaviour I had to add to support weeks. If you consider adding weekend?/3 and/or weekday?/3 then I can remove it completely.

The would be defined as:

@callback weekend?(year, month, day, weekend_starts, weekend_ends) :: boolean

with a signature of

def weekend?(year, month, day, weekend_starts \\ 6, weekend_ends, \\ 7)
josevalim commented 6 years ago

Thanks @kipcole9!

I have updated the proposals with the signatures for week_in_year and week_in_month. I don't think the weekend? is necessary as that seems to be checking the result of day_of_week against a range, so it should be possible today.

Do you have an implementation of week_in_year and week_in_month available by any chance?

josevalim commented 6 years ago

I believe this has been finalized. Can everybody take a final look at the proposal and let me know their thoughts? Thank you.

kipcole9 commented 6 years ago

José, I have implementations for each of the proposed functions and I'm happy to craft a PR with tests and will do so over the weekend if thats acceptable.

josevalim commented 6 years ago

@kipcole9 that would be awesome. Can we make it two PRs though? One for week_in_year/1 and week_in_month/1 and another with the remaining ones? It will be easier to manage. :)

kipcole9 commented 6 years ago

Sure, no problem. Can do.

fertapric commented 6 years ago

I wonder if era/1 would be better to return the era number starting from a base of 0. Whilst the gregorian calendar has only 2 eras, the Japanese calendar has 236.

Good catch. A number makes sense then. I wonder though how does the intersection of eras with locales work? If a locale sees the number 20 for era, the correct translation may depend on the calendar (for example, the chinese or japanese). I guess the locale "mechanism" also has to receive the calendar as argument in this case?

Should the proposal be updated to use numbers instead of atoms for eras?

The first one converts an extended year into its year plus era:

 @callback era(year) :: {year, non_neg_integer()}

For example, the date ~D[0000-01-01] has era {1, 1}. The date ~D[-0001-01-01] has era {2, 0}.

I assumed :bce was 0, and :ce was 1.

(If we stick to atoms, I think the first one should be :ad (or :ce))

kipcole9 commented 6 years ago

Ah, yes, good catch. I believe José agrees given his comment above:

I wonder if era/1 would be better to return the era number starting from a base of 0. Whilst the gregorian calendar has only 2 eras, the Japanese calendar has 236.

Good catch. A number makes sense then. I wonder though how does the intersection of eras with locales work? If a locale sees the number 20 for era, the correct translation may depend on the calendar (for example, the chinese or japanese). I guess the locale "mechanism" also has to receive the calendar as argument in this case?

Therefore unless José objects I will implement the signature you proposed:

@callback era(year) :: {year, non_neg_integer()}

I also didn't answer José's question about localising an era that is out of range for a calendar. My observation so far is that this isn't a major issue since CLDR data has localised names for eras in all the 17 calendars defined and the code (in my lib at least) won't return an era for a calendar that is outside the defined range for that calendar.

josevalim commented 6 years ago

Thanks, updated the era to be a non_neg_integer.

michalmuskala commented 6 years ago

It seems weird to me that week_in_year returns the year in the result tuple. It can't be different than the argument, right? Similar for week_in_month and month.

josevalim commented 6 years ago

@michalmuskala it actually can! It depends on the min_days value. For example, in some cases the min days is 4, which means the first week of the year starts whenever a week has a thursday in that year. So if January 1st is a friday, it is still the previous year's week date.

michalmuskala commented 6 years ago

Right. That makes sense. So I think we should change the names in the output to be something like week_based_year, etc to avoid confusion.

josevalim commented 6 years ago

@michalmuskala to me week_based_year means it is returning a year and here we are getting the year, week and day of the week. Where I find the week to be the most important. ICU uses week_of_year and week_of_month. Maybe go with ICU?

michalmuskala commented 6 years ago

Oh I meant just for the element of the result tuple. I'm fine with the function name.

josevalim commented 6 years ago

@michalmuskala oh I see, fixed!

Qqwy commented 6 years ago

I'm a bit late to the party, because I just returned from a two-week vacation in the middle of the beautiful nature of Norway.


This is a really good proposal! I believe all important discussion has already happened.

The only thing I'd like to point out, mostly in response to @lau 's comments, is that whereas we can base our implementation on e.g. the implementation of :calendar.iso_week_number, that we cannot directly use it, since most of the :calendar-functions do not work properly for BCE dates.

kipcole9 commented 6 years ago

Not forgotten but clearly a bit behind on extracting, refactoring and adding tests. Expected to be done by the end of this week.

kkalinin commented 6 years ago

Sorry to interrupt, but what do you think of RFC822 date-time format support? It is very useful on the web, and at the moment I have to implement it myself. Or should it be kind of a helper function in Phoenix?

fertapric commented 6 years ago

@kkalinin the timex package supports RFC822 and other formats :)

https://hexdocs.pm/timex/Timex.Format.DateTime.Formatters.Default.html#module-compound-directives

kkalinin commented 6 years ago

@fertapric yes, thanks! But this package is older than Elixir's native DateTime support, so I thought that there are plans to implement most common capabilities inside language's lib.

josevalim commented 6 years ago

@kkalinin we are integrating capabilities little by little but we have no plans to integrate parsing/formatting for now. Timex does want to extract parsing/formating to a separate lib though.

fertapric commented 6 years ago

As a side note, if Elixir starts adding parsing/formatting capabilities, I would start implementing first strftime and strptime :)

josevalim commented 6 years ago

I also would like to propose two amendments:

  1. Rename era to year_of_era, keeping the same return type

  2. Add day_of_era(year, month, day) with a return type similar to year_of_era, except the first element of the tuple is the day (instead of year)

josevalim commented 5 years ago

Ping @kipcole9!

josevalim commented 5 years ago

Just a heads up: I will go ahead and tackle one of those in the live stream and leave the remaining ones open for contribution. :)

kipcole9 commented 5 years ago

My very bad on this. I have committed this weekend to the submitting code for these additions.

On 19 Nov 2018, at 9:59 pm, José Valim notifications@github.com wrote:

Just a heads up: I will go ahead and tackle one of those in the live stream and leave the remaining ones open for contribution. :)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/elixir-lang/elixir/issues/8014#issuecomment-439884715, or mute the thread https://github.com/notifications/unsubscribe-auth/AAA-F6_rzyFVO1XIqDe5Gsm5fqrV0RpWks5uwqtLgaJpZM4VnMRg.

kipcole9 commented 5 years ago

I have added a PR for discussion on the first four callbacks and Calendar.ISO implementations:

The remaining two are nearly complete (I wasn't happy with my existing implementation). They are:

josevalim commented 5 years ago

I have sent a PR for week_of_year/3 here: #8460.

Note that I went with week_of_year for consistency with the other functions. I have not added support for min_days / first_day because for Calendar.ISO those are pre-determined and not a general function. A different calendar may use another week mechanism though.

For now, I am considering this issue as complete. We don't quite support week_in_month, but it can be implemented based on the current week_of_year/3 if someone desires it. Thanks everyone! :heart: