elixir-lang / elixir

DateTime with year 10000+ and Inspect #13712

Closed: wojtekmach closed this 2 months ago

wojtekmach commented 3 months ago

Elixir and Erlang/OTP versions

Erlang/OTP 27 [erts-15.0] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Elixir 1.18.0-dev (d9cf285) (compiled with Erlang/OTP 27)

Operating system

macOS

Current behavior

Currently, the DateTime Inspect implementation returns a sigil that is invalid:

iex(1)> %{DateTime.utc_now() | year: 10000}
~U[10000-07-08 18:35:22.193042Z]
iex(2)> ~U[10000-07-08 18:35:22.193042Z]
** (ArgumentError) cannot parse "10000-07-08 18:35:22.193042Z" as UTC DateTime for Calendar.ISO, reason: :invalid_format
    (elixir 1.18.0-dev) lib/kernel.ex:6689: Kernel.maybe_raise!/4
    (elixir 1.18.0-dev) lib/kernel.ex:6668: Kernel.parse_with_calendar!/3
    (elixir 1.18.0-dev) expanding macro: Kernel.sigil_U/2
    iex:2: (file)
iex(2)> %{DateTime.utc_now() | year: -10000}
~U[-10000-07-08 18:35:29.800518Z]
iex(3)> ~U[-10000-07-08 18:35:29.800518Z]
** (ArgumentError) cannot parse "-10000-07-08 18:35:29.800518Z" as UTC DateTime for Calendar.ISO, reason: :invalid_format
    (elixir 1.18.0-dev) lib/kernel.ex:6689: Kernel.maybe_raise!/4
    (elixir 1.18.0-dev) lib/kernel.ex:6668: Kernel.parse_with_calendar!/3
    (elixir 1.18.0-dev) expanding macro: Kernel.sigil_U/2
    iex:3: (file)

Same goes for NaiveDateTime.

Expected behavior

First, Date solves this well:

iex> %{Date.utc_today() | year: 9999}
~D[9999-07-08]
iex> %{Date.utc_today() | year: 10000}
Date.new!(10000, 7, 8)

so I think we want something akin to that. Perhaps something along the lines of:

iex> %{DateTime.utc_now() | year: 10000}
DateTime.new!(Date.new!(10000, 7, 8), ~T[18:40:56.667522])
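The proposal above can be sketched as a small rendering function. This is only an illustration: `InspectSketch` and `render/1` are hypothetical names, not part of Elixir, and the sketch handles UTC datetimes only (anything else raises a `FunctionClauseError`).

```elixir
defmodule InspectSketch do
  # Hypothetical helper, not part of Elixir: renders a UTC DateTime as text
  # that is meant to evaluate back to the same value.

  # Years in -9999..9999 are accepted by the ~U sigil, so keep the sigil form.
  def render(%DateTime{year: year, time_zone: "Etc/UTC"} = datetime)
      when year in -9999..9999 do
    "~U[" <> DateTime.to_string(datetime) <> "]"
  end

  # Outside that range, fall back to a constructor call, as Date already does.
  def render(%DateTime{time_zone: "Etc/UTC"} = datetime) do
    %DateTime{year: year, month: month, day: day} = datetime
    time = datetime |> DateTime.to_time() |> inspect()
    "DateTime.new!(Date.new!(#{year}, #{month}, #{day}), #{time})"
  end
end
```

Evaluating the fallback string only round-trips on Elixir versions where `Date.new!/3` accepts years beyond 9999.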

Note, with non-UTC datetimes we have #DateTime<>:

iex> DateTime.utc_now() |> DateTime.shift_zone!("Europe/Warsaw", Tzdata.TimeZoneDatabase)
#DateTime<2024-07-08 20:44:01.884899+02:00 CEST Europe/Warsaw>

and so maybe we do that too here:

iex> %{DateTime.utc_now() | year: 10000}
#DateTime<10000-07-08 18:40:56.667522Z>

josevalim commented 3 months ago

ISO 8601 has extensions for years with more than four digits, and we should probably support them now that we have lifted the year 9999 restriction. We can probably support them as a slow path. PRs welcome.

voughtdq commented 2 months ago

The standard mandates that a year∈(-∞,-9999)∪(9999,+∞) begin with + or -. For formatting, we can add the +, but should a representation like ~U[10000-07-08 18:35:22.193042Z], without the +, be accepted and parsed, or should it be considered invalid? I'm erring on the side of leniency right now.

Another part of the standard says that interchanging parties need to decide on how many digits for years∈(-∞,-9999)∪(9999,+∞). The draft implementation I have can potentially parse forever, since it just takes characters until it reaches a separator. What kind of limit would be reasonable if there should be any?
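One way to keep such a parser from scanning indefinitely is to cap the number of year digits. The sketch below is not the draft implementation mentioned above; `ExpandedYearSketch`, `parse_year/1`, and the cap of 6 digits are all assumptions chosen for illustration. It handles the extended format only, where a "-" separator follows the year, and it takes the lenient reading in which the sign is optional.

```elixir
defmodule ExpandedYearSketch do
  # Hypothetical cap: the standard leaves the digit count to the
  # interchanging parties, so 6 is an arbitrary choice.
  @max_digits 6

  # Returns {:ok, year, rest} where rest starts at the "-" separator, or :error.
  # The sign is optional here (the lenient reading).
  def parse_year(<<?+, rest::binary>>), do: digits(rest, 1, 0, 0)
  def parse_year(<<?-, rest::binary>>), do: digits(rest, -1, 0, 0)
  def parse_year(string) when is_binary(string), do: digits(string, 1, 0, 0)

  # Consume one ASCII digit while still under the cap.
  defp digits(<<d, rest::binary>>, sign, acc, count)
       when d in ?0..?9 and count < @max_digits do
    digits(rest, sign, acc * 10 + (d - ?0), count + 1)
  end

  # A further digit beyond the cap: reject instead of scanning on.
  defp digits(<<d, _::binary>>, _sign, _acc, _count) when d in ?0..?9, do: :error

  # Stop at the separator once at least four digits have been read.
  defp digits(<<?-, _::binary>> = rest, sign, acc, count) when count >= 4 do
    {:ok, sign * acc, rest}
  end

  defp digits(_other, _sign, _acc, _count), do: :error
end
```

With this cap, `ExpandedYearSketch.parse_year("+10000-07-08")` returns `{:ok, 10000, "-07-08"}`, while a seven-digit year returns `:error`.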

I also noticed that the recent changes accept years∈(-∞,-9999)∪(9999,+∞) for the basic format. This is fine when converting to a string, but it fails to parse since we're expecting only the basic format of yyyymmdd. Should basic parsing be updated to accept years∈(-∞,-9999)∪(9999,+∞)?
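To make the basic-format question concrete, here is a minimal sketch of one possible rule, assuming the sign is mandatory for expanded years so that the input stays unambiguous without separators. `BasicDateSketch` and `parse/1` are hypothetical names, and the sketch does not validate month or day ranges.

```elixir
defmodule BasicDateSketch do
  # Hypothetical helper, not an Elixir API.
  # Accepts plain "yyyymmdd", or a signed year of five or more digits
  # followed by "mmdd" (for example "+100000708").
  def parse(<<sign, rest::binary>>) when sign in [?+, ?-] and byte_size(rest) >= 9 do
    with {:ok, {year, month, day}} <- split(rest) do
      {:ok, {if(sign == ?-, do: -year, else: year), month, day}}
    end
  end

  def parse(string) when is_binary(string) and byte_size(string) == 8, do: split(string)
  def parse(_other), do: :error

  # The last four bytes are month and day; everything before them is the year.
  defp split(digits) do
    year_size = byte_size(digits) - 4
    <<year::binary-size(year_size), month::binary-size(2), day::binary-size(2)>> = digits

    with {:ok, y} <- to_int(year), {:ok, m} <- to_int(month), {:ok, d} <- to_int(day) do
      {:ok, {y, m, d}}
    end
  end

  # Only ASCII digits are accepted, so stray signs or separators are rejected.
  defp to_int(binary) do
    if binary =~ ~r/\A[0-9]+\z/, do: {:ok, String.to_integer(binary)}, else: :error
  end
end
```

Requiring the sign is a design choice of this sketch: an unsigned nine-digit string such as "100000708" is rejected rather than guessed at.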

josevalim commented 2 months ago

Another part of the standard says that interchanging parties need to decide on how many digits for years∈(-∞,-9999)∪(9999,+∞). The draft implementation I have can potentially parse forever, since it just takes characters until it reaches a separator. What kind of limit would be reasonable if there should be any?

You are right. ISO 8601-2, section 4.7.2, mentions it is possible to extend this by prefixing Y, but the wording is ambiguous:

It should be used only for dates that include the calendar year only, and only for years later than 9999 or earlier than -9999 (otherwise the representation provided in ISO 8601-1 should be used).

It is unclear whether "calendar year only" means that the date has nothing but the calendar year, or that it has only the calendar year part (and not a decade or century one). In any case, I think the simplest option for now is not to use the ISO representation for dates outside of -9999..9999. I will push a fix.