JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Parsing BCE date string #42790

Open pbellette opened 3 years ago

pbellette commented 3 years ago

Hi, I'm running into an issue trying to parse BCE date strings. From reading the docs, I believe BCE dates are supported, so I'm expecting the following to work, e.g. running

DateTime("-1-1-1T:10:00:00", DateFormat("y-m-dT:H:M:S"))

or

DateTime("-1-1-1T10:00:00", ISODateTimeFormat)

But both examples give an error (Julia 1.6.2). It's entirely possible that I just don't know what I'm doing here...

mgkuhn commented 3 years ago

Some background on the ISO 8601 standard that defines this notation:

The original ISO 8601:1988 standard was not intended to be used for years outside the range 0000 to 9999.

The ISO 8601:2004 revision added an “expanded representation” for years outside that range:

4.1.2.4 Expanded representations

If, by agreement, expanded representations are used, the formats shall be as specified below. The interchange parties shall agree the additional number of digits in the time element year. In the examples below it has been agreed to expand the time element year with two digits.

a) A specific day
Basic format: ±YYYYYMMDD Example: +0019850412
Extended format: ±YYYYY-MM-DD Example: +001985-04-12
[...]

(In the standard, the first Y is underlined; the underline means zero or more repetitions of that digit.)

Note that without “agreement”, the “basic format” (without hyphens) becomes more ambiguous, e.g. does +12121212 mean December of the year 121212 or just the year 12121212?

The standard also says

This International Standard allows the identification of calendar years by their year number for years both before and after the introduction of the Gregorian calendar. For the determination of calendar years, the year number and the calendar day within the calendar year only the rules mentioned above are used. For the purposes of this International Standard the calendar based on these rules is referred to as the Gregorian calendar. The use of this calendar for dates preceding the introduction of the Gregorian calendar (also called the proleptic Gregorian calendar) should only be by agreement of the partners in information interchange. The introduction of the Gregorian calendar included the cancellation of the accumulated inaccuracies of the Julian calendar. However, no dates shall be inserted or deleted when determining dates in the proleptic Gregorian calendar.

NOTE In the proleptic Gregorian calendar, the calendar year [0000] is a leap year.

So generally, ISO 8601 wasn't intended to be used with “BCE” years (where the year 1 BCE comes directly before the year 1 CE and there is no year zero). Instead, it was intended to be used with the “proleptic Gregorian calendar” and the integer year numbering commonly used by astronomers, i.e. the year before 1 is called 0 (= 1 BCE) and the year before that is −1 (= 2 BCE).
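To make the off-by-one between the two numbering schemes concrete, here is a minimal sketch. The helpers `bce_to_astronomical` and `era_label` are hypothetical names made up for illustration and are not part of the Dates stdlib; only `Dates.isleapyear` is an existing function.

```julia
using Dates

# Hypothetical helper (not in Dates): convert "n BCE" to the
# astronomical year number used by ISO 8601.
# 1 BCE -> 0, 2 BCE -> -1, 3 BCE -> -2, ...
bce_to_astronomical(n::Integer) = 1 - n

# Hypothetical helper (not in Dates): label an astronomical year
# with its era. Years >= 1 are CE; year 0 and below are BCE.
era_label(y::Integer) = y >= 1 ? "$y CE" : "$(1 - y) BCE"

era_label(1)     # "1 CE"
era_label(0)     # "1 BCE"
era_label(-1)    # "2 BCE"

# Consistent with the NOTE in the standard: year 0000 is a leap year.
Dates.isleapyear(0)   # true
```

So a date string such as -0001-01-01, read as ISO 8601, refers to 2 BCE rather than 1 BCE.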

In the latest ISO 8601-1:2019 revision of the standard, the rules appear to be essentially the same.

mgkuhn commented 3 years ago

Looking at https://github.com/JuliaLang/julia/blob/8aea375d79daf7270fa668f67fbcb368aff1e9fa/stdlib/Dates/src/parse.jl#L199 I don't see any existing support in the ISODateTimeFormat parser for signed years, i.e. dy can't be negative at the moment.

That method could apparently be extended in a backward-compatible way to allow a preceding + or -, since both currently cause errors and since it only supports the extended format (with hyphens).

But I'm not sure that method is even used: according to the stack trace, tryparsenext_internal appears to have been called instead.
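Until the stdlib parser accepts a sign, the idea of allowing a preceding + or - can be sketched in user code. This is only an illustration under stated assumptions: `parse_signed_iso` is a hypothetical helper, not how parse.jl is or would be implemented, and it is deliberately lenient about zero padding so that the string from the original report is accepted. It handles only the extended format (with hyphens) and whole seconds.

```julia
using Dates

# Hypothetical helper (not part of Dates): parse an extended-format
# date-time with an optional sign on the year, then build the DateTime
# from integer fields, which already supports negative years.
function parse_signed_iso(str::AbstractString)
    re = r"^([+-]?)(\d+)-(\d{1,2})-(\d{1,2})T(\d{1,2}):(\d{2}):(\d{2})$"
    m = match(re, str)
    m === nothing && throw(ArgumentError("cannot parse \"$str\" as a signed date-time"))
    sgn = m.captures[1] == "-" ? -1 : 1
    # Remaining captures are year, month, day, hour, minute, second.
    y, mo, d, H, M, S = map(c -> parse(Int, c), m.captures[2:7])
    return DateTime(sgn * y, mo, d, H, M, S)
end

parse_signed_iso("-0001-01-01T10:00:00") == DateTime(-1, 1, 1, 10, 0, 0)   # true
parse_signed_iso("-1-1-1T10:00:00") == DateTime(-1, 1, 1, 10, 0, 0)        # true
```

Range checking of month, day and time fields is left to the DateTime constructor, which throws on out-of-range values.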

mgkuhn commented 3 years ago

Support for negative years in ISO 8601 notation currently appears to be limited to constructing them, e.g. via DateTime(-1,1,1,10,0,0); parsing them from a string is not supported.
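A quick way to check this asymmetry on a given Julia version is sketched below. It uses `tryparse`, which returns `nothing` instead of throwing when the string cannot be parsed; whether the parse succeeds depends on the version, so no particular result is assumed here.

```julia
using Dates

# Constructing from integer fields works with a negative year.
dt = DateTime(-1, 1, 1, 10, 0, 0)
year(dt)   # -1, i.e. 2 BCE in the proleptic Gregorian calendar

# Parsing the signed notation: `nothing` means the parser rejected it.
parsed = tryparse(DateTime, "-0001-01-01T10:00:00", ISODateTimeFormat)
if parsed === nothing
    println("signed year was not parsed on this Julia version")
else
    println("parsed: ", parsed)
end
```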