Open dlurton opened 6 years ago
The table below (generated by this code) shows all the dates on which a descrepency between the epoch milliseconds returned by software.amazon.ion.Timestamp.getMillis()
and java.time.Interval.toEpochMilli()
for the same dates changes and the amount of the discrepency in days.
1582-10-04T00:00:00.000Z 10.0
1500-02-28T00:00:00.000Z 9.0
1400-02-28T00:00:00.000Z 8.0
1300-02-28T00:00:00.000Z 7.0
1100-02-27T00:00:00.000Z 6.0
1000-02-27T00:00:00.000Z 5.0
0900-02-27T00:00:00.000Z 4.0
0700-02-28T00:00:00.000Z 3.0
0600-02-28T00:00:00.000Z 2.0
0500-02-28T00:00:00.000Z 1.0
0300-02-27T00:00:00.000Z 0.0
0200-02-27T00:00:00.000Z -1.0
0100-02-27T00:00:00.000Z -2.0
For example, all dates between 1582-10-04T
and 1500-03-01T
, Timestamp.getMillis()
and Interval.toEpochMilli()
return values that are 10 days apart. For all dates between
1500-02-28T
and 1400-03-01T
the return values are 9 days apart, etc. (Both methods return the same values for all dates prior to 1582-10-04T
).
Timestamp
relies on java.util.Date
which relies in turn on GregorianCalendar which is (from the javadoc) "is a hybrid calendar that supports both the Julian and Gregorian calendar systems with the support of a single discontinuity, which corresponds by default to the Gregorian date when the Gregorian calendar was instituted (October 15, 1582 in some countries, later in others)." Also: "Historically, in those countries which adopted the Gregorian calendar first, October 4, 1582 (Julian) was thus followed by October 15, 1582 (Gregorian). This calendar models this correctly. Before the Gregorian cutover, GregorianCalendar implements the Julian calendar. The only difference between the Gregorian and the Julian calendar is the leap year rule. The Julian calendar specifies leap years every four years, whereas the Gregorian calendar omits century years which are not divisible by 400."
That java.time.Instant
does not appear to account for Julian leap years and this is the source of the discrepencies. The first discrepency appearing on 1582-10-04T
is due to the transition from the Julian calendar to the Gregorian calendar. The subsequent discrepencies occur because of the differences in leap years between the Julian and Gregorian calendars.
This caused me to examine the source code of Timestamp
and it would appear to me that we do not account for a Julian calendar at all and seem to assume that all dates are Gregorian. This has some implications which I will describe in my next comment.
Interesting reading: https://en.wikipedia.org/wiki/Gregorian_calendar
Given the following code:
private double MILLIS_PER_DAY = 86400000.0;
@Test
public void leapYearTest() {
System.out.println(Timestamp.valueOf("0200-02-28T00:00:00.000Z").getMillis() / MILLIS_PER_DAY);
System.out.println(Timestamp.valueOf("0200-03-01T00:00:00.000Z").getMillis() / MILLIS_PER_DAY);
System.out.println(Timestamp.valueOf("0200-02-29T00:00:00.000Z").getMillis() / MILLIS_PER_DAY);
}
The output is:
-646422.0
-646420.0
java.lang.IllegalArgumentException: Day 29 for year 200 and month 2 must be between 1 and 28 inclusive
Note that there are two days of difference between 0200-02-28T
and 0200-03-01T
, yet the value for0200-02-29T
causes an exception. This is because in the Julian calendar, there really are two days of difference (0200-02-29T
is completely valid), but if we extended the Gregorian calendar back to the year 200, it would be leap year.
The epoch milliseconds value of Timestamp
for the 10 "missing" days that marked adoption of the Gregorian calendar starting 1582-10-05T
is repeated for the following 10 days. In other words, 1582-10-05T
has the same epoch seconds as 1582-10-15T
.
The table below was generated with this code. The 2nd column is the epoch-days of of the timestamp in the 1st column, the 4th column is the epoch-days of the timestamp in the 3rd column.
1582-10-05T00:00:00.000Z -141427.0 1582-10-15T00:00:00.000Z -141427.0
1582-10-06T00:00:00.000Z -141426.0 1582-10-16T00:00:00.000Z -141426.0
1582-10-07T00:00:00.000Z -141425.0 1582-10-17T00:00:00.000Z -141425.0
1582-10-08T00:00:00.000Z -141424.0 1582-10-18T00:00:00.000Z -141424.0
1582-10-09T00:00:00.000Z -141423.0 1582-10-19T00:00:00.000Z -141423.0
1582-10-10T00:00:00.000Z -141422.0 1582-10-20T00:00:00.000Z -141422.0
1582-10-11T00:00:00.000Z -141421.0 1582-10-21T00:00:00.000Z -141421.0
1582-10-12T00:00:00.000Z -141420.0 1582-10-22T00:00:00.000Z -141420.0
1582-10-13T00:00:00.000Z -141419.0 1582-10-23T00:00:00.000Z -141419.0
1582-10-14T00:00:00.000Z -141418.0 1582-10-24T00:00:00.000Z -141418.0
This behavior is inherited from java.util.Date
.
Before attempting any fix we need to decide if respecting Julian leap years prior to 1582 like java.util.Date
does is the right thing for ion-java’s Timestamp
or if it should take java.time.Instant
’s approach and extend the Gregorian calendar backward. The latter seems to have been the original intention of Timestamp
, but correcting these discrepancies would change the value of Timestamp.getMillis()
for most dates prior to 1582-10-05T
and I’m concerned that this might break existing customer code that may have grown to be dependent on this behavior.
The main TIL here seems to be that the JRE implements multiple mappings between epoch-millis and calendar dates, and that APIs like Timestamp.getMillis()
are therefore not as well-defined as we thought.
Therefore it seems like we should deprecate the millis APIs, and provide replacements in terms of Date
, Calendar
, and/or Instant
. That lets us offload the specifics of the millis mappings to components that are better situated to support the alternatives.
Somewhat independently, the Ion spec is mum on the specifics of the calendar system, and perhaps that needs another look (and discussion elsewhere).
Resolving, as #195 provides the ability for the user to control which calendar system is used. We can continue to discuss elsewhere whether an amendment to the spec needs to be made to talk about calendar system specifics.
Reopening, as we need to revert the change to back Timestamp with Calendar due to increased heap usage caused by the change (Calendars are big).
We'll need to come up with a different strategy for dealing with the calendar system transition and its effect on leap years within Timestamp.
Originally posted by @raganhan in https://github.com/amzn/ion-java/issues/160#issuecomment-427997832