amazon-ion / ion-java

Java streaming parser/serializer for Ion.
https://amazon-ion.github.io/ion-docs/
Apache License 2.0
864 stars 110 forks source link

Different milliseconds value from Ion Timestamp and java.time.Interval for 0001T #165

Open dlurton opened 5 years ago

dlurton commented 5 years ago
String tsText = "0001-01-01T00:00:00Z";

long instantMillis = Instant.parse(tsText).toEpochMilli(); // -62135596800000
long tsMillis = Timestamp.valueOf(tsText).getMillis();     // -62135769600000
Timestamp.forMillis(instantMillis, 0);                     // 0001-01-03T00:00:00.000Z

Originally posted by @raganhan in https://github.com/amzn/ion-java/issues/160#issuecomment-427997832

dlurton commented 5 years ago

The table below (generated by this code) shows all the dates on which a descrepency between the epoch milliseconds returned by software.amazon.ion.Timestamp.getMillis() and java.time.Interval.toEpochMilli() for the same dates changes and the amount of the discrepency in days.

1582-10-04T00:00:00.000Z 10.0
1500-02-28T00:00:00.000Z 9.0
1400-02-28T00:00:00.000Z 8.0
1300-02-28T00:00:00.000Z 7.0
1100-02-27T00:00:00.000Z 6.0
1000-02-27T00:00:00.000Z 5.0
0900-02-27T00:00:00.000Z 4.0
0700-02-28T00:00:00.000Z 3.0
0600-02-28T00:00:00.000Z 2.0
0500-02-28T00:00:00.000Z 1.0
0300-02-27T00:00:00.000Z 0.0
0200-02-27T00:00:00.000Z -1.0
0100-02-27T00:00:00.000Z -2.0

For example, all dates between 1582-10-04T and 1500-03-01T, Timestamp.getMillis() and Interval.toEpochMilli() return values that are 10 days apart. For all dates between 1500-02-28T and 1400-03-01T the return values are 9 days apart, etc. (Both methods return the same values for all dates prior to 1582-10-04T).

Timestamp relies on java.util.Date which relies in turn on GregorianCalendar which is (from the javadoc) "is a hybrid calendar that supports both the Julian and Gregorian calendar systems with the support of a single discontinuity, which corresponds by default to the Gregorian date when the Gregorian calendar was instituted (October 15, 1582 in some countries, later in others)." Also: "Historically, in those countries which adopted the Gregorian calendar first, October 4, 1582 (Julian) was thus followed by October 15, 1582 (Gregorian). This calendar models this correctly. Before the Gregorian cutover, GregorianCalendar implements the Julian calendar. The only difference between the Gregorian and the Julian calendar is the leap year rule. The Julian calendar specifies leap years every four years, whereas the Gregorian calendar omits century years which are not divisible by 400."

That java.time.Instant does not appear to account for Julian leap years and this is the source of the discrepencies. The first discrepency appearing on 1582-10-04T is due to the transition from the Julian calendar to the Gregorian calendar. The subsequent discrepencies occur because of the differences in leap years between the Julian and Gregorian calendars.

This caused me to examine the source code of Timestamp and it would appear to me that we do not account for a Julian calendar at all and seem to assume that all dates are Gregorian. This has some implications which I will describe in my next comment.

Interesting reading: https://en.wikipedia.org/wiki/Gregorian_calendar

dlurton commented 5 years ago

Given the following code:

private double MILLIS_PER_DAY = 86400000.0;
@Test
public void leapYearTest() {
    System.out.println(Timestamp.valueOf("0200-02-28T00:00:00.000Z").getMillis() / MILLIS_PER_DAY);
    System.out.println(Timestamp.valueOf("0200-03-01T00:00:00.000Z").getMillis() / MILLIS_PER_DAY);
    System.out.println(Timestamp.valueOf("0200-02-29T00:00:00.000Z").getMillis() / MILLIS_PER_DAY);
}

The output is:

-646422.0
-646420.0

java.lang.IllegalArgumentException: Day 29 for year 200 and month 2 must be between 1 and 28 inclusive

Note that there are two days of difference between 0200-02-28T and 0200-03-01T, yet the value for0200-02-29T causes an exception. This is because in the Julian calendar, there really are two days of difference (0200-02-29T is completely valid), but if we extended the Gregorian calendar back to the year 200, it would be leap year.

dlurton commented 5 years ago

The epoch milliseconds value of Timestamp for the 10 "missing" days that marked adoption of the Gregorian calendar starting 1582-10-05T is repeated for the following 10 days. In other words, 1582-10-05Thas the same epoch seconds as 1582-10-15T.

The table below was generated with this code. The 2nd column is the epoch-days of of the timestamp in the 1st column, the 4th column is the epoch-days of the timestamp in the 3rd column.

1582-10-05T00:00:00.000Z -141427.0 1582-10-15T00:00:00.000Z -141427.0
1582-10-06T00:00:00.000Z -141426.0 1582-10-16T00:00:00.000Z -141426.0
1582-10-07T00:00:00.000Z -141425.0 1582-10-17T00:00:00.000Z -141425.0
1582-10-08T00:00:00.000Z -141424.0 1582-10-18T00:00:00.000Z -141424.0
1582-10-09T00:00:00.000Z -141423.0 1582-10-19T00:00:00.000Z -141423.0
1582-10-10T00:00:00.000Z -141422.0 1582-10-20T00:00:00.000Z -141422.0
1582-10-11T00:00:00.000Z -141421.0 1582-10-21T00:00:00.000Z -141421.0
1582-10-12T00:00:00.000Z -141420.0 1582-10-22T00:00:00.000Z -141420.0
1582-10-13T00:00:00.000Z -141419.0 1582-10-23T00:00:00.000Z -141419.0
1582-10-14T00:00:00.000Z -141418.0 1582-10-24T00:00:00.000Z -141418.0

This behavior is inherited from java.util.Date.

dlurton commented 5 years ago

Before attempting any fix we need to decide if respecting Julian leap years prior to 1582 like java.util.Date does is the right thing for ion-java’s Timestamp or if it should take java.time.Instant’s approach and extend the Gregorian calendar backward. The latter seems to have been the original intention of Timestamp, but correcting these discrepancies would change the value of Timestamp.getMillis() for most dates prior to 1582-10-05T and I’m concerned that this might break existing customer code that may have grown to be dependent on this behavior.

toddjonker commented 5 years ago

The main TIL here seems to be that the JRE implements multiple mappings between epoch-millis and calendar dates, and that APIs like Timestamp.getMillis() are therefore not as well-defined as we thought.

Therefore it seems like we should deprecate the millis APIs, and provide replacements in terms of Date, Calendar, and/or Instant. That lets us offload the specifics of the millis mappings to components that are better situated to support the alternatives.

Somewhat independently, the Ion spec is mum on the specifics of the calendar system, and perhaps that needs another look (and discussion elsewhere).

tgregg commented 5 years ago

Resolving, as #195 provides the ability for the user to control which calendar system is used. We can continue to discuss elsewhere whether an amendment to the spec needs to be made to talk about calendar system specifics.

tgregg commented 5 years ago

Reopening, as we need to revert the change to back Timestamp with Calendar due to increased heap usage caused by the change (Calendars are big).

We'll need to come up with a different strategy for dealing with the calendar system transition and its effect on leap years within Timestamp.