Normalize components in DateTimePeriod/DatePeriod

ilya-g commented 3 years ago

Currently we provide DateTimePeriod type as a bag of several independent components: years, months, days, hours, minutes, seconds, and nanosecodns. This approach leads to several problems:

when nanoseconds value exceeds the range [0..10^9), the ISO representation is rendered incorrectly (see #79)
there can be several non-equal values of DateTimePeriod that produce the same ISO string representation. Since we intend to use that string representation as the default serialized form for textual formats, it could lead to a situation when deserialized value would be not equal to what was serialized before.

To avoid these problems we propose to do DateTimePeriod component normalization at construction time in the following way:

all time components, e.g. hours..nanoseconds are normalized to the single long total value of nanoseconds;
days component is stored as is;
years and months components are stored together as the total number of months.

Note that this normalization only affects DateTimePeriod storage and internal representation. There shall still remain component properties that will derive values from the normalized internal state. For example, hours property will return the normalized number of nanoseconds divided by 3600 * 10^9, and months will return totalMonths % 12.

This change will have the following consequences:

seconds and nanoseconds properties change their type from Long to Int. We may leave a secondary constructor with the long nanoseconds parameter for convenience, though.
it may be impossible to fit the difference between extreme dates, such as platform specific min and max dates, as an Int total number of months.
the time part of DateTimePeriod will be limited to the ±292 year range. This usually isn't a practical concern because the time part doesn't exceed 24H in most cases.

ilya-g commented 3 years ago

The implementation goes in the PR #80

dkhalanskyjb commented 3 years ago

Good summary. Some minor clarifications:

there can be several non-equal values of DateTimePeriod that produce the same ISO string representation

This concerns not the current implementation of toString (which behaves incorrectly) but the possible alternative implementations, for example, one which adds the whole number of seconds in the nanoseconds field to the number displayed as the number of seconds: this way, DateTimePeriod(seconds = 1, nanoseconds = 1) and DateTimePeriod(seconds = 0, nanoseconds = 1_000_000_001) would have the same string representation.

it may be impossible to fit the difference between extreme dates, such as platform specific min and max dates, as an Int total number of months.

While this may seem like an obscure thing, in practice this means that on JVM there is a new failure mode for Instant.periodUntil and LocalDate.periodUntil if there are more than Int.MAX_VALUE months between the two arguments. Before, having an additional years: Int field in DateTimePeriod stored the excess months.

straight-shoota commented 3 years ago

I'm designing a Period type for Crystal's time API and I've looked at kotlinx-datetime for inspiration because it looks quite similar in goals and implementation. I think you're doing a great job reducing a huge and hard-to-comprehend API like Java's to the essentials for everyday users :+1:

My concept for a Period type has evolved towards storing data in normalized form only. So this propsal here is a really surprising coincidence, as it describes almost the same approach. Frankly, I haven't seen any other date/time API do this, most implementations just have a collection of individual fields, with maybe having an option to normalize the value. But it seems so much unnecessary and overly complicating things. Still, I was wondering whether my idea was somehow flawed if no-one else thought of it before. So it is really assuring to see you're heading the same direction.

I'm wondering however, why to keep days a separate component. Since we're talking about conceptual periods, 1 day is unambiguously equivalent to 24 hours. So there shouldn't be any reason to have it as a separate property, it could just be included as a multiple of the (nano)second based component.

dkhalanskyjb commented 3 years ago

Hi @straight-shoota!

We're keeping days a separate component because we don't use the definition of one day as being equal to 24 hours. Instead, the same way as Java does in this case, we follow in spirit the following definition from ISO-8601:

The term “day” applies also to the duration of any time interval which starts at a certain time of day on a certain calendar day and ends at the same time of day on the next calendar day. (3.1.2.10)

In some time zones, the duration between 14:27 of one day and 14:27 of the next may not be 24 hours due to, for example, DST transitions, when it may turn out to be 23 or 25 hours (or even something else).

straight-shoota commented 3 years ago

Yes sure, a conceptual day does not necessarily last 24 hours of elapsed time. But changes in time zone offsets only matter when measurement of elapsed time is combined with calendrical representation.

This doesn't really work out in a single type, especially when it uses normalized values because the same effect applies for smaller units as well. The duration of 1 conceptual minute could for example represent 61 minutes of elapsed time when it goes over a time zone offset shift by -1 hour.

I understood DateTimePeriod as purely conceptual, based on calendrical units and elapsed time being represented as Duration. Maybe that was a false impression, but I would really recommend a strict separation.

dkhalanskyjb commented 3 years ago

ISO-8601 defines one minute to be 60 seconds except in cases where leap seconds come into play (3.1.2.3), and our implementation behaves the same way: adding one minute to an Instant always results in an Instant that is 60 seconds later. This is not dependent on a time zone.

However, in order to add a day in the ISO-8601 sense to an Instant, we do need to know the time zone in which it happens, as otherwise, we wouldn't know how many hours a particular day consists of.

Now, we could implement something other than what ISO-8601 prescribes, but we chose not to because we thought that what it proposes would make good sense intuitively from the end-user perspective. When it's 14:57 on some Monday and a website says that a sale ends two days and three minutes later, one would usually think "Oh, so the sale will end at 15:00 on Wednesday" instead of checking to see if there are some nefarious time shifts happening during the two days, so it makes sense to consider time zones when advancing days. However, it would be silly to bake a cake for an hour longer than the recipe states just because there was a time shift in the meantime.

And even if we did decide to implement a day as 24 hours, it would still be completely impossible to add a month to an Instant without knowing the time zone, as we wouldn't even know what month it is in the initial instant in the implied time zone. So, we would need to know the time zone either way or limit ourselves to time-based components only–but we already have Duration to represent that.

straight-shoota commented 3 years ago

ISO 8601 focuses on information interchange and really doesn't specify the interpration of durations, it just covers the representation format. Meaning is specific to the application context. And you can use ISO 8601 representation for both, elapsed and conceptual amounts of time.

Adding an amount of time to an instant can always be ambiguous: It can mean to proceed to the instant after exactly that amount of time has elapsed (that's the oven timer) or it can mean to turn the wall clock to a reading that equals to the conceptual distance represented by the amount (that's the timer til sale).

However, it would be silly to bake a cake for an hour longer than the recipe states just because there was a time shift in the meantime.

That's exactly my point. How do you differntiate them? I think this can only be done with separate data types, i.e. Duration for elapsed time and DateTimePeriod for conceptual time.

Interpreting your comments I understand that your reasoning is to interpret years, months, days of DateTimePeriod as conceptual and the smaller units as elased time. So a representation as DateTimePeriod would be equivalent to DatePeriod plus Duration. Is that correct?

val berlin = TimeZone.of("Europe/Berlin")
val instant = LocalDateTime(2020, 10, 25, 1, 0, 0, 0).toInstant(berlin)
instant.plus(DateTimePeriod(days = 1)) // this instant is 25 hours elapsed time later
instant.plus(DateTimePeriod(hours = 1)) // this instant is 1 hour elapsed time later, why not 2 hours elapsed?

This just seems weird to me. Why does adding days adjust for DST change and adding hours does not? When adding a day is interpreted as conceptual, I would expect other units to be interpreted as conceptual, too.

I see no reason for DateTimePeriod to treat time units as elapsed time because that's already covered by Duration. But there's no other way to represent conceptual units smaller than days.

it would still be completely impossible to add a month to an Instant without knowing the time zone, as we wouldn't even know what month it is in the initial instant in the implied time zone.

You can't even add a second of conceptual time to an instant without awareness of the time zone rules because right in that second might be a change of time zone offset which influences the result.

dkhalanskyjb commented 3 years ago

Hmm, I see your point and generally agree with it. Notably, you aren't the first to approach us with this concern: https://github.com/Kotlin/kotlinx-datetime/issues/89 The angle there is different, but the root cause is the same. The way you put this did make me rethink that issue as well and better understand the concern. Still, it's not clear what would be the right solution for this.

Indeed, complex, ad-hoc rules governing the data types are usually a sign of a design error. However, it is somewhat deliberate in this case.

or it can mean to turn the wall clock to a reading that equals to the conceptual distance represented by the amount (that's the timer til sale).

My point was, the timer is already this mix of conceptual/elapsed. Let's say take a timer that always operates with the wall clock time, which means it has simpler semantics. Then the following is possible: if it is 02 hours, 00 minutes, and 30 seconds until something happens, the timer shows as much, which I observe, then, after one minute passes, the timer would claim that 02 hours, 59 minutes, and 30 seconds are left, utterly confusing me, as, being an end-user in this case, I would not expect that at all. Would you not think it was a bug?

In general, from my perspective, it would be very counterintuitive if, for some moments A, A', and B, such that A < A' < B, it happened that A.periodUntil(B, tz) < A'.periodUntil(B, tz) (in terms of the componentwise partial order). This unfortunate disconnect between how countdowns of dates and times are handled does seem to me like a part of the subject area and a big part of why DST transitions are confusing. The asymmetry stems from the fact that DST transitions never retract or advance whole days, only hours and smaller units. Hence, the current implementation preserves this weaker form of monotonicity of periodUntil.

ISO 8601 focuses on information interchange and really doesn't specify the interpration of durations, it just covers the representation format. Meaning is specific to the application context.

The standard does build up from the definition of seconds by the International System of Units, so I don't see the room for ambiguity. One second period is defined always to represent one SI second and never, for example, 3601 SI seconds.

But there's no other way to represent conceptual units smaller than days.

You are right, there is none, and yes, this is unpleasantly asymmetric. Yet, from the end-user perspective, a conceptual minute would be defined like this: "The amount of time it would take for the wall clock to advance by exactly one minute, except if the result would be in a time shift gap, in which case it is the amount of time it would take for the wall clock to advance by the number of minutes in the gap + 1". Is this a useful concept at all? "Same time, next day" is arguably clearer in meaning and more applicable than "same sub-minute part, next minute." Do you have any particular use case in mind for this? If so, it would greatly strengthen the argument that we may need to implement such a concept.

To summarize, the solution you're suggesting–making DateTimePeriod operate on conceptual time units–would defy at least some expectations and become slightly incompatible with ISO-8601. Another solution would be to do this the way Java does it and separate what we call DateTimePeriod into what they call Duration and Period. We don't want to go down that path. While also conceptually more orthogonal, it is also confusing to the programmers, judging by the number of questions these data types attracted.

Thank you for raising this! Your framing is interesting and thought-provoking. What's clear for now is contention about us mixing the operations that operate on the wall time (always only out of necessity) with those that don't, muddying the API semantics with the separation between time-based units and date-based units. It's not clear for now what can be done about it, though.

straight-shoota commented 3 years ago

The timer example is indeed a really good case for mixing concepts 🤔 When the amount is larger and expressed in days for conciseness, a potential difference introduced by time zone offset change doesn't have a huge impact on the overall amount and probably would correlate to the common understanding of a day. But when it's only a couple of hours, the difference between conceptual and elapsed time is significant and the latter is definitely more intuitive.

So it seems having normalized months, days, and normalized (nano-)seconds fields as suggested is probably the best solution then. It's not totally clean, but practical. At least I don't have a better option either :D As long as the different interpretations as conceptual/elapsed are documented, it's probably fine.

Kotlin / kotlinx-datetime

Normalize components in DateTimePeriod/DatePeriod #81