CalConnect / cc-datetime-explicit


Exact Duration with overflow in intermediate results #8

Closed calconnect-ci closed 4 years ago

calconnect-ci commented 4 years ago

In GitLab by @dmfs on Sep 7, 2018, 08:08

Please clarify the expected results for adding durations which would cause overflows in intermediate results.

Example:

2018-02-23 + P1M7D

To my understanding this would be calculated as

(2018-02-23 + P7D) + P1M => (2018-02-30) + P1M => 2018-03-02 + P1M => 2018-04-02
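The lowest-order-first evaluation traced above can be sketched in Python. This is a minimal sketch, not Appendix A itself. It assumes days are added with ordinary calendar carry and that an impossible day after the month step is truncated, as in the Appendix A example quoted later in the thread. `add_months_truncating` and `add_lowest_first` are hypothetical names.

```python
import calendar
from datetime import date, timedelta

def add_months_truncating(d, months):
    # Month step; a day that does not exist in the target month is
    # truncated to its last day (assumption, following Appendix A's example).
    y, m0 = divmod(d.year * 12 + d.month - 1 + months, 12)
    return date(y, m0 + 1, min(d.day, calendar.monthrange(y, m0 + 1)[1]))

def add_lowest_first(d, months=0, days=0):
    # Lowest-order unit first: days (with ordinary calendar carry), then months.
    return add_months_truncating(d + timedelta(days=days), months)

print(add_lowest_first(date(2018, 2, 23), months=1, days=7))  # 2018-04-02
```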

Is this correct?

I think this result is quite unexpected. I'd expect 2018-03-30

How would I have to specify the duration to end up with 2018-03-30? Is P35D the only option?

Have you considered accounting for the overflow after doing all the math (allowing illegal intermediate results), i.e. like this?

2018-02-23 + P1M7D => (2018-02-23 + P7D) + P1M => 2018-02-30 + P1M => 2018-03-30
2018-02-23 + P1M10D => (2018-02-23 + P10D) + P1M => 2018-02-33 + P1M => 2018-03-33 => 2018-04-02
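A minimal sketch of the suggested alternative, assuming only year/month/day fields and non-negative durations. Every component is added to its own field, and overflow is resolved once at the end. The function name `add_deferred` is hypothetical.

```python
from datetime import date, timedelta

def add_deferred(start, months=0, days=0):
    # Add each component to its own field, tolerating impossible values
    # such as 2018-02-30 or 2018-03-33 along the way.
    month_index = start.month - 1 + months
    year, month = start.year + month_index // 12, month_index % 12 + 1
    day = start.day + days
    # Resolve the overflow once, at the end: counting `day - 1` days from
    # the first of the month spills any excess into the following months.
    return date(year, month, 1) + timedelta(days=day - 1)

print(add_deferred(date(2018, 2, 23), months=1, days=7))   # 2018-03-30
print(add_deferred(date(2018, 2, 23), months=1, days=10))  # 2018-04-02
```

Every field is summed before anything is resolved, so the order in which the components are added cannot change the result. Only the final overflow resolution matters.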
calconnect-ci commented 4 years ago

In GitLab by @ronaldtse on Sep 7, 2018, 08:17

Yes it is correct. Your suggestion on resolving overflow at the end sounds like a good one.

To do so, after the duration additions, we will have to resolve overflow individually for every component from the highest order to the lowest order.

What do others think?

calconnect-ci commented 4 years ago

In GitLab by @ribose-jeffreylau on Sep 7, 2018, 08:31

Created by: atlauren

2018-02-23 + P1M7D

I would expect this to be evaluated as: 2018-02-23 + P1M7D => (2018-02-23 + P1M) + P7D => 2018-03-23 + P7D => 2018-03-30

Is there an ordering structure I'm missing? Why the assumption of doing P7D before P1M?
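The left-to-right (highest-order-first) reading can be sketched as follows. The sketch assumes each step must produce a valid date and that a month step landing on a non-existent day is truncated. The helper names are hypothetical.

```python
import calendar
from datetime import date, timedelta

def add_months_truncating(d, months):
    # Shift the month; clamp a day that does not exist in the target month.
    y, m0 = divmod(d.year * 12 + d.month - 1 + months, 12)
    return date(y, m0 + 1, min(d.day, calendar.monthrange(y, m0 + 1)[1]))

def add_left_to_right(d, years=0, months=0, days=0):
    # Apply the components in written order (Y, then M, then D);
    # every intermediate result is a valid date.
    d = add_months_truncating(d, 12 * years)
    d = add_months_truncating(d, months)
    return d + timedelta(days=days)

print(add_left_to_right(date(2018, 2, 23), months=1, days=7))  # 2018-03-30
```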

calconnect-ci commented 4 years ago

In GitLab by @ribose-jeffreylau on Sep 7, 2018, 09:05

Created by: atlauren

I think those people are wrong. 😉

calconnect-ci commented 4 years ago

In GitLab by @dmfs on Sep 7, 2018, 09:06

I could go with 2018-03-02. The example in #13 convinced me.

calconnect-ci commented 4 years ago

In GitLab by @ronaldtse on Sep 7, 2018, 09:19

2022-02-29 + P1Y3M2D == 2022-02-29 + P1Y + P3M + P2D

This property is why we originally wanted valid intermediate results. With precedence, most likely, unless we can handle without...

calconnect-ci commented 4 years ago

In GitLab by @dmfs on Sep 7, 2018, 09:35

IMO we don't necessarily need an associative property, nor a commutative property. So I'd be fine with

2022-02-29 + P1Y3M2D != 2022-02-29 + P1Y + P3M + P2D

In fact we already have (in the CET time zone)

(2018-03-24T12:00:00 + P1D) + PT24H != (2018-03-24T12:00:00 + PT24H) + P1D

because there was a DST change on that weekend.
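The sketch below shows why the two sides differ. It distinguishes nominal days, where P1D moves the wall-clock date, from exact hours, where PT24H counts elapsed time. It does not use a time zone database. As a simplifying assumption, it hard-codes only the 2018 CET to CEST switch at 2018-03-25T01:00Z, when local clocks jumped from 02:00 to 03:00. All names are hypothetical.

```python
from datetime import datetime, timedelta

# Toy offset model (assumption): only the 2018 CET -> CEST switch is known.
# At 2018-03-25T01:00Z local clocks jumped from 02:00 to 03:00.
SWITCH_UTC = datetime(2018, 3, 25, 1, 0)
SWITCH_LOCAL = datetime(2018, 3, 25, 2, 0)

def local_to_utc(local):
    # UTC+1 before the switch, UTC+2 after (gap times count as after).
    return local - timedelta(hours=2 if local >= SWITCH_LOCAL else 1)

def utc_to_local(utc):
    return utc + timedelta(hours=2 if utc >= SWITCH_UTC else 1)

def add_nominal_days(local, days):
    # P1D: move the wall-clock date, keep the wall-clock time.
    return local + timedelta(days=days)

def add_exact_hours(local, hours):
    # PT24H: count elapsed time on the UTC timeline.
    return utc_to_local(local_to_utc(local) + timedelta(hours=hours))

start = datetime(2018, 3, 24, 12, 0)
a = add_exact_hours(add_nominal_days(start, 1), 24)  # (start + P1D) + PT24H
b = add_nominal_days(add_exact_hours(start, 24), 1)  # (start + PT24H) + P1D
print(a, b)  # 2018-03-26 12:00:00 2018-03-26 13:00:00
```

The first 24 elapsed hours after 2018-03-24T12:00 end at 13:00 local time, because that day is only 23 hours long. Applying PT24H before P1D therefore ends one hour later.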

And I think this is currently undefined/ambiguous:

2018-03-24T12:00:00 + P1DT24H
calconnect-ci commented 4 years ago

In GitLab by @dmfs on Sep 7, 2018, 08:42

Appendix A says:

Starting from the value of the lowest order time scale unit unit_min to the highest order unit unit_max, consider each duration_i of unit_i:

But I agree, starting with the highest order time scale sounds more natural. We probably should try this on the other examples and some edge cases to see whether it always returns expected results.

calconnect-ci commented 4 years ago

In GitLab by @dmfs on Sep 7, 2018, 09:48

This is also true (no matter which precedence we use):

(2018-02-21 + P3M) + P10D != (2018-02-21 + P10D) + P3M

So only one of these can be equal to 2018-02-21 + P3M10D, not both of them.
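Here is a quick check of this example. It assumes months are added as calendar months and days with ordinary carry. No day-of-month clamping is triggered in either order, so the difference comes from the order alone. `add_months` is a hypothetical helper.

```python
import calendar
from datetime import date, timedelta

def add_months(d, months):
    # Calendar-month addition; clamps to the month's last day if needed.
    y, m0 = divmod(d.year * 12 + d.month - 1 + months, 12)
    return date(y, m0 + 1, min(d.day, calendar.monthrange(y, m0 + 1)[1]))

start = date(2018, 2, 21)
months_first = add_months(start, 3) + timedelta(days=10)  # (start + P3M) + P10D
days_first = add_months(start + timedelta(days=10), 3)    # (start + P10D) + P3M
print(months_first, days_first)  # 2018-05-31 2018-06-03
```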

calconnect-ci commented 4 years ago

In GitLab by @ronaldtse on Sep 7, 2018, 09:04

The reason the original algorithm arrives at 2018-02-28 for 2018-01-29 + P1M2D is that (some) people expect P1M2D to add the 2 days to the original date first. That gives 2018-01-31, the last day of the month, so the resulting instance is the last day of the month in February.

calconnect-ci commented 4 years ago

In GitLab by @ronaldtse on Sep 7, 2018, 11:10

@atlauren it is exactly the intent to allow selection rules for date and time representation. [P] is not meant to be a Swiss Army knife.

calconnect-ci commented 4 years ago

In GitLab by @ronaldtse on Sep 7, 2018, 09:00

Here's the thing. If we allow the intermediate results to overflow anyway (as suggested by Marten), there is no need to enforce the addition order. It is only the order of resolving the overflow that matters.

calconnect-ci commented 4 years ago

In GitLab by @dmfs on Sep 7, 2018, 08:59

This is another example from Appendix A:

2018-01-29 + P1M2D => (2018-01-29 + P2D) + P1M => (2018-01-31) +P1M => 2018-02-31 =truncate> 2018-02-28

starting from the highest order time scale (not allowing illegal intermediate results) results in

2018-01-29 + P1M2D => (2018-01-29 + P1M) + P2D => (2018-02-29) +P2D =truncate> (2018-02-28) +P2D => 2018-02-30 =overflow> 2018-03-02

allowing invalid intermediate values results in

2018-01-29 + P1M2D => (2018-01-29 + P2D) + P1M => (2018-01-31) +P1M => 2018-02-31 =overflow> 2018-03-03

In this case the order doesn't matter

so which one would you expect 2018-02-28, 2018-03-02 or 2018-03-03?
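The three candidate answers can be reproduced with a sketch like the following. Each function encodes one of the strategies traced above, assuming plain Gregorian year/month/day values. The function names are hypothetical.

```python
import calendar
from datetime import date, timedelta

def shift_month(year, month, k):
    # Move (year, month) by k months, carrying into the year.
    y, m0 = divmod(year * 12 + month - 1 + k, 12)
    return y, m0 + 1

def truncated(y, m, d):
    # Clamp an impossible day-of-month to the last day of that month.
    return date(y, m, min(d, calendar.monthrange(y, m)[1]))

def appendix_a(start, months, days):
    # Days first, then months; truncate 2018-02-31 at the end.
    d = start + timedelta(days=days)
    return truncated(*shift_month(d.year, d.month, months), d.day)

def highest_first_truncating(start, months, days):
    # Months first (truncating 2018-02-29), then days with normal carry.
    shifted = truncated(*shift_month(start.year, start.month, months), start.day)
    return shifted + timedelta(days=days)

def deferred_overflow(start, months, days):
    # Sum the fields, keep the impossible 2018-02-31, overflow once at the end.
    y, m = shift_month(start.year, start.month, months)
    return date(y, m, 1) + timedelta(days=start.day + days - 1)

start = date(2018, 1, 29)
for strategy in (appendix_a, highest_first_truncating, deferred_overflow):
    print(strategy.__name__, strategy(start, months=1, days=2))
```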

calconnect-ci commented 4 years ago

In GitLab by @ronaldtse on Sep 7, 2018, 09:02

Exactly, it is really up to what an expression actually means.

Should 2018-01-29 + P1M2D end up as:

calconnect-ci commented 4 years ago

In GitLab by @dmfs on Sep 7, 2018, 19:16

The only reason why someone will specify [start]/[duration] is for convenience.

I disagree. You want to specify a duration to express an intent.

Say you create an interval from 2018-01-15 to 2018-02-15 and later you move the start by one month to 2018-02-15. What's the new end date?

On the other hand, if you create an interval on 2018-01-15 with a duration of P1M or P31D, what's the new end date now when you move the start to 2018-02-15?
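Here is a sketch of the second scenario. It assumes P1M is calendar-month addition (clamped to month end) and P31D is plain day counting. The helper names are hypothetical.

```python
import calendar
from datetime import date, timedelta

def add_months(d, months):
    # Calendar-month addition, clamping to the end of a shorter month.
    y, m0 = divmod(d.year * 12 + d.month - 1 + months, 12)
    return date(y, m0 + 1, min(d.day, calendar.monthrange(y, m0 + 1)[1]))

def end_p1m(start):
    return add_months(start, 1)

def end_p31d(start):
    return start + timedelta(days=31)

for start in (date(2018, 1, 15), date(2018, 2, 15)):
    print(start, "P1M ->", end_p1m(start), "P31D ->", end_p31d(start))
```

From 2018-01-15 both durations end on 2018-02-15. Once the start moves to 2018-02-15, P1M ends on 2018-03-15 and P31D ends on 2018-03-18. A plain [start]/[end] pair cannot tell the two intents apart.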

calconnect-ci commented 4 years ago

In GitLab by @ronaldtse on Sep 7, 2018, 19:56

Yeah, the idea is to prefix every time component with “T”, so as to separate “TM” (minute) from “M” (month), or to replace “TM” with some other letter like “V” for “minute”. YMDHVS do not conflict.

calconnect-ci commented 4 years ago

In GitLab by @ronaldtse on Sep 7, 2018, 13:09

@PeterTKY 's comment brings us back to the specification of recurrences, which is one case of "duration addition".

There are actually 2 use cases for duration addition:

  1. In time interval specification (a [timeInterval]), the representation of [start]/[duration] is allowed. This means that duration addition is necessary.

  2. In recurrences, R/[timeInterval]/[eligible-time-intervals][selection], the calculation of recurring events again depend on the [timeInterval] and therefore [duration] is necessary.

Here are some questions for moving ahead:

  1. Do people know what they want when they write "P1M1D"?

  2. Do we want people to specify evaluation strategy? I think it's too complex for people to understand.

For just specifying a time interval, there's only one instance. If the user knows what the end date is, they might as well specify [start]/[end]. The only reason why someone will specify [start]/[duration] is for convenience.

For specifying a recurrence series, it only works if there is a UI where people can test out the recurring instances and see what the different cases mean.

  3. The order of evaluation really depends on the view on whether this is a math problem.

In 123 + 15, lower order digits are added before the higher order ones (with overflow addressed at each digit). This is the opposite of adding from left to right.

  4. From @dmfs 's example: (2018-02-21 + P3M) + P10D != (2018-02-21 + P10D) + P3M

Clearly the duration addition function cannot be associative.

However, for statement equivalence, we could still have it if we enforce precedence. For example if we enforce in any calculation that the higher order is to be calculated first, they will be the same:

2022-02-29 + P1Y + P3M + P2D => ((2022-02-29 + P1Y) + P3M) + P2D

If we then define 2022-02-29 + P1Y3M2D => ((2022-02-29 + P1Y) + P3M) + P2D, then they are effectively the same.

calconnect-ci commented 4 years ago

In GitLab by @ribose-jeffreylau on Sep 7, 2018, 10:08

Created by: atlauren

I return to left-to-right evaluation. IMO, 2023-05-30 is the correct answer. I see this expansion as:

(((2022-02-29 + P1Y) + P3M) + P2D)

2022-02-29 + P1Y => 2023-02-29 <truncate> => 2023-02-28
2023-02-28 + P3M => 2023-05-28
2023-05-28 + P2D => 2023-05-30
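The strict left-to-right expansion can be sketched with every intermediate result forced to a valid date. Since 2022-02-29 is not a real date, the start is passed as raw fields. The function names are hypothetical. The sketch also checks the precedence argument made earlier in the thread: evaluating P1Y3M2D in one go matches chaining P1Y, P3M and P2D.

```python
import calendar
from datetime import date, timedelta

def add_ym(y, m, d, years=0, months=0):
    # Shift year/month, then truncate a day that does not exist there.
    y, m0 = divmod((y + years) * 12 + m - 1 + months, 12)
    return date(y, m0 + 1, min(d, calendar.monthrange(y, m0 + 1)[1]))

def evaluate(y, m, d, years=0, months=0, days=0):
    # Enforced precedence: Y, then M, then D; each step is a valid date.
    step = add_ym(y, m, d, years=years)
    step = add_ym(step.year, step.month, step.day, months=months)
    return step + timedelta(days=days)

def fields(d):
    return d.year, d.month, d.day

# 2022-02-29 is not a real date, so the start is given as raw fields.
one_shot = evaluate(2022, 2, 29, years=1, months=3, days=2)
chained = evaluate(*fields(evaluate(*fields(evaluate(2022, 2, 29, years=1)), months=3)), days=2)
print(one_shot, chained)  # 2023-05-30 2023-05-30
```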

Is it out of bounds to include guidance for the use of selections to declare actual intent? If you want to land on the last day of the month, fifteen months hence, [P] won't get you there reliably.

calconnect-ci commented 4 years ago

In GitLab by @ribose-jeffreylau on Sep 8, 2018, 07:45

Created by: atlauren

(prior post edited for occurrence across a leap day)

calconnect-ci commented 4 years ago

In GitLab by @ribose-jeffreylau on Sep 7, 2018, 09:04

Created by: atlauren

In @dmfs's example, I'd expect 2018-03-02. You're adding a month to the previously defined month, arriving at its last day and then adding two more.

As a note of perspective: I am not a trained CS developer. I write documents and code/scripts with an intent toward readability. I also have a bias toward regex-like structures, so left-to-right ordering is significant, and things out of bounds are readily ignored.

calconnect-ci commented 4 years ago

In GitLab by @ronaldtse on Sep 9, 2018, 01:59

I agree with @atlauren that the user intent is already clearly expressed (in many cases) in the duration expression.

So here's the proposal: we let the user/author of the rule decide.

Offer an "ordered duration representation" (still with prefix ["P"]) that allows ordering of the time scale components. e.g. P3M1Y vs normally P1Y3M.

  1. This order will specify the precedence order of calculation. Thus the original duration representation is always calculated with "higher order to lower order".
  2. This representation requires all intermediate results to be valid, because it is a desired property for "date + PAB == (date + PA) + PB".

  2a. A lower order time scale value change will trigger overflows (i.e. day to month, month to year, etc.)
  2b. If a higher order time scale value change causes a lower order time scale value to become invalid, truncation is applied instead (not overflow!) (between leap and normal years: truncate day; months: truncate day)

  3. Since the "clock" time scale components (hour, min, sec) can now precede the "calendar" components (year, month, day), the "T" prefix needs to be applied individually. Each "clock" designator symbol like H, M and S needs the "T" applied to itself, so that they become "TH", "TM", "TS". For example, P3TM3TH2D2M.

  4. For this representation, we can also offer the designator symbol "V" as a shorthand for "TM", since "TM" is the only ambiguous symbol amongst YMDHMS. If there is no ambiguity, the prefix "T" does not need to be applied.
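The per-component "T" prefix and the "V" shorthand proposed above could be parsed roughly like this. This is a hypothetical sketch of a proposed (not standardized) form. It returns the components in written order, which is the proposed evaluation order. It accepts "TH", "TM", "TS", "V" for minutes, and bare H or S where unambiguous, while a bare M is always read as months. It does not handle the ISO 8601-1 form with a single "T" separator.

```python
import re

# Hypothetical designator table for the proposed ordered-duration form.
UNITS = {
    "Y": "year", "M": "month", "D": "day",
    "TH": "hour", "TM": "minute", "TS": "second",
    "H": "hour", "S": "second", "V": "minute",
}
# "T" is not a designator on its own; it only appears inside TH/TM/TS.
TOKEN = re.compile(r"(\d+)(TH|TM|TS|[YMDHSV])")

def parse_ordered(text):
    """Return (value, unit) pairs in written order, i.e. evaluation order."""
    if not text.startswith("P"):
        raise ValueError("duration must start with 'P'")
    parts, pos = [], 1
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at {text[pos:]!r}")
        parts.append((int(match.group(1)), UNITS[match.group(2)]))
        pos = match.end()
    return parts

print(parse_ordered("P3TM3TH2D2M"))
# [(3, 'minute'), (3, 'hour'), (2, 'day'), (2, 'month')]
```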

@atlauren @dmfs can you confirm which items here are (not) acceptable? Need to finalize this weekend. Thanks!

calconnect-ci commented 4 years ago

In GitLab by @ribose-jeffreylau on Sep 8, 2018, 07:18

Created by: atlauren

Suppose the specification requires the delimiter with each interval? If instead of P1Y3M2D it must be P1YP3MP2D, does that make it easier to determine evaluation order?

calconnect-ci commented 4 years ago

In GitLab by @dmfs on Sep 7, 2018, 19:27

I had the same idea but it doesn't work that well with times, i.e. this is not valid:

PT24H1D
calconnect-ci commented 4 years ago

In GitLab by @ribose-jeffreylau on Sep 9, 2018, 06:23

Created by: atlauren

I agree with letting the stated order dictate the order of evaluation; likewise agree with (2a) and (2b).

Regarding (3), would the explicit declaration of "T" for all clock components be required in all durations/selections, or just when employing lower-to-higher ordering? (Being in general in favor of consistency and rigor, I favor the former.)

calconnect-ci commented 4 years ago

In GitLab by @ribose-jeffreylau on Sep 8, 2018, 07:33

Created by: atlauren

I think @ronaldtse is spot-on here:

The order of evaluation really depends on the view on whether this is a math problem.

Computers are math engines, but humans don't regard calendar data in math terms. This is the core of the historical disconnect in calendaring complexity.

Regarding the user's intent, I can certainly imagine use cases where they want "one year and one day", perhaps for legal reasons. Regardless of leap days, if a contract expires "one year and one day" after 2019-03-01, the logical expiration date is 2020-03-02. While the barrister likely does not see the syntactical details of P1Y1D, their intent is clear in their mind.

Because the date/time format is ordered from higher order to lower order, IMO it follows that the duration ordering should likewise be enforced as higher-to-lower, as should its evaluation.

calconnect-ci commented 4 years ago

In GitLab by @dmfs on Sep 7, 2018, 09:17

So what if I do

2022-02-29 + P1Y3M2D

would this result in

2022-02-29 + P1Y3M2D
=> (2022-02-29 + P1Y)  + P3M2D
=> (2023-02-29)  + P3M2D
=> (2023-02-29 + P3M) + P2D
=> (2023-05-29) + P2D
=> 2023-05-31

or

2022-02-29 + P1Y3M2D
=> (2022-02-29 + P1Y)  + P3M2D
=> (2023-02-29)  + P3M2D
=truncate> (2023-02-28) + P3M2D
=> (2023-02-28 + P3M) + P2D
=> (2023-05-28) + P2D
=> 2023-05-30

In this case I'd expect the former, i.e. 2023-05-31

This also raises the question if

2022-02-29 + P1Y3M2D == 2022-02-29 + P1Y + P3M + P2D

i.e. if I add each component individually, do I get the same result? Or is this even necessary/desirable?

calconnect-ci commented 4 years ago

In GitLab by @ronaldtse on Sep 9, 2018, 15:36

Thanks @atlauren .

(3) The problem is the ["P"] duration syntax is original to ISO 8601, so we can't enforce the ["T"][time] for all durations, but we can say that for any duration specified not in the ISO 8601-1 order (YMD["T"]HMS), the "T" prefix must be applied to the clock components.

(4) Probably we don't need this for now...

calconnect-ci commented 4 years ago

In GitLab by @dmfs on Sep 10, 2018, 05:56

I mostly agree with the proposal. However, I still don't like that something like

2022-02-29 + P1Y3M2D

would (counter-intuitively) result in 2023-05-30 instead of 2023-05-31.

So how about this idea:

The concatenation of two or more durations is a valid duration itself and each part is applied separately from left to right. When doing the math for one part, invalid intermediate results are allowed and any overflow is handled afterwards as per (2a) and (2b). So the result after applying a "sub-duration" is always valid.

In order to apply the rules above you'd just concatenate durations. So X + PAPB would be the same as (X + PA) + PB. But X + PAB might be different. (3) and (4) wouldn't be necessary then.

This would also allow something like this P1M-P1D i.e. add a month short of a day (I presume -P1D is still a valid Duration as in RFC 5545).

Examples:

2022-02-29 + P1Y3M2D
=> (2022-02-29 + P1Y) +P3M2D
=> (2023-02-29 + P3M) + P2D // intermediate result not a valid date, but not truncated either
=> 2023-05-29 + P2D
=> 2023-05-31

2022-02-29 + P1YP3MP2D
=> (2022-02-29 + P1Y) +P3MP2D
=> (2023-02-29 + P3M) + P2D
truncate => (2023-02-28 + P3M) + P2D
=> 2023-05-28 + P2D
=> 2023-05-30

2022-02-29 + P2DP3MP1Y // reverse order 
=> (2022-02-29 + P2D) +P3MP1Y
=> (2022-03-02 + P3M) + P1Y
=> 2022-06-02 + P1Y
=> 2023-06-02
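Here is a sketch of this concatenation rule, limited to Y, M and D with non-negative values. It assumes one reading of (2a)/(2b). Within a sub-duration all fields are summed first. If the sub-duration has a day component, excess days overflow; otherwise an impossible day is truncated. It starts from 2020-02-29, an actual leap day, rather than 2022-02-29, so that the start date is valid. The function names are hypothetical.

```python
import calendar
import re
from datetime import date, timedelta

FIELD = re.compile(r"(\d+)([YMD])")

def parse_part(text):
    # "1Y3M2D" -> {"Y": 1, "M": 3, "D": 2}; only Y, M and D are handled here.
    comps = {unit: int(n) for n, unit in FIELD.findall(text)}
    if FIELD.sub("", text):
        raise ValueError(f"unsupported duration part {text!r}")
    return comps

def apply_part(start, comps):
    # Sum the raw fields first; the intermediate value may be an
    # impossible date such as 2021-02-29. Then normalize once.
    months = start.month - 1 + comps.get("M", 0)
    y = start.year + comps.get("Y", 0) + months // 12
    m = months % 12 + 1
    d = start.day + comps.get("D", 0)
    if "D" in comps:
        # Assumed (2a): the day changed, so excess days overflow.
        return date(y, m, 1) + timedelta(days=d - 1)
    # Assumed (2b): only year/month changed, so an impossible day is truncated.
    return date(y, m, min(d, calendar.monthrange(y, m)[1]))

def add_concatenated(start, duration):
    # "P1YP3MP2D" is applied as ((start + P1Y) + P3M) + P2D, left to right.
    if not duration.startswith("P"):
        raise ValueError("duration must start with 'P'")
    result = start
    for part in duration.split("P")[1:]:
        result = apply_part(result, parse_part(part))
    return result

leap_day = date(2020, 2, 29)
for text in ("P1Y3M2D", "P1YP3MP2D", "P2DP3MP1Y"):
    print(text, add_concatenated(leap_day, text))
```

With this start, P1Y3M2D gives 2021-05-31, P1YP3MP2D gives 2021-05-30 and P2DP3MP1Y gives 2021-06-02. These results follow the same pattern as the three examples above.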

I think this would be a consistent definition and allows users and applications to clearly express their intent.

calconnect-ci commented 4 years ago

In GitLab by @ribose-jeffreylau on Sep 10, 2018, 10:23

Created by: atlauren

I like this suggestion. It allows for user intent, flexibility, and rigor when desired. @ronaldtse, we should have examples displaying each variance of interpretation.

A difficult question that must be asked: do these varying scenarios/methods invite confusion? Are we in danger of a standard that does everything because decisions could not be made?

(Also, the next leap day is 2020-02-29, not 2022-02-29. But I think we all know what @dmfs meant. :D )

calconnect-ci commented 4 years ago

In GitLab by @ronaldtse on Sep 10, 2018, 09:52

@dmfs This is a great suggestion, it consolidates everything we've discussed.

I agree with this updated specification, it will allow the application to specify their intent. However, we will need to update the syntax to allow a "duration collection" (i.e. P...P...), and that methods 2 & 3 will only be available for new implementations that accept this syntax. (This has to be a new section.)

This means that the default for all normal purposes, i.e. as defined by ISO 8601:2004 and the new ISO 8601-1, the duration will be calculated using method 1 (accepts invalid intermediate results). Which I'm fine with. We also need to modify the Annex to explain the differences between evaluation order.

One issue @dmfs : -P1D is not valid under ISO 8601:2004. Will make this valid in the new syntax.

I'll do the updates today and reintegrate into ISO 8601-2.

calconnect-ci commented 4 years ago

In GitLab by @ronaldtse on Sep 7, 2018, 19:20

I stand corrected; it's to indicate intent. We need a way to allow specification of accurate intent.

In fact, it may be possible to specify the calculation order using the duration syntax, such that:

As to whether truncation is applied to intermediate results, we could also allow specifying the exact component where truncation should occur. Currently, the only two time scale components that can trigger truncation at a lower order time scale are Year (leap year, truncate day) and Month (truncate day).

calconnect-ci commented 4 years ago

In GitLab by @ronaldtse on Sep 10, 2018, 10:27

we should have examples displaying each variance of interpretation.

Indeed!

do these varying scenarios/methods invite confusion? Are we in danger of a standard that does everything because decisions could not be made?

There is a risk that the standard gives too many options, but the answer of the "intent" is still best answered by the user/specifier, given the current duration syntax doesn't allow full specification of intent. In any case, I'll put the examples in an informative annex so we won't cause anyone who has implemented the duration evaluation method differently to be "incompatible".

calconnect-ci commented 4 years ago

In GitLab by @PeterTKY on Sep 7, 2018, 10:39

I think there is no right or wrong for the choices we mentioned. All we have to do is to let the user who creates choose whatever choice suitable for him.

Taking the choices from @ronaldtse as an example:

Should 2018-01-29 + P1M2D end up as:

2018-02-28
2018-03-02
2018-03-03
...other choices making sense...

To help the user choose the right one, we should list the choices when he is creating such an instance. And we have to standardize the possible choices, perhaps by naming/labeling them properly:

2018-02-28 (name=truncate_after_all_computation, label=TAAC)
2018-03-02 (name=truncate_after_month_computation, label=TAMC)
2018-03-03 (name=no_truncation, label=NT)

Then the syntax will become:

Therefore we don't have to come up with a default behaviour in this case. All will be handled by the client application, which could either have its own default behaviour or let the user choose one of those.

calconnect-ci commented 4 years ago

In GitLab by @PeterTKY on Sep 7, 2018, 16:55

I have some further questions to @ronaldtse's:

  1. Do people know what they want when they write "P1M1D"?
  2. Do we want people to specify evaluation strategy? I think it's too complex for people to understand.

For 1, my question is: in what user interface would the user be able to write "P1M1D"? I think normally a user would pick it from a grid calendar or something similar. We will then have the actual end date. Does that mean "P1M1D" is actually a result of computation?

If a user is really able to write "P1M1D" himself, then the default evaluation strategy depends on how human beings interpret "P1M1D". But we all know it's ambiguous (otherwise we wouldn't have the discussion happening right here). That's why I think we should pass the ball back to the user.

For 2, I think we should provide explicit date choices for people, and those choices imply the corresponding evaluation strategy.

However, if the user has chosen a specific date, what's the purpose of keeping "P1M1D" but not just having the date? This will be a different case from recurrence because we don't have to compute for future instances in the case of duration.

calconnect-ci commented 4 years ago

In GitLab by @ronaldtse on Sep 11, 2018, 05:49

It's all addressed now. Let me know if it works. Closing for now.