97jaz / gregor

Date and time library for Racket

period arithmetic? #46

Open sorawee opened 4 years ago

sorawee commented 4 years ago

Is it possible to add period arithmetic? For example, we should be able to add two periods together, or perhaps scale one period by a real number.

97jaz commented 4 years ago

@sorawee You can add periods already:

#lang racket/base

(require gregor
         gregor/period)

(define p1 (months 2))
(define p2 (seconds 4))

(+period p1 p2)
(-period p1 p2)

There's actually an open issue regarding period-scale.

97jaz commented 4 years ago

Oh, but you wouldn't be able to scale a period by an arbitrary real number. It would need to be an exact integer. (Or, at least, I don't see how I could make it work for any real?.)

sorawee commented 4 years ago

A more general operation (I think) is to scale by an arbitrary real number, but round the result to the closest representable period. Would that be possible?

97jaz commented 4 years ago

If you stipulate a meaning of "closest," then yes, but it sounds like a pretty confusing interface to me. Do you have a motivating example?

sorawee commented 4 years ago

Design-wise, I wish period supported all of the timedelta operations. According to the doc:

Delta divided by a float or an int. The result is rounded to the nearest multiple of timedelta.resolution using round-half-to-even.

where timedelta.resolution is 1 microsecond.
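For reference, a short Python sketch of the rounding rule quoted above. It uses only the standard datetime module; the specific values are chosen for illustration and are not from the original request.

```python
from datetime import timedelta

# The smallest representable difference between two timedeltas.
print(timedelta.resolution)             # 0:00:00.000001

# Exact halves go to the even neighbour (round-half-to-even).
print(timedelta(microseconds=5) / 2)    # 0:00:00.000002 (2.5 rounds to 2)
print(timedelta(microseconds=7) / 2)    # 0:00:00.000004 (3.5 rounds to 4)

# Halving an interval, the use case mentioned in this thread.
print(timedelta(minutes=3) / 2)         # 0:01:30
```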

I don't recall off the top of my head what I was trying to do, but I do recall doing time interval halving a lot when I wrote Python.

97jaz commented 4 years ago

Thank you, that's very helpful.

So, first off, there's one huge difference between a period and a timedelta: every field of the latter is convertible into an exact number of microseconds, whereas the former does not represent a particular quantum of time.

For example, how many seconds are in (months 1)? The question doesn't make sense (unless you stipulate how many days are in a month).
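To make that concrete, here is a small Python sketch. The helper seconds_in_month is hypothetical, and it assumes every day is exactly 86400 seconds (no DST shifts, no leap seconds). Even under that assumption the answer depends on which month you mean.

```python
import calendar

def seconds_in_month(year, month):
    """Seconds in a calendar month, assuming every day is 86400 s."""
    days = calendar.monthrange(year, month)[1]  # number of days in the month
    return days * 86_400

print(seconds_in_month(2021, 1))  # 2678400 (31 days)
print(seconds_in_month(2021, 2))  # 2419200 (28 days)
print(seconds_in_month(2020, 2))  # 2505600 (29 days, leap year)
```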

Time periods (i.e., periods that satisfy time-period?), on the other hand, are similar to timedeltas in that they can all be reduced to some integral number of nanoseconds. However, since time-period? is really just a restriction of period? and the representation is the same, it's kind of limited by what you can do with periods in general.

I've never been very happy with the period API. It definitely has some design mistakes and infelicities. Maybe I should have provided different representations for:

I did consider this and then didn't do it. Unfortunately, I can't remember why I decided against it.

97jaz commented 4 years ago

In my unpublished, forever-in-progress, rarely-worked-on datetime-lib, I expose the following interface, which I think is definitely an improvement, but still subject to any kind of revision:

;;;;
;; Exports
(provide/contract
 [period?                  (-> any/c boolean?)]
 [period-empty?            (-> period? boolean?)]
 [date-period?             (-> period? boolean?)]
 [time-period?             (-> period? boolean?)]
 [period-negate            (-> period? period?)]
 [period-scale             (-> period? exact-integer? period?)]

 [period->date-period      (-> period? date-period?)]
 [period->time-period      (-> period? time-period?)]
 [time-period->nanoseconds (-> time-period? exact-integer?)]

 [empty-period             (and/c date-period? time-period? period?)]

 [period-ref               (-> period? temporal-unit/c exact-integer?)]
 [period-set               (-> period? temporal-unit/c exact-integer? period?)]

 [period-add-years         (-> period? exact-integer? period?)]
 [period-add-months        (-> period? exact-integer? period?)]
 [period-add-weeks         (-> period? exact-integer? period?)]
 [period-add-days          (-> period? exact-integer? period?)]
 [period-add-date-period   (-> period? date-period? period?)]

 [period-add-hours         (-> period? exact-integer? period?)]
 [period-add-minutes       (-> period? exact-integer? period?)]
 [period-add-seconds       (-> period? exact-integer? period?)]
 [period-add-milliseconds  (-> period? exact-integer? period?)]
 [period-add-microseconds  (-> period? exact-integer? period?)]
 [period-add-nanoseconds   (-> period? exact-integer? period?)]
 [period-add-time-period   (-> period? time-period? period?)]

 [time-period-normalize    (->* (time-period?) ((listof temporal-unit/c)) time-period?)]

 [period->list             (->* (period?)
                                ((listof temporal-unit/c))
                                (listof (cons/c temporal-unit/c exact-integer?)))]
 [list->period             (-> (listof (cons/c temporal-unit/c exact-integer?)) period?)]

 [years                    (-> exact-integer? date-period?)]
 [months                   (-> exact-integer? date-period?)]
 [weeks                    (-> exact-integer? date-period?)]
 [days                     (-> exact-integer? date-period?)]

 [hours                    (-> exact-integer? time-period?)]
 [minutes                  (-> exact-integer? time-period?)]
 [seconds                  (-> exact-integer? time-period?)]
 [milliseconds             (-> exact-integer? time-period?)]
 [microseconds             (-> exact-integer? time-period?)]
 [nanoseconds              (-> exact-integer? time-period?)]

 [date-units               (listof symbol?)]
 [time-units               (listof symbol?)]
 [temporal-units           (listof symbol?)]
 [date-unit/c              flat-contract?]
 [time-unit/c              flat-contract?]
 [temporal-unit/c          flat-contract?])

period->date-period and period->time-period are truncating operations, by the way.

97jaz commented 4 years ago

The fact that timedelta has days and weeks as fields is interesting, since a "day" means something different in date arithmetic vs. time arithmetic. In the latter, it's exactly 86400 seconds (assuming you're ignoring leap seconds), whereas in the former it could be shorter or longer on a local timeline, depending on the zone.

Reading the timedelta docs, it's not clear to me that there is a rigorous distinction in that library between date and time arithmetic. Maybe there is, but the docs don't seem to be clear about this. I'm kind of getting the impression that after a timedelta is normalized to days, seconds, and microseconds, then when you do datelike_thing + delta, the days field is added using date arithmetic rules, and the other fields use time arithmetic rules. Which would be fine, except that the delta object itself will convert seconds to days, which only makes sense in a time arithmetic context. So, if I'm right about how this works, I really don't like it and do not want to emulate it. Should be easy to test...

And yep, that's right. So:

>>> from datetime import datetime, timedelta
>>> from pytz import timezone
>>> tz = timezone("America/New_York")
>>> dt = datetime(2020, 10, 31, 2, 30)
>>> dt1 = tz.localize(dt)
>>> dt1
datetime.datetime(2020, 10, 31, 2, 30, tzinfo=<DstTzInfo 'America/New_York' EDT-1 day, 20:00:00 DST>)
>>> dt1 + timedelta(days=1)
datetime.datetime(2020, 11, 1, 2, 30, tzinfo=<DstTzInfo 'America/New_York' EDT-1 day, 20:00:00 DST>)
>>> dt1 + timedelta(seconds=86400)
datetime.datetime(2020, 11, 1, 2, 30, tzinfo=<DstTzInfo 'America/New_York' EDT-1 day, 20:00:00 DST>)

But that's wrong. There absolutely are not 86400 seconds between those two times. Gregor gets this right by understanding the distinction between a day, considered from the standpoint of date arithmetic, and a period of 86400 seconds:

> (require gregor gregor/period)
> (define m (moment 2020 10 31 2 30 #:tz "America/New_York"))
> (+period m (days 1))
#<moment 2020-11-01T02:30:00-05:00[America/New_York]>
> (+period m (seconds 86400))
#<moment 2020-11-01T01:30:00-05:00[America/New_York]>

97jaz commented 4 years ago

Actually, looking closer at what the Python REPL is telling me, it's even worse. The UTC offset isn't being adjusted as we cross a DST boundary:

>>> (dt1 + timedelta(days=1)).isoformat()
'2020-11-01T02:30:00-04:00'
>>> (dt1 + timedelta(seconds=86400)).isoformat()
'2020-11-01T02:30:00-04:00'

These are both wrong. Unless I've managed to confuse myself, there is no 2:30 at UTC-04:00 on that day; instead of hitting 02:00 at -04:00, we go to 01:00 at -05:00.

I realize that some of this criticism is more directed at the pytz lib than at the datetime lib, but I still think that the latter's decision to promote seconds to days automatically in a timedelta, yet treat the fields differently in arithmetic, is a bad move.
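For comparison, a sketch of the same two additions using the standard-library zoneinfo module instead of pytz. This is a swap-in for illustration, not the setup from the session above. It assumes Python 3.9+ and an available tz database. Here the offset is recomputed after the addition, but adding 86400 seconds to an aware datetime still moves the wall clock. To get elapsed-time arithmetic, the addition is done in UTC and then converted back.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs a system tz database or the tzdata package

ny = ZoneInfo("America/New_York")
start = datetime(2020, 10, 31, 2, 30, tzinfo=ny)

# Aware-datetime arithmetic shifts the wall clock; the offset is looked up afterwards.
wall = start + timedelta(seconds=86400)
print(wall.isoformat())     # 2020-11-01T02:30:00-05:00

# Elapsed-time arithmetic: add in UTC, then convert back to the zone.
elapsed = (start.astimezone(timezone.utc) + timedelta(seconds=86400)).astimezone(ny)
print(elapsed.isoformat())  # 2020-11-01T01:30:00-05:00
```

The second result matches what Gregor reports for (+period m (seconds 86400)).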

[Edit] Eh... well...

So, if there weren't a DST boundary there, then the results would both be right, so without knowing more about these libraries, I can't confidently say either that:

sorawee commented 4 years ago

I concur that converting seconds to days is problematic, so we should not replicate the entire timedelta behavior.

What this feature request should really be titled is "time period arithmetic?"

"However, since time-period? is really just a restriction of period? and the representation is the same, it's kind of limited by what you can do with periods in general."

What is the restriction? Perhaps I miss something, but I don't see why it's impossible to write:

(define/contract (scale-time-period p c) (time-period? real? . -> . time-period?)
  ;; convert p to nanoseconds, then multiply by c, then round the result to
  ;; a sensible whole number of nanoseconds, and normalize the result to a time-period?
  ...)

97jaz commented 4 years ago

Oh, it's definitely possible, and I understand why you'd want that. Your example of wanting to halve time periods is compelling. But what this discussion has wound up convincing me of (well, at least for now) is that trying to carve out a subspecies of period to represent a specific quantity of time wasn't a very good idea. I wish I had created a separate duration instead of time-period.

Here's an example of what I mean. You wrote:

  ;; convert p to nanoseconds, then multiply by c, then round the result to
  ;; a sensible whole number of nanoseconds, and normalize the result to a time-period?

And I can see why you'd want to do it this way (I mean: convert to ns, scale, then normalize), but it runs counter to the general way that periods work, which is that they don't automatically normalize anything but instead preserve the user's choice of fields. So it would be more in keeping with current behavior to scale each time field separately, but that would give a lower precision answer, and no one would want that.

That's how I know I made a design mistake: both options seem wrong.
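A rough Python sketch of the two options, with a time period modeled as a plain dict of unit names to integers. This is a stand-in for illustration, not Gregor's representation, and the names NS_PER, scale_total, and scale_per_field are hypothetical.

```python
from fractions import Fraction

# Nanoseconds per time unit, largest first.
NS_PER = {
    "hours": 3_600_000_000_000,
    "minutes": 60_000_000_000,
    "seconds": 1_000_000_000,
    "milliseconds": 1_000_000,
    "microseconds": 1_000,
    "nanoseconds": 1,
}

def to_ns(period):
    """Total nanoseconds represented by a dict of time fields."""
    return sum(NS_PER[unit] * n for unit, n in period.items())

def normalize(ns):
    """Split a nanosecond count into fields, largest unit first, dropping zeros."""
    sign = -1 if ns < 0 else 1
    remaining = abs(ns)
    out = {}
    for unit, per in NS_PER.items():
        q, remaining = divmod(remaining, per)
        if q:
            out[unit] = sign * q
    return out

def scale_total(period, c):
    """Option 1: convert to ns, scale, round half to even, then normalize."""
    return normalize(round(to_ns(period) * Fraction(c)))

def scale_per_field(period, c):
    """Option 2: keep the user's fields and round each one separately."""
    return {unit: round(n * Fraction(c)) for unit, n in period.items()}

p = {"seconds": 1, "milliseconds": 1}
print(scale_total(p, 0.5))      # {'milliseconds': 500, 'microseconds': 500}
print(scale_per_field(p, 0.5))  # {'seconds': 0, 'milliseconds': 0}
```

The first option changes which fields are present; the second keeps the fields but rounds 1.001 seconds down to nothing.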

Considering that gregor hasn't actually committed very far to the concept of the time-period as a particular quantity of time (e.g., unlike the dev version of datetime-lib, gregor doesn't provide a time-period->nanoseconds function), I think it would still be possible to create a separate datatype for this purpose. I'm... pondering this.


It's probably worth noting that your scale-time-period can be written even with the current public API (warning -- untested software ahead):

#lang racket/base

(require gregor
         gregor/period
         racket/contract/base
         racket/math)

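;; Nanoseconds per time unit, largest first. Hours and minutes are included
;; so that every field of a time period is counted.
(define units+ns-per
  (list (cons 'hours (* 3600 (expt 10 9)))
        (cons 'minutes (* 60 (expt 10 9)))
        (cons 'seconds (expt 10 9))
        (cons 'milliseconds (expt 10 6))
        (cons 'microseconds (expt 10 3))
        (cons 'nanoseconds (expt 10 0))))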

(define (time-period->nanoseconds tp)
  (for/sum ([pair (in-list units+ns-per)])
    (* (period-ref tp (car pair))
       (cdr pair))))

(define (nanoseconds->normalized-time-period ns)
  (for/fold ([result empty-period]
             [remaining ns]
             #:result result)
            ([pair (in-list units+ns-per)])
    (values (period-set result (car pair) (quotient remaining (cdr pair)))
            (remainder remaining (cdr pair)))))

(define (scale-time-period tp c #:round-proc [round-proc exact-round])
  (define ns0 (time-period->nanoseconds tp))
  (define ns (round-proc (* ns0 c)))
  (nanoseconds->normalized-time-period ns))

(provide/contract
 [time-period->nanoseconds (-> time-period? exact-integer?)]
 [nanoseconds->normalized-time-period (-> exact-integer? time-period?)]
 [scale-time-period (->i ([tp time-period?]
                          [c real?])
                         (#:round-proc [round-proc (-> rational? exact-integer?)])
                         [result time-period?])])

97jaz commented 4 years ago

Just one note on the above code: I think we'd want the contract for scale-time-period to use a rational? instead of a real?, since we can't handle infinite dates/times. (And some experience with the matter suggests that infinite dates/times are a bad idea, and what we want instead are proper date/time intervals, which can be unbounded on either side.)
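The same guard, sketched in Python for illustration: reject non-finite scale factors before rounding, which is roughly what tightening the contract from real? to rational? would do. The function scale_ns is hypothetical and works on a bare nanosecond count rather than a period.

```python
import math
from fractions import Fraction

def scale_ns(ns, c):
    """Scale a whole number of nanoseconds by a finite factor, rounding half to even."""
    if not math.isfinite(c):
        # Infinities and NaN have no nearest whole number of nanoseconds.
        raise ValueError("scale factor must be finite")
    return round(ns * Fraction(c))

print(scale_ns(3, 0.5))   # 2 (1.5 rounds to the even neighbour)
print(scale_ns(10, 1.5))  # 15
```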

jackfirth commented 4 years ago

@97jaz Internally at my workplace there's a very good document about datetime APIs called How To Think About Time, and one of the main things it recommends is separating physical time ("how many seconds does it take for a pound of Caesium-137 to decay?") from civil time ("how long was the month of December 1927 in Shanghai?"). Periods in Gregor are used for both, which is why (months 1) is a confusing period. Having separate APIs for working with physical time and with civil time and helping users figure out which to use would be great.