arrow-py / arrow

🏹 Better dates & times for Python
https://arrow.readthedocs.io
Apache License 2.0

Range() doesn't work properly on twenty-five hour days #871

Closed (msarrel closed this 3 years ago)

msarrel commented 3 years ago

Issue Description

The range() method does not work properly on days with 25 hours, which occur when clocks switch from daylight saving time to standard time.

I would expect this fragment of code to produce a list that has twenty-five elements.

import arrow

long1 = list(arrow.Arrow.range(
    "hour",
    arrow.get("2001-10-28 00:00:00", "YYYY-MM-DD HH:mm:ss", tzinfo="US/Pacific"),
    arrow.get("2001-10-28 23:00:00", "YYYY-MM-DD HH:mm:ss", tzinfo="US/Pacific")))
print("long")
print(len(long1))
for i in range(7):
    print(long1[i], long1[i].to("utc"))

But the list contains only twenty-four entries. Looking at the first few entries, we can see that 2001-10-28T02:00:00-07:00, corresponding to 2001-10-28T09:00:00+00:00, is missing.

long
24
2001-10-28T00:00:00-07:00 2001-10-28T07:00:00+00:00
2001-10-28T01:00:00-07:00 2001-10-28T08:00:00+00:00
2001-10-28T02:00:00-08:00 2001-10-28T10:00:00+00:00
2001-10-28T03:00:00-08:00 2001-10-28T11:00:00+00:00
2001-10-28T04:00:00-08:00 2001-10-28T12:00:00+00:00
2001-10-28T05:00:00-08:00 2001-10-28T13:00:00+00:00
2001-10-28T06:00:00-08:00 2001-10-28T14:00:00+00:00
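Without checking arrow's internals, one way to reproduce a 24-entry result is to step through local wall-clock time. The sketch below uses only the standard library (`datetime` and `zoneinfo`, Python 3.9+) rather than arrow, uses `America/Los_Angeles` (the canonical IANA name for `US/Pacific`), and `wall_clock_hours` is a hypothetical helper. Python's aware-datetime arithmetic moves the wall clock and always yields `fold=0`, so the repeated 01:00 (PST) is never produced and the UTC sequence jumps from 08:00 to 10:00, much like the output above.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Canonical IANA name for the legacy alias "US/Pacific".
LA = ZoneInfo("America/Los_Angeles")


def wall_clock_hours(start: datetime, end: datetime) -> list[datetime]:
    """Hypothetical helper: step one hour of local wall-clock time, end inclusive."""
    hours = []
    current = start
    while current <= end:  # same tzinfo, so this compares wall-clock values
        hours.append(current)
        # Aware + timedelta keeps the tzinfo and advances the wall clock;
        # the result has fold=0, so the second (PST) 01:00 is skipped.
        current = current + timedelta(hours=1)
    return hours


hours = wall_clock_hours(
    datetime(2001, 10, 28, 0, tzinfo=LA),
    datetime(2001, 10, 28, 23, tzinfo=LA),
)
print(len(hours))  # 24
for dt in hours[:4]:
    print(dt.isoformat(), dt.astimezone(timezone.utc).isoformat())
```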

This fragment of code shows the expected result.

import arrow

long2 = [arrow.get("2001-10-28T00:00:00-07:00"), arrow.get("2001-10-28T01:00:00-07:00"),
         arrow.get("2001-10-28T02:00:00-07:00"), arrow.get("2001-10-28T02:00:00-08:00"),
         arrow.get("2001-10-28T03:00:00-08:00"), arrow.get("2001-10-28T04:00:00-08:00"),
         arrow.get("2001-10-28T05:00:00-08:00"), arrow.get("2001-10-28T06:00:00-08:00"),
         arrow.get("2001-10-28T07:00:00-08:00"), arrow.get("2001-10-28T08:00:00-08:00"),
         arrow.get("2001-10-28T09:00:00-08:00"), arrow.get("2001-10-28T10:00:00-08:00"),
         arrow.get("2001-10-28T11:00:00-08:00"), arrow.get("2001-10-28T12:00:00-08:00"),
         arrow.get("2001-10-28T13:00:00-08:00"), arrow.get("2001-10-28T14:00:00-08:00"),
         arrow.get("2001-10-28T15:00:00-08:00"), arrow.get("2001-10-28T16:00:00-08:00"),
         arrow.get("2001-10-28T17:00:00-08:00"), arrow.get("2001-10-28T18:00:00-08:00"),
         arrow.get("2001-10-28T19:00:00-08:00"), arrow.get("2001-10-28T20:00:00-08:00"),
         arrow.get("2001-10-28T21:00:00-08:00"), arrow.get("2001-10-28T22:00:00-08:00"),
         arrow.get("2001-10-28T23:00:00-08:00")]
print("expected long")
print(len(long2))
for i in range(7):
    print(long2[i], long2[i].to("utc"))

Now we see 2001-10-28T02:00:00-07:00 and 2001-10-28T09:00:00+00:00.

expected long
25
2001-10-28T00:00:00-07:00 2001-10-28T07:00:00+00:00
2001-10-28T01:00:00-07:00 2001-10-28T08:00:00+00:00
2001-10-28T02:00:00-07:00 2001-10-28T09:00:00+00:00
2001-10-28T02:00:00-08:00 2001-10-28T10:00:00+00:00
2001-10-28T03:00:00-08:00 2001-10-28T11:00:00+00:00
2001-10-28T04:00:00-08:00 2001-10-28T12:00:00+00:00
2001-10-28T05:00:00-08:00 2001-10-28T13:00:00+00:00

My suggested solution would be to convert the beginning and end of the range to UTC, generate the range in UTC, and then convert the results back to the original time zone.
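A minimal sketch of this UTC-based approach, written with the standard library (`datetime` and `zoneinfo`, Python 3.9+) instead of arrow. The helper name `hourly_range_via_utc` is hypothetical, and `America/Los_Angeles` stands in for `US/Pacific`. Stepping in UTC counts absolute hours, and converting each instant back with `astimezone()` marks the second 01:00 with `fold=1`, so both occurrences appear.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")  # canonical name for US/Pacific


def hourly_range_via_utc(start: datetime, end: datetime) -> list[datetime]:
    """Hypothetical helper: step in UTC (absolute time), then convert back."""
    tz = start.tzinfo
    current = start.astimezone(timezone.utc)
    stop = end.astimezone(timezone.utc)
    result = []
    while current <= stop:
        # astimezone() sets fold=1 on the repeated 01:00, so it is kept.
        result.append(current.astimezone(tz))
        current += timedelta(hours=1)
    return result


long_day = hourly_range_via_utc(
    datetime(2001, 10, 28, 0, tzinfo=LA),
    datetime(2001, 10, 28, 23, tzinfo=LA),
)
print(len(long_day))  # 25
for dt in long_day[:4]:
    print(dt.isoformat())
```

The trade-off is that the step is always one absolute hour, which is what this issue asks for with "hour" ranges; for "day" or "month" frames, stepping in UTC would drift away from local midnight.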

There is still a problem: arrow.get("2001-10-28 02:00:00", "YYYY-MM-DD HH:mm:ss", tzinfo="US/Pacific").to("utc") always produces the result <Arrow [2001-10-28T10:00:00+00:00]>, but it could just as legitimately produce <Arrow [2001-10-28T09:00:00+00:00]>. I'm not sure what to suggest, but it would be good to give the user control over the result in this sort of case. Perhaps range() could optionally return a tuple of times here, or the user could specify which result is desired.


systemcatch commented 3 years ago

Hi @msarrel thanks for the bug report.

arrow has now implemented PEP 495 for all tzinfos that it uses. This allows us to work with ambiguous (same clock, different offset) datetimes.

With your example:

(arrow) chris@ThinkPad:~/arrow$ python
Python 3.8.3 (default, Jul  7 2020, 18:57:36) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import arrow
>>> arw=arrow.get("2001-10-28 01:00:00", "YYYY-MM-DD HH:mm:ss", tzinfo="US/Pacific")
>>> arw
<Arrow [2001-10-28T01:00:00-07:00]>
>>> arw2=arw.replace(fold=1)
>>> arw2
<Arrow [2001-10-28T01:00:00-08:00]>
>>> arw==arw2
True
>>> arw2.to("utc")
<Arrow [2001-10-28T09:00:00+00:00]>

So the result of your range() call is correct (mostly; it depends on dateutil). However, it would be nice to be able to pass fold as a kwarg to arrow.get().
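The same PEP 495 behaviour can be seen with the standard library alone (`datetime` and `zoneinfo`, Python 3.9+; this is not arrow's API, and `America/Los_Angeles` is used for `US/Pacific`). Two datetimes that share a tzinfo compare by wall-clock value and ignore `fold`, which is consistent with `arw==arw2` printing `True` above even though the two values are an hour apart in UTC. That arrow's equality delegates to the underlying datetime is an assumption here.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")  # canonical name for US/Pacific

first = datetime(2001, 10, 28, 1, 0, tzinfo=LA)  # fold=0: the earlier 01:00 (PDT)
second = first.replace(fold=1)                   # fold=1: the repeated 01:00 (PST)

# Same tzinfo object, so == compares wall-clock values and ignores fold.
print(first == second)  # True
print(first.astimezone(timezone.utc).isoformat())   # 2001-10-28T08:00:00+00:00
print(second.astimezone(timezone.utc).isoformat())  # 2001-10-28T09:00:00+00:00
```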

msarrel commented 3 years ago

Thank you for telling me about the fold option; that is useful. But I'm still not completely convinced of the correctness of the range() result. I think that range() should return a list of 25 values for 2001-10-28.

import arrow

short = list(arrow.Arrow.range(
    "hour",
    arrow.get("2001-04-01 00:00:00", "YYYY-MM-DD HH:mm:ss", tzinfo="US/Pacific"),
    arrow.get("2001-04-01 23:00:00", "YYYY-MM-DD HH:mm:ss", tzinfo="US/Pacific")))
print("short (should be 23)")
print(len(short))

normal = list(arrow.Arrow.range(
    "hour",
    arrow.get("2001-04-02 00:00:00", "YYYY-MM-DD HH:mm:ss", tzinfo="US/Pacific"),
    arrow.get("2001-04-02 23:00:00", "YYYY-MM-DD HH:mm:ss", tzinfo="US/Pacific")))
print("normal (should be 24)")
print(len(normal))

long = list(arrow.Arrow.range(
    "hour",
    arrow.get("2001-10-28 00:00:00", "YYYY-MM-DD HH:mm:ss", tzinfo="US/Pacific"),
    arrow.get("2001-10-28 23:00:00", "YYYY-MM-DD HH:mm:ss", tzinfo="US/Pacific")))
print("long (should be 25)")
print(len(long))

Currently, range() works correctly in two of the three cases: short (23-hour) and normal (24-hour) days. I'd like it to work correctly for long (25-hour) days as well.

short (should be 23)
23
normal (should be 24)
24
long (should be 25)
24
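The expected counts of 23, 24 and 25 follow from measuring each interval in absolute time rather than on the wall clock. A minimal standard-library sketch (Python 3.9+, not arrow's API); `expected_hourly_count` is a hypothetical helper and `America/Los_Angeles` stands in for `US/Pacific`:

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")  # canonical name for US/Pacific


def expected_hourly_count(day: date, tz: ZoneInfo) -> int:
    """Hypothetical helper: hourly instants from 00:00 to 23:00 local, inclusive."""
    start = datetime.combine(day, time(0), tzinfo=tz)
    end = datetime.combine(day, time(23), tzinfo=tz)
    # Subtracting UTC values gives real elapsed time, not the wall-clock difference.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return elapsed // timedelta(hours=1) + 1  # +1 because both endpoints count


for label, day in [("short", date(2001, 4, 1)),
                   ("normal", date(2001, 4, 2)),
                   ("long", date(2001, 10, 28))]:
    print(label, expected_hourly_count(day, LA))
# prints: short 23, normal 24, long 25
```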
msarrel commented 3 years ago

Another way to illustrate is this code:

long_utc = list(arrow.Arrow.range(
    "hour",
    arrow.get("2001-10-28 00:00:00", "YYYY-MM-DD HH:mm:ss", tzinfo="US/Pacific").to("utc"),
    arrow.get("2001-10-28 23:00:00", "YYYY-MM-DD HH:mm:ss", tzinfo="US/Pacific").to("utc")))
print("long_utc (should be 25)")
print(len(long_utc))
for t in long_utc:
    print(t.to("US/Pacific"), t)

It produces the expected 25-hour result by first converting to UTC, then performing the range, and then converting back.

long_utc (should be 25)
25
2001-10-28T00:00:00-07:00 2001-10-28T07:00:00+00:00
2001-10-28T01:00:00-07:00 2001-10-28T08:00:00+00:00
2001-10-28T01:00:00-08:00 2001-10-28T09:00:00+00:00
2001-10-28T02:00:00-08:00 2001-10-28T10:00:00+00:00
2001-10-28T03:00:00-08:00 2001-10-28T11:00:00+00:00
2001-10-28T04:00:00-08:00 2001-10-28T12:00:00+00:00
2001-10-28T05:00:00-08:00 2001-10-28T13:00:00+00:00
2001-10-28T06:00:00-08:00 2001-10-28T14:00:00+00:00
2001-10-28T07:00:00-08:00 2001-10-28T15:00:00+00:00
2001-10-28T08:00:00-08:00 2001-10-28T16:00:00+00:00
2001-10-28T09:00:00-08:00 2001-10-28T17:00:00+00:00
2001-10-28T10:00:00-08:00 2001-10-28T18:00:00+00:00
2001-10-28T11:00:00-08:00 2001-10-28T19:00:00+00:00
2001-10-28T12:00:00-08:00 2001-10-28T20:00:00+00:00
2001-10-28T13:00:00-08:00 2001-10-28T21:00:00+00:00
2001-10-28T14:00:00-08:00 2001-10-28T22:00:00+00:00
2001-10-28T15:00:00-08:00 2001-10-28T23:00:00+00:00
2001-10-28T16:00:00-08:00 2001-10-29T00:00:00+00:00
2001-10-28T17:00:00-08:00 2001-10-29T01:00:00+00:00
2001-10-28T18:00:00-08:00 2001-10-29T02:00:00+00:00
2001-10-28T19:00:00-08:00 2001-10-29T03:00:00+00:00
2001-10-28T20:00:00-08:00 2001-10-29T04:00:00+00:00
2001-10-28T21:00:00-08:00 2001-10-29T05:00:00+00:00
2001-10-28T22:00:00-08:00 2001-10-29T06:00:00+00:00
2001-10-28T23:00:00-08:00 2001-10-29T07:00:00+00:00

It also shows why these conversions are tricky. In my original comment, I should have written that 2001-10-28T09:00:00+00:00 corresponds to 2001-10-28T01:00:00-08:00 rather than 2001-10-28T02:00:00-07:00. That was my manual mistake.
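The corrected mapping can be checked with the standard library (`datetime` and `zoneinfo`, Python 3.9+; not arrow's API, with `America/Los_Angeles` used for `US/Pacific`). Converting 09:00 UTC to Pacific time lands on the second 01:00 and sets `fold=1`, while 02:00 local on this date is not ambiguous, so it maps to 10:00 UTC whatever `fold` is set to.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")  # canonical name for US/Pacific

# UTC -> local: 09:00 UTC is the repeated 01:00, and astimezone() marks it fold=1.
local = datetime(2001, 10, 28, 9, 0, tzinfo=timezone.utc).astimezone(LA)
print(local.isoformat(), local.fold)  # 2001-10-28T01:00:00-08:00 1

# Local -> UTC: 02:00 on this date exists only once, so fold makes no difference.
two_am = datetime(2001, 10, 28, 2, 0, tzinfo=LA)
print(two_am.astimezone(timezone.utc).isoformat())                  # 2001-10-28T10:00:00+00:00
print(two_am.replace(fold=1).astimezone(timezone.utc).isoformat())  # 2001-10-28T10:00:00+00:00
```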

systemcatch commented 3 years ago

Yes, there's plenty of room for slip-ups given how complex this stuff can get. Given how Python represents ambiguous datetimes, it's neither easy nor necessary to implement this change.

However it's worth adding the fold kwarg to arrow.get().

systemcatch commented 3 years ago

Given that there's been no further discussion and we're not planning on making any changes here, I'll close this.

msarrel commented 3 years ago

See https://github.com/arrow-py/arrow/issues/885

msarrel commented 3 years ago

https://github.com/arrow-py/arrow/issues/884