arrow-py / arrow

🏹 Better dates & times for Python
https://arrow.readthedocs.io
Apache License 2.0

.to() fails with large dates due to dateutil timestamp overflow #991

Open matthuisman opened 3 years ago

matthuisman commented 3 years ago

Issue Description

(I know the issue is actually inside dateutil, but dateutil isn't written to support large timestamps the way arrow is.)

64-bit

import arrow

date = arrow.get('3100-01-01T07:00:00Z')
print(date)
print(date.to('local'))
Traceback (most recent call last):
  File "C:\Users\Matt\Desktop\test2.py", line 4, in <module>
    print(date.to("local"))
  File "C:\Users\Matt\AppData\Local\Programs\Python\Python39\lib\site-packages\arrow\arrow.py", line 1076, in to
    dt = self._datetime.astimezone(tz)
  File "C:\Users\Matt\AppData\Local\Programs\Python\Python39\lib\site-packages\dateutil\tz\_common.py", line 144, in fromutc
    return f(self, dt)
  File "C:\Users\Matt\AppData\Local\Programs\Python\Python39\lib\site-packages\dateutil\tz\_common.py", line 258, in fromutc
    dt_wall = self._fromutc(dt)
  File "C:\Users\Matt\AppData\Local\Programs\Python\Python39\lib\site-packages\dateutil\tz\_common.py", line 222, in _fromutc
    dtoff = dt.utcoffset()
  File "C:\Users\Matt\AppData\Local\Programs\Python\Python39\lib\site-packages\dateutil\tz\tz.py", line 222, in utcoffset
    if self._isdst(dt):
  File "C:\Users\Matt\AppData\Local\Programs\Python\Python39\lib\site-packages\dateutil\tz\tz.py", line 291, in _isdst
    dstval = self._naive_is_dst(dt)
  File "C:\Users\Matt\AppData\Local\Programs\Python\Python39\lib\site-packages\dateutil\tz\tz.py", line 260, in _naive_is_dst
    return time.localtime(timestamp + time.timezone).tm_isdst
OSError: [Errno 22] Invalid argument

32-bit (a user with a Raspberry Pi 3B+ initially reported this to me; 2050 isn't that far away)

import arrow

date = arrow.get('2050-01-01T07:00:00Z')
print(date)
print(date.to('local'))
  File "/home/osmc/.kodi/addons/slyguy.disney.plus/resources/lib/plugin.py", line 484, in _parse_video
    available = available.to('local')
  File "/home/osmc/.kodi/addons/script.module.slyguy/resources/modules/arrow/arrow.py", line 722, in to
    dt = self._datetime.astimezone(tz)
  File "/home/osmc/.kodi/addons/script.module.slyguy/resources/modules/dateutil/tz/_common.py", line 144, in fromutc
    return f(self, dt)
  File "/home/osmc/.kodi/addons/script.module.slyguy/resources/modules/dateutil/tz/_common.py", line 258, in fromutc
    dt_wall = self._fromutc(dt)
  File "/home/osmc/.kodi/addons/script.module.slyguy/resources/modules/dateutil/tz/_common.py", line 222, in _fromutc
    dtoff = dt.utcoffset()
  File "/home/osmc/.kodi/addons/script.module.slyguy/resources/modules/dateutil/tz/tz.py", line 222, in utcoffset
    if self._isdst(dt):
  File "/home/osmc/.kodi/addons/script.module.slyguy/resources/modules/dateutil/tz/tz.py", line 291, in _isdst
    dstval = self._naive_is_dst(dt)
  File "/home/osmc/.kodi/addons/script.module.slyguy/resources/modules/dateutil/tz/tz.py", line 260, in _naive_is_dst
    return time.localtime(timestamp + time.timezone).tm_isdst
ValueError: timestamp out of range for platform time_t

This is caused by the timestamp being too large for time.localtime() here: https://github.com/dateutil/dateutil/blob/master/dateutil/tz/tz.py#L259
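The failing call is time.localtime(), whose accepted range depends on the platform's time_t and C library, which is why the same date fails on Windows or a 32-bit Pi but not elsewhere. A minimal stdlib-only probe, assuming you just want to know whether a timestamp is accepted on the current machine (the helper name localtime_supported is illustrative, not part of arrow or dateutil; it also ignores the "+ time.timezone" that dateutil adds):

```python
import time
from datetime import datetime


def localtime_supported(ts: float) -> bool:
    """Return True if time.localtime() accepts ts on this platform."""
    try:
        time.localtime(ts)
    except (OverflowError, ValueError, OSError):
        # OSError on Windows, ValueError/OverflowError for time_t overflow
        return False
    return True


# The two dates from this report; results depend on the platform
for iso in ("2050-01-01T07:00:00+00:00", "3100-01-01T07:00:00+00:00"):
    ts = datetime.fromisoformat(iso).timestamp()
    print(iso, ts, localtime_supported(ts))
```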

If you patch _naive_is_dst to something like the following:

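def _naive_is_dst(self, dt):
    # Replacement method for dateutil's tzlocal (dateutil/tz/tz.py), where
    # `time` and the _datetime_to_timestamp() helper are already in scope.
    timestamp = _datetime_to_timestamp(dt) + time.timezone

    # Hard-coded copy of arrow's MAX_TIMESTAMP (roughly the year 3000)
    # and the derived millisecond/microsecond limits
    MAX_TIMESTAMP = 32503719599.0
    MAX_TIMESTAMP_MS = MAX_TIMESTAMP * 1000
    MAX_TIMESTAMP_US = MAX_TIMESTAMP * 1000000

    # Same scaling as arrow's normalize_timestamp(): treat an oversized
    # value as milliseconds or microseconds so time.localtime() accepts it
    if timestamp > MAX_TIMESTAMP:
        if timestamp < MAX_TIMESTAMP_MS:
            timestamp /= 1e3
        elif timestamp < MAX_TIMESTAMP_US:
            timestamp /= 1e6

    return time.localtime(timestamp).tm_isdst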

it works as intended. So maybe arrow needs to do its own astimezone() so it can use its normalize_timestamp() function.
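To see what the scaling in the patched _naive_is_dst does to the date from this report, here is a stdlib-only sketch that reapplies the same rule to the 3100-01-01T07:00:00Z timestamp (scale_like_patch is a hypothetical name; it mirrors the patch's thresholds). The division by 1e3 lands in 1971, so the DST flag is looked up for that earlier date rather than for 3100:

```python
from datetime import datetime, timezone

MAX_TIMESTAMP = 32503719599.0  # constant hard-coded in the patch


def scale_like_patch(ts: float) -> float:
    """Apply the patch's rule: treat oversized values as ms or us."""
    if ts > MAX_TIMESTAMP:
        if ts < MAX_TIMESTAMP * 1e3:
            ts /= 1e3
        elif ts < MAX_TIMESTAMP * 1e6:
            ts /= 1e6
    return ts


original = datetime(3100, 1, 1, 7, tzinfo=timezone.utc).timestamp()
scaled = scale_like_patch(original)
print(original)  # 35659378800.0
print(datetime.fromtimestamp(scaled, tz=timezone.utc).year)  # 1971
```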

For my workaround, I simply updated the dateutil code to use arrow's normalize_timestamp: https://github.com/matthuisman/slyguy.addons/commit/cccc92a818ed70f30f951b15acfbf91622731c75


anishnya commented 3 years ago

Thanks @matthuisman for the report. @jadchaar, @krisfremen and I will take a look at this soon and see what we can do.

jadchaar commented 3 years ago

I am unable to reproduce this on my 64-bit macOS machine, so I think it is limited to Linux and Windows. I think this is where the exception is triggered (our to() wrapper calls the Arrow constructor):

https://github.com/arrow-py/arrow/blob/7c9632c09161b1edb67fadb4bf8f3c1c0f5cb101/arrow/arrow.py#L176-L178

Therefore, we may be able to wrap this in a try/except and, in the except block, get the timestamp, normalize it (using normalize_timestamp), and then build an Arrow object from that.
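The control flow of that proposal can be sketched with plain datetime objects. Everything here is hypothetical: astimezone_with_fallback is not an arrow function, OutOfRangeLocal merely simulates a tzlocal that raises like time.localtime() does on Windows, and scale is a simplified stand-in for normalize_timestamp (millisecond branch only). The three exception types are the ones seen in this thread plus OverflowError:

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Callable

# Errors seen in this thread (OSError on Windows, ValueError on a 32-bit Pi)
# plus OverflowError, which time_t overflow can also raise.
PLATFORM_TIME_ERRORS = (OverflowError, ValueError, OSError)


def astimezone_with_fallback(
    dt: datetime, tz: tzinfo, normalize: Callable[[float], float]
) -> datetime:
    """Convert dt to tz; on a platform time error, rebuild from a
    normalized POSIX timestamp instead (hypothetical helper)."""
    try:
        return dt.astimezone(tz)
    except PLATFORM_TIME_ERRORS:
        # If normalize() rescales the value, the result is a different
        # instant; this only illustrates the try/except structure.
        return datetime.fromtimestamp(normalize(dt.timestamp()), tz)


class OutOfRangeLocal(tzinfo):
    """Simulated local zone: fixed UTC+1, but raises OSError for wall
    times at or after the year 3000, mimicking the Windows failure."""

    LIMIT = datetime(3000, 1, 1)

    def utcoffset(self, dt):
        if dt is not None and dt.replace(tzinfo=None) >= self.LIMIT:
            raise OSError(22, "Invalid argument")
        return timedelta(hours=1)

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "LOCAL"


def scale(ts: float) -> float:
    # Simplified stand-in for normalize_timestamp (ms branch only)
    return ts / 1e3 if ts > 32503719599.0 else ts


far = datetime(3100, 1, 1, 7, tzinfo=timezone.utc)
print(astimezone_with_fallback(far, OutOfRangeLocal(), scale))
```

Note that the fallback result for the 3100 date is a 1971 date, which is worth checking against before adopting this shape in to().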

Are you able to reproduce this on your end @anishnya? We should probably get this reproduced either on a local machine or on the CI builds so we can ensure our patch works as expected once we attempt a fix.

matthuisman commented 3 years ago
C:\Users\Matt\Desktop>py -3 test.py
3100-01-01T07:00:00+00:00
Traceback (most recent call last):
  File "C:\Users\Matt\Desktop\test.py", line 4, in <module>
    print(date.to('local'))
  File "C:\Users\Matt\AppData\Local\Programs\Python\Python39\lib\site-packages\arrow\arrow.py", line 1076, in to
    dt = self._datetime.astimezone(tz)
  File "C:\Users\Matt\AppData\Local\Programs\Python\Python39\lib\site-packages\dateutil\tz\_common.py", line 144, in fromutc
    return f(self, dt)
  File "C:\Users\Matt\AppData\Local\Programs\Python\Python39\lib\site-packages\dateutil\tz\_common.py", line 258, in fromutc
    dt_wall = self._fromutc(dt)
  File "C:\Users\Matt\AppData\Local\Programs\Python\Python39\lib\site-packages\dateutil\tz\_common.py", line 222, in _fromutc
    dtoff = dt.utcoffset()
  File "C:\Users\Matt\AppData\Local\Programs\Python\Python39\lib\site-packages\dateutil\tz\tz.py", line 222, in utcoffset
    if self._isdst(dt):
  File "C:\Users\Matt\AppData\Local\Programs\Python\Python39\lib\site-packages\dateutil\tz\tz.py", line 291, in _isdst
    dstval = self._naive_is_dst(dt)
  File "C:\Users\Matt\AppData\Local\Programs\Python\Python39\lib\site-packages\dateutil\tz\tz.py", line 260, in _naive_is_dst
    return time.localtime(timestamp + time.timezone).tm_isdst
OSError: [Errno 22] Invalid argument

The first print works fine; only the .to() call fails. The line it fails on is dt = self._datetime.astimezone(tz). I'm pretty confident the issue is inside dateutil, so maybe arrow will need to create its own astimezone() using normalize_timestamp? Or possibly dateutil will fix it eventually?

anishnya commented 3 years ago

@matthuisman dateutil has seen some recent work, but there was a stretch of about a year with zero commits to dateutil. I'm not sure how likely it is that dateutil will fix this issue, or, if they do, when that fix will be merged. We've had some internal discussions (well before this issue came up) about dropping the dateutil dependency from Arrow because of the inconsistency of dateutil's maintenance, but we haven't made a decision yet.

anishnya commented 3 years ago

I've also tried on my local machine (64-bit macOS) and haven't been able to reproduce this issue either. I'll try on a Windows machine and post an update.

matthuisman commented 3 years ago

I think the issue is the same timestamp bug that required the following lines in constants.py: https://github.com/arrow-py/arrow/blob/master/arrow/constants.py#L16
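For context, the kind of fallback those constants.py lines implement can be sketched with the standard library alone. This is a reconstruction for illustration, not a verbatim copy of arrow's file: the primary path uses datetime's own maximum (as described later in this thread), and the year-3000 / year-2038 cutoffs for platforms where that fails are assumptions here:

```python
import sys
from datetime import datetime

try:
    # Primary path: datetime's own maximum, converted to a timestamp
    max_timestamp = datetime.max.timestamp()
except (OverflowError, ValueError, OSError):
    # Assumed fallback cutoffs when the platform rejects datetime.max
    is_64bits = sys.maxsize > 2**32
    cutoff = (
        datetime(3000, 1, 1, 23, 59, 59, 999999)
        if is_64bits
        else datetime(2038, 1, 1, 23, 59, 59, 999999)
    )
    max_timestamp = cutoff.timestamp()

# Thresholds used to decide whether a value is really in ms or us
max_timestamp_ms = max_timestamp * 1000
max_timestamp_us = max_timestamp * 1_000_000
print(max_timestamp)
```

Because the cutoff differs per platform, a date such as MAX_TIMESTAMP shifted by 100 years lands in a different year on each machine.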

You should be able to reproduce it on macOS with something like this:

import arrow
date = arrow.get(arrow.constants.MAX_TIMESTAMP).shift(years=100)
print(date)
print(date.to("local"))
jadchaar commented 3 years ago

Finding the max timestamp in a platform-agnostic manner has proven to be very difficult (there is no standard way to do this in the Python standard library or in dateutil). So MAX_TIMESTAMP is used as a rough guide for when to compute the normalized timestamp, but it is imperfect because datetime seems to have trouble with its own max timestamp (which is what we use in constants.py):

>>> datetime.fromtimestamp(datetime.max.timestamp())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: year 0 is out of range
systemcatch commented 3 years ago

Can't reproduce on my machine.

>>> import arrow
>>> date = arrow.get('3100-01-01T07:00:00Z')
>>> date
<Arrow [3100-01-01T07:00:00+00:00]>
>>> print(date.to('local'))
3100-01-01T07:00:00+00:00
>>> import platform
>>> platform.uname()
uname_result(system='Linux', node='Z490', release='5.8.0-63-generic', version='#71-Ubuntu SMP Tue Jul 13 15:59:12 UTC 2021', machine='x86_64')
matthuisman commented 3 years ago

How about this:

import arrow
date = arrow.get('1966-08-24T00:00:00Z')
print(date)
print(date.to('local'))

or

import arrow
date = arrow.get(arrow.constants.MAX_TIMESTAMP).shift(years=100)
print(date)
print(date.to("local"))