dateutil / dateutil

Useful extensions to the standard Python datetime features

Parse support for timezone offset with optional seconds required #1081

Open gbhat618 opened 4 years ago

gbhat618 commented 4 years ago

A timezone offset may contain an optional seconds part: hh:mm:ss is a valid offset. I didn't find this in the ISO standard for datetimes, but the POSIX standard mentions it in its discussion of the TZ environment variable. The Python standard library's datetime.datetime.strptime also supports it.

The timezone offset used in the example, Asia/Kolkata == +05:21:10, is taken from Date-Manip.

Python standard library:

>>> from datetime import datetime
>>> datetime.strptime('2020-09-01T01:01:01+05:21:10', '%Y-%m-%dT%H:%M:%S%z')
datetime.datetime(2020, 9, 1, 1, 1, 1, tzinfo=datetime.timezone(datetime.timedelta(seconds=19270)))

dateutil parser:

>>> import dateutil.parser
>>> dateutil.parser.parse('2020-09-01T01:01:01+05:21:10')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 1374, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 649, in parse
    raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: 2020-09-01T01:01:01+05:21:10
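
Until the parser accepts such offsets, one possible workaround is to peel the offset off the end of the string and attach it separately. The sketch below is only an illustration: `split_offset` and `_OFFSET_RE` are hypothetical names, not part of dateutil, and the regex assumes the offset is the last thing in the string and uses colons. The example parses the remaining naive part with the standard library, but that part could presumably be passed to dateutil.parser.parse just as well.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper, not part of dateutil.
# Matches a trailing +HH:MM or +HH:MM:SS (or the same with a minus sign).
_OFFSET_RE = re.compile(r"([+-])(\d{2}):(\d{2})(?::(\d{2}))?$")


def split_offset(timestr):
    """Return (timestr_without_offset, tzinfo), with tzinfo None if no offset is found."""
    match = _OFFSET_RE.search(timestr)
    if match is None:
        return timestr, None
    sign, hours, minutes, seconds = match.groups()
    delta = timedelta(
        hours=int(hours), minutes=int(minutes), seconds=int(seconds or 0)
    )
    if sign == "-":
        delta = -delta
    return timestr[: match.start()], timezone(delta)


naive_part, tz = split_offset("2020-09-01T01:01:01+05:21:10")
# The naive part no longer carries an offset, so any parser can handle it.
parsed = datetime.strptime(naive_part, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=tz)
```

Note that `timezone` only accepts sub-minute offsets on Python 3.7 and later, so this sketch has the same version requirement as the strptime example.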

I noticed this while parsing with the pendulum library, specifically pendulum.parse. Looking more closely at pendulum led me to dateutil, and the issue appears to be on the dateutil side. Since the Python standard library itself supports the format, it might be good for dateutil to support it as well.

Correct me if I'm wrong. Thanks!

gbhat618 commented 4 years ago

The standard library documentation says that Python calls the native C library function strftime, and that the format codes supported by the C function are supported in Python as well.

This would require supporting %z as a UTC offset in the form ±HHMM[SS[.ffffff]] (empty string if the object is naive).
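
To make the ±HHMM[SS[.ffffff]] form concrete, here is a small sketch of what the standard library emits for an offset with a seconds part. It assumes Python 3.7 or later; the variable names are only for illustration.

```python
from datetime import datetime, timedelta, timezone

# An offset of 5 hours, 21 minutes and 10 seconds, as in the example above.
tz = timezone(timedelta(hours=5, minutes=21, seconds=10))
dt = datetime(2020, 9, 1, 1, 1, 1, tzinfo=tz)

offset_text = dt.strftime("%z")  # compact form without colons
iso_text = dt.isoformat()        # colon-separated offset
naive_text = datetime(2020, 9, 1).strftime("%z")  # naive object

print(repr(offset_text))  # '+052110'
print(repr(iso_text))     # '2020-09-01T01:01:01+05:21:10'
print(repr(naive_text))   # ''
```

So both spellings of the offset, with and without colons, can show up in strings produced by Python itself.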

ffe4 commented 4 years ago

There is also an issue for supporting second offsets in the isoparser (#572), so I think it is definitely something that should be supported by the normal parser.

gbhat618 commented 4 years ago

OK, that is good to know.

Correct me if I'm wrong, but a timezone offset is just a timedelta, which means it can even have microsecond precision. No such timezone actually exists on this planet, though.
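
As a quick illustration of that point, the sketch below builds a fixed-offset timezone with a one-microsecond component. It assumes Python 3.7 or later, where offsets are no longer restricted to whole minutes; the only constraint left is that the offset must lie strictly between -24 and +24 hours.

```python
from datetime import datetime, timedelta, timezone

# A purely artificial offset: 05:21:10 plus one microsecond.
tiny = timezone(timedelta(hours=5, minutes=21, seconds=10, microseconds=1))
dt = datetime(2020, 9, 1, tzinfo=tiny)
offset = dt.utcoffset()

# Offsets of 24 hours or more are still rejected.
try:
    timezone(timedelta(hours=24))
    too_large_rejected = False
except ValueError:
    too_large_rejected = True
```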

Well, I must admit I hadn't checked thoroughly when I pointed out that the Python standard library supports HH:MM:SS in timezone offsets, and this took me by surprise.

Python 3.6 does not support it, as checked on Ubuntu 18.04:

Python 3.6.9 (default, Jul 17 2020, 12:50:27) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> datetime.strptime('2020-09-01T01:01:01+05:21:10', '%Y-%m-%dT%H:%M:%S%z')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/_strptime.py", line 565, in _strptime_datetime
    tt, fraction = _strptime(data_string, format)
  File "/usr/lib/python3.6/_strptime.py", line 362, in _strptime
    (data_string, format))
ValueError: time data '2020-09-01T01:01:01+05:21:10' does not match format '%Y-%m-%dT%H:%M:%S%z'
>>> 

However, it works on Python 3.7.5.
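
Since the behaviour differs between interpreter versions, code that has to run on both could probe for it at runtime instead of hard-coding a version check. This is a minimal sketch; `strptime_accepts_offset_seconds` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def strptime_accepts_offset_seconds():
    """Return True if %z in strptime understands a +HH:MM:SS offset."""
    try:
        parsed = datetime.strptime("+05:21:10", "%z")
    except ValueError:
        # Older interpreters (for example 3.6, as shown above) reject the string.
        return False
    return parsed.utcoffset() == timedelta(hours=5, minutes=21, seconds=10)


supported = strptime_accepts_offset_seconds()
```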

Skeen commented 2 years ago

I have a similar issue, specifically with this:

>>> from zoneinfo import ZoneInfo
>>> from datetime import datetime
>>> datetime(1939, 1, 1, tzinfo=ZoneInfo('Asia/Aden')).isoformat()
'1939-01-01T00:00:00+03:06:52'

Which yields:

>>> from dateutil.parser import isoparse
>>> isoparse('1939-01-01T00:00:00+03:06:52')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/dateutil/parser/isoparser.py", line 37, in func
    return f(self, str_in, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/dateutil/parser/isoparser.py", line 138, in isoparse
    components += self._parse_isotime(dt_str[pos + 1:])
  File "/usr/lib/python3/dist-packages/dateutil/parser/isoparser.py", line 346, in _parse_isotime
    components[-1] = self._parse_tzstr(timestr[pos:])
  File "/usr/lib/python3/dist-packages/dateutil/parser/isoparser.py", line 383, in _parse_tzstr
    raise ValueError('Time zone offset must be 1, 3, 5 or 6 characters')
ValueError: Time zone offset must be 1, 3, 5 or 6 characters
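
For strings that come straight out of isoformat(), as in this case, a possible workaround until isoparse handles them is the standard library's own datetime.fromisoformat, which is documented to accept offsets with a seconds part on Python 3.7 and later. A minimal sketch:

```python
from datetime import datetime, timedelta

text = "1939-01-01T00:00:00+03:06:52"

# fromisoformat is designed as the inverse of isoformat, so the offset survives.
dt = datetime.fromisoformat(text)
offset = dt.utcoffset()
round_tripped = dt.isoformat()

print(offset)  # 3:06:52
```

This only covers the output of isoformat(); it is not a general replacement for isoparse.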