arrow-py / arrow

🏹 Better dates & times for Python
https://arrow.readthedocs.io
Apache License 2.0

Could not parse timezone expression "PDT" #835

Closed: westhomas closed this issue 4 years ago

westhomas commented 4 years ago

Issue Description

I'm sure I'm doing something wrong, but I only see three timezone parsing options available to me (ZZZ, ZZ, and Z) and none of them seem to work. I believe ZZZ is the correct one to use, but it complains.

What's the proper way of parsing the PDT timezone?

In [2]: arrow.get("Mon, 10 Aug 2020 22:00:12 -0700 (PDT)", "ddd, DD MMM YYYY HH:mm:ss Z (ZZZ)")

    576         if tzinfo is None:
    577             raise ParserError(
--> 578                 'Could not parse timezone expression "{}"'.format(tzinfo_string)
    579             )
    580

ParserError: Could not parse timezone expression "PDT"

FYI, this is a date that gmail's IMAP is handing me in my email messages.

System Info

macOS 10.15.5 (19F101) Python 3.7.6 arrow==0.15.8

jadchaar commented 4 years ago

Hmm, this seems to be because our dependency, dateutil, cannot map PDT to an offset:

>>> from dateutil import tz
>>> tz.gettz("PDT")
None

Because gettz returns None, Arrow throws an error complaining that the timezone is invalid.

This seems to be related to https://github.com/dateutil/dateutil/issues/932 and https://github.com/dateutil/dateutil/issues/575, and the fundamental issue that PDT can be an ambiguous abbreviation. It seems like the only way around this is to manually map timezone abbreviations to offsets or do a regex/string replace on abbreviations to sub in the appropriate full timezone name.
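The string-replace idea above could look roughly like the sketch below. The mapping table and the helper name `expand_tz_abbreviation` are hypothetical and not part of arrow or dateutil. Because abbreviations can be ambiguous, the caller has to decide which zone each one means; the entries here are only an assumed example for North American mail headers.

```python
import re

# Hypothetical mapping chosen by the caller. Abbreviations can be ambiguous,
# so these entries are assumptions, not data shipped by arrow or dateutil.
ABBREVIATION_TO_ZONE = {
    "PDT": "America/Los_Angeles",
    "PST": "America/Los_Angeles",
    "EDT": "America/New_York",
    "EST": "America/New_York",
}

# Matches a parenthesized, upper-case abbreviation such as "(PDT)".
_ABBREVIATION_RE = re.compile(r"\(([A-Z]{2,5})\)")


def expand_tz_abbreviation(text, mapping=ABBREVIATION_TO_ZONE):
    """Replace "(PDT)"-style abbreviations with a full timezone name."""

    def _swap(match):
        zone = mapping.get(match.group(1))
        # Leave unknown abbreviations untouched so they stay visible.
        return "({})".format(zone) if zone else match.group(0)

    return _ABBREVIATION_RE.sub(_swap, text)


raw = "Mon, 10 Aug 2020 22:00:12 -0700 (PDT)"
expanded = expand_tz_abbreviation(raw)
print(expanded)  # Mon, 10 Aug 2020 22:00:12 -0700 (America/Los_Angeles)
```

The expanded string can then be tried with the original format string. Whether the named zone or the numeric offset wins in the parsed result is worth checking against your arrow version before relying on it.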

Any other ideas, @systemcatch @krisfremen?

systemcatch commented 4 years ago

Unless dateutil alters how these abbreviations are parsed (unlikely given past discussions), the only ways around this are the fixes Jad mentioned.

Interestingly, GMT works, and it overrides the fixed offset in the string (note the +00:00 in the result below).

(arrow) chris@ThinkPad:~/arrow$ python
Python 3.8.3 (default, Jul  7 2020, 18:57:36) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import arrow
>>> arrow.get("Mon, 10 Aug 2020 22:00:12 -0700 (GMT)", "ddd, DD MMM YYYY HH:mm:ss Z (ZZZ)")
<Arrow [2020-08-10T22:00:12+00:00]>

We should update the docs for the ZZZ token to explain why abbreviations like PDT don't work.

westhomas commented 4 years ago

Thanks for all the details! I just removed it from my string as a workaround. I agree it would be nice to have the docs updated.
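For anyone landing here later, a minimal sketch of that removal workaround follows. The helper name `strip_tz_comment` is hypothetical, and the sketch assumes the abbreviation always appears as a trailing parenthesized comment, as in the Gmail IMAP header from the original report.

```python
import re

# Matches a trailing parenthesized comment such as " (PDT)" at the end
# of the string, including the whitespace in front of it.
_TRAILING_COMMENT_RE = re.compile(r"\s*\([^()]*\)\s*$")


def strip_tz_comment(text):
    """Drop a trailing "(PDT)"-style comment, keeping the numeric offset."""
    return _TRAILING_COMMENT_RE.sub("", text)


raw = "Mon, 10 Aug 2020 22:00:12 -0700 (PDT)"
cleaned = strip_tz_comment(raw)
print(cleaned)  # Mon, 10 Aug 2020 22:00:12 -0700

# The cleaned string should then parse without the ZZZ token, e.g.:
#   arrow.get(cleaned, "ddd, DD MMM YYYY HH:mm:ss Z")
```

The numeric offset already carries everything needed to place the timestamp, so dropping the abbreviation loses nothing for parsing.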