dateutil / dateutil

Useful extensions to the standard Python datetime features

Parser UTC offset logic is inverted #70

Open atttx123 opened 9 years ago

atttx123 commented 9 years ago

My python-dateutil version: 2.4.1

I am in the time zone CST (UTC+8).

When I use dateutil, it returns different results from other tools.

python-dateutil:

In [21]: parse("2015-03-31 00:01 UTC").astimezone(cst)
Out[21]: datetime.datetime(2015, 3, 31, 8, 1, tzinfo=tzlocal())

In [22]: parse("2015-03-31 00:01 UTC+1").astimezone(cst)
Out[22]: datetime.datetime(2015, 3, 31, 9, 1, tzinfo=tzlocal())

In [23]: parse("2015-03-31 00:01 UTC-1").astimezone(cst)
Out[23]: datetime.datetime(2015, 3, 31, 7, 1, tzinfo=tzlocal())

Linux date command:

yu at debian in ~ 
(! 4076)-> date -d "2015-03-31 00:01 UTC"
Tue Mar 31 08:01:00 CST 2015
yu at debian in ~ 
(! 4076)-> date -d "2015-03-31 00:01 UTC+1"
Tue Mar 31 07:01:00 CST 2015
yu at debian in ~ 
(! 4077)-> date -d "2015-03-31 00:01 UTC-1"
Tue Mar 31 09:01:00 CST 2015
atttx123 commented 9 years ago

I think the problem is that parse returns the wrong tzoffset:

In [21]: parse("2015-03-31 00:01 UTC-1")
Out[21]: datetime.datetime(2015, 3, 31, 0, 1, tzinfo=tzoffset(None, 3600))

In [22]: parse("2015-03-31 00:01 UTC+1")
Out[22]: datetime.datetime(2015, 3, 31, 0, 1, tzinfo=tzoffset(None, -3600))
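
For reference, the two possible readings of "UTC+1" can be compared using only the standard library. This is a minimal sketch rather than dateutil code: the variable names are illustrative, and CST is modeled as a fixed UTC+8 offset.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for China Standard Time (UTC+8); an assumption for illustration.
cst = timezone(timedelta(hours=8))

naive = datetime(2015, 3, 31, 0, 1)

# Reading 1: "UTC+1" means one hour ahead of (east of) UTC, as GNU date treats it.
east = naive.replace(tzinfo=timezone(timedelta(hours=1)))

# Reading 2: "UTC+1" is read POSIX-style, one hour behind UTC,
# which matches the tzoffset(None, -3600) shown above.
west = naive.replace(tzinfo=timezone(timedelta(hours=-1)))

print(east.astimezone(cst))  # 2015-03-31 07:01:00+08:00
print(west.astimezone(cst))  # 2015-03-31 09:01:00+08:00
```

The first reading reproduces the output of the date command, and the second reproduces what parse returns.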
pganssle commented 9 years ago

@atttx123 Given that tzlocal() is CST, are you sure that that is the wrong answer? I looked into something similar for issue #64, and came to the conclusion that tzlocal() and a local time zone are equivalent.

Additionally, can you clarify where cst is coming from? It seems likely that you are generating it by calling:

>>> from dateutil.tz import gettz
>>> cst = gettz('CST')
>>> print(cst)
tzlocal()

If you're in the China Standard Time zone (on Linux), gettz('CST') will return tzlocal(), because 'CST' is one of the local time zone names. If you want to specifically retrieve a time zone from the tzfile, not your local time zone, you can specify the zone name:

>>> from dateutil.tz import gettz
>>> cst = gettz('Asia/Shanghai')
>>> print(cst)
tzfile('/usr/share/zoneinfo/Asia/Shanghai')

Honestly, though, it seems to me like you're getting the right values out, so I'm not clear that there's a problem with using tzlocal() in this case.

pganssle commented 9 years ago

@atttx123 I'm going to assume that you are satisfied and close this as not a bug. If this is mistaken let me know and I can reopen.

atttx123 commented 9 years ago

@pganssle The point is not the time zone; it's the parse function itself.

When the time is 00:00 UTC+1, my local time should be 07:00 UTC+8, but dateutil returns 09:00:

In [6]: parse("2015-03-31 00:00 UTC+1").astimezone(cst)
Out[6]: datetime.datetime(2015, 3, 31, 9, 0, tzinfo=tzfile('/usr/share/zoneinfo/Asia/Shanghai'))
atttx123 commented 9 years ago

On Linux, you can use this command to check the time:

date -d "2015-03-31 00:01 UTC+1"

I am sure dateutil returns the wrong answer.

pganssle commented 9 years ago

Ah, sorry, I missed that. My bad. I'll look into it.

pganssle commented 9 years ago

@atttx123 Hm... This will require careful thought on how to address this, unfortunately. It seems that this was a deliberate choice, as seen in this comment. Interestingly, another comment in the tz module specifically mentions that tzstr has the opposite behavior.

If it's a problem, I'm not clear on why it hasn't been noticed before - this has been the behavior since at least dateutil 2.1.

atttx123 commented 9 years ago

I think parse should behave the same as other tools such as the date command; otherwise it will confuse people.

pganssle commented 9 years ago

@atttx123 I agree in general, but it's important to determine whether this is a bug that no one has noticed, a bug in date, or the well-known behavior of the library. If we want to switch the logic to the more common version, we may want an intermediate version that raises a deprecation warning.

atttx123 commented 9 years ago

@pganssle If it is kept as a feature, it should be highlighted in the documentation; a comment in the code is not enough.

atttx123 commented 9 years ago

Is the +1 a calculation, or is it part of the time zone? For me, UTC+8 is the time zone; I never read it as "UTC plus 8" before.

Maybe this one is more persuasive: UTC+01:00

pganssle commented 9 years ago

Sorry, that was a half-formed thought, I meant to click "cancel" but I accidentally posted it instead.

pganssle commented 9 years ago

@atttx123 OK, I've done a bit of googling, and it turns out that this is actually a feature of POSIX time-zones that the trailing time has an inverted sense, which has historically led to a lot of confusion.

I will think about how to move forward. If there are datetimestamps out there using UTC offsets in the non-POSIX sense (I assume there are), and there's no way to disambiguate them from context, then maybe a POSIX flag in the parserinfo is the right way to go.
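
To make the idea concrete, here is a rough sketch of how such a flag could behave. It is purely hypothetical: `utc_offset_to_tzinfo` and its `posix` argument are invented for illustration and do not exist in dateutil.

```python
from datetime import timedelta, timezone

def utc_offset_to_tzinfo(sign, hours, minutes=0, posix=False):
    """Turn the offset in a string like "UTC+1" into a fixed-offset tzinfo.

    Hypothetical helper, not part of dateutil. With posix=False the sign is
    read as "ahead of UTC"; with posix=True it is read in the POSIX TZ
    sense, where a positive value means behind (west of) UTC.
    """
    if sign not in ("+", "-"):
        raise ValueError("sign must be '+' or '-'")
    delta = timedelta(hours=hours, minutes=minutes)
    if sign == "-":
        delta = -delta
    if posix:
        # POSIX inverts the sense of the sign.
        delta = -delta
    return timezone(delta)

print(utc_offset_to_tzinfo("+", 1).utcoffset(None))              # 1:00:00
print(utc_offset_to_tzinfo("+", 1, posix=True).utcoffset(None))  # -1 day, 23:00:00
```

The second line prints "-1 day, 23:00:00" because that is how Python renders a timedelta of minus one hour.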

spumer commented 9 years ago

Same issue:

>>> import datetime
>>> import dateutil.parser
>>> # standard library
>>> dt = datetime.datetime.strptime('2015-06-15T11:19:57 +0500', '%Y-%m-%dT%H:%M:%S %z')
>>> str(datetime.timezone(dt.utcoffset()))
'UTC+05:00'
>>>
>>> # dateutil library
>>> dt2 = dateutil.parser.parse('2015-06-15T11:19:57 UTC+05:00')
>>> str(datetime.timezone(dt2.utcoffset()))
'UTC-05:00'

Yes, parse('2015-06-15T11:19:57 +0500') works correctly; the problem only appears with the 'UTC' prefix.

zed commented 9 years ago

Perhaps a boolean parameter posix_style, which would eventually default to False, could be introduced. Related: Timezone offset sign reversed by Python dateutil?.

yohplala commented 4 years ago

Hello, is there any update on this issue? I have run into this problem as well. I added an example in https://github.com/pandas-dev/pandas/issues/30518

When resolving the offset with a time zone that has DST in a pandas DataFrame, and using these timestamps as the index, this reversed behavior causes trouble because it creates duplicate index entries. Is there any workaround? Thank you in advance for your help. Best, Pierre

pganssle commented 4 years ago

@yohplala No, there is no update at the moment. The parser is a bit of a hard nut to crack. I haven't thought about it in a while but it's really down to coming up with a better way to configure the parser than we have now without adding a bunch of confusing and potentially conflicting flags.

If you have a mix of datetime strings, some of which appear to mimic (but are not) POSIX-style offsets, and some of which are using a different style, your best bet may be to modify those strings to use an unambiguous convention before parsing.

zenoprod commented 3 years ago

I encountered this problem too. When I run pd.to_datetime('Jul 1, 2021 12:00:01 AM UTC-7'), I get

Timestamp('2021-07-01 00:00:01+0700', tz='pytz.FixedOffset(420)')

I worked around it by deleting the UTC-7 and using .tz_localize('America/Los_Angeles').

I hope it will be fixed soon.

Bests, zeno
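
A similar approach works without pandas. The sketch below assumes the suffix is known to mean seven hours behind UTC, strips it from the string by hand, and attaches a fixed offset from the standard library instead of a named zone. The variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

raw = "Jul 1, 2021 12:00:01 AM UTC-7"

# Drop the ambiguous suffix; this assumes it is always the last token.
stamp = raw.rsplit(" ", 1)[0]

# %b and %p are locale-dependent; this assumes an English/C locale.
naive = datetime.strptime(stamp, "%b %d, %Y %I:%M:%S %p")

# Attach the intended offset explicitly: seven hours behind UTC.
aware = naive.replace(tzinfo=timezone(timedelta(hours=-7)))

print(aware.isoformat())  # 2021-07-01T00:00:01-07:00
```

Unlike 'America/Los_Angeles', a fixed offset does not follow daylight saving changes, so it only suits data where the offset really is constant.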

personalcomputer commented 2 years ago

With regard to workarounds, let me share the preprocessing solution that I am using successfully as a generic workaround:

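import re

import dateutil.parser

# datetime_str is the input string, e.g. "2015-03-31 00:01 UTC+0600".
# Strip a "GMT"/"UTC" prefix that is directly followed by a signed offset.
preprocessed_datetime_str = re.sub(r'(?:GMT|UTC)([+\-]\d+)', r'\1', datetime_str)
dt = dateutil.parser.parse(preprocessed_datetime_str)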

This converts offsets like "UTC+0600" or "GMT+0100" into just "+0600" or "+0100", respectively, which causes dateutil to then parse the offset as expected.
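
To see what the substitution does without needing dateutil installed, the same pattern can be run on a few sample strings. The `strip_utc_prefix` name and the inputs are illustrative only.

```python
import re

# Same pattern as above: a "GMT"/"UTC" prefix directly followed by a signed number.
_PREFIX = re.compile(r'(?:GMT|UTC)([+\-]\d+)')

def strip_utc_prefix(datetime_str):
    """Hypothetical helper wrapping the substitution shown above."""
    return _PREFIX.sub(r'\1', datetime_str)

samples = (
    "2015-03-31 00:01 UTC+0600",
    "2015-03-31 00:01 GMT+0100",
    "2015-03-31 00:01 UTC-1",
    "2015-03-31 00:01 UTC",
)
for s in samples:
    print(repr(s), "->", repr(strip_utc_prefix(s)))
```

A bare "UTC" with no offset does not match the pattern and is left untouched.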

dansebcar commented 1 year ago

The current documentation on the subject is misleading. The first reference says UTC offsets east of the zero meridian are positive, and west of the zero meridian are negative.

Is a PR adding a note to the documentation about this exception welcome?