arrow-py / arrow

🏹 Better dates & times for Python
https://arrow.readthedocs.io
Apache License 2.0

DST handling is not in line with pytz DST handling and is most probably incorrect #875

Closed bernhardboehmer closed 3 years ago

bernhardboehmer commented 3 years ago

Issue Description

The handling of the DST switch does not seem right. When changing from DST to non-DST in October, the 2 a.m. hour is repeated in local time. However, the repeated hour should have the offset UTC+1, not UTC+2, which is what pytz does. Otherwise, checks for gaps in time series, for example, will work incorrectly, because 02:20+02:00 appears twice. Please look at this excerpt from the Python console (sorry, I couldn't get the formatting quite right):

import datetime
import arrow
from pytz import timezone, utc

berlin = timezone('Europe/Berlin')

# arrow: both UTC instants come back with the same local time and the same offset
arrow.get(datetime.datetime(2017, 10, 29, 1, 0, 0, tzinfo=utc)).to('Europe/Berlin')
# -> <Arrow [2017-10-29T02:00:00+02:00]>
arrow.get(datetime.datetime(2017, 10, 29, 0, 0, 0, tzinfo=utc)).to('Europe/Berlin')
# -> <Arrow [2017-10-29T02:00:00+02:00]>

# pytz: same local time, but the offsets differ (CET vs. CEST)
datetime.datetime(2017, 10, 29, 1, 0, 0, tzinfo=utc).astimezone(berlin)
# -> datetime.datetime(2017, 10, 29, 2, 0, tzinfo=<DstTzInfo 'Europe/Berlin' CET+1:00:00 STD>)
datetime.datetime(2017, 10, 29, 0, 0, 0, tzinfo=utc).astimezone(berlin)
# -> datetime.datetime(2017, 10, 29, 2, 0, tzinfo=<DstTzInfo 'Europe/Berlin' CEST+2:00:00 DST>)
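For comparison, the behavior the report expects can be reproduced with the standard library alone. This is a minimal sketch that assumes Python 3.9+ and an available time zone database (`zoneinfo` is not used in the original report; on platforms without system tz data the `tzdata` package would be needed). It converts the same two UTC instants and prints the resulting local time, offset, and `fold` value.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # assumption: Python 3.9+, tz database installed

berlin = ZoneInfo("Europe/Berlin")

# Last instant of the first 02:00 hour (still summer time, CEST)
first = datetime(2017, 10, 29, 0, 0, tzinfo=timezone.utc).astimezone(berlin)
# One hour later in UTC: the wall clock shows 02:00 again (now CET)
second = datetime(2017, 10, 29, 1, 0, tzinfo=timezone.utc).astimezone(berlin)

print(first.isoformat(), first.fold)    # 2017-10-29T02:00:00+02:00 0
print(second.isoformat(), second.fold)  # 2017-10-29T02:00:00+01:00 1
```

The wall-clock time is identical in both cases; only the UTC offset (and the `fold` flag) tells the two instants apart.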

System Info

systemcatch commented 3 years ago

Hi @bernhardboehmer, this type of bug was fixed in version 0.17.0.

(arrow) chris@ThinkPad:~/arrow$ python
Python 3.8.3 (default, Jul  7 2020, 18:57:36) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import arrow
>>> arrow.__version__
'0.17.0'
>>> import datetime
>>> arrow.get(datetime.datetime(2017, 10, 29, 1, 0, 0), tzinfo="UTC").to('Europe/Berlin')
<Arrow [2017-10-29T02:00:00+01:00]>
>>> arrow.get(datetime.datetime(2017, 10, 29, 0, 0, 0), tzinfo="UTC").to('Europe/Berlin')
<Arrow [2017-10-29T02:00:00+02:00]>

We added support for PEP 495 (the fold attribute) to disambiguate ambiguous times, along with a new ambiguous property.

>>> before=arrow.Arrow(2017, 10, 29, 2, tzinfo="Europe/Berlin")
>>> before
<Arrow [2017-10-29T02:00:00+02:00]>
>>> before.ambiguous
True
>>> after=arrow.Arrow(2017, 10, 29, 2, tzinfo="Europe/Berlin", fold=1)
>>> after
<Arrow [2017-10-29T02:00:00+01:00]>
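The PEP 495 semantics behind `fold` can also be seen with plain `datetime` objects. The sketch below uses the standard library only (assuming Python 3.9+ with `zoneinfo` and an installed tz database), not arrow itself, so arrow's own comparison behavior may differ. It shows that two values differing only in `fold` have different offsets and different epoch timestamps, yet compare and hash as equal when they share the same tzinfo object.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # assumption: Python 3.9+, tz database installed

berlin = ZoneInfo("Europe/Berlin")

# Same wall-clock time, first and second occurrence of the repeated hour
before = datetime(2017, 10, 29, 2, 0, tzinfo=berlin)         # fold=0 by default
after = datetime(2017, 10, 29, 2, 0, tzinfo=berlin, fold=1)

print(before.utcoffset())  # 2:00:00
print(after.utcoffset())   # 1:00:00

# Comparison within one zone looks at the wall clock only and ignores fold
print(before == after)       # True
print(len({before, after}))  # 1

# Epoch timestamps respect fold, so the two instants stay distinct
print(after.timestamp() - before.timestamp())  # 3600.0
```

This is one reason to compare epoch timestamps (or values converted to UTC) rather than local datetimes when ambiguous hours can occur.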

bernhardboehmer commented 3 years ago

Excellent! I missed this new version, sorry about that. Comparison is getting more difficult, though. I use timestamp for this. Is there a better alternative?

systemcatch commented 3 years ago

I'm not sure what you mean by comparison, maybe a code example would help?

bernhardboehmer commented 3 years ago

Hi systemcatch, thanks for your reply. Here's a code example. The method receives a set of localized Arrow timestamps and checks whether there are gaps between them. The offset (period) can be set freely. The nodes are the complete list of Arrow timestamps between start and end. The gaps found do not include "end".

The second line (the set statement) is, by the way, what triggered my initial bug report.

The final list comprehension is where the check failed with plain Arrow objects; that is why I switched to ".timestamp".

Regards Bernhard

    def get_missing_timestamps(self, start, end, timestamps):
        """
        Test whether the list of timestamps forms an arithmetic series of the form
        t_i = t_(i-1) + offset, by comparing it with a reference series obtained from
        UTCTimePeriod.get_offsets().
        Args:
         start: arrow.Arrow
         end: arrow.Arrow
         offset: integer
         timestamps: a list of timezone-aware timestamps, e.g. [arrow.Arrow(a, b, c, tzinfo='utc'), ...]
        Returns:
         a list of missing timestamps
        """

        # Compare epoch seconds instead of Arrow objects (the check failed with
        # plain Arrow objects). Note: .timestamp is a property in arrow 0.x;
        # it became a method in arrow 1.0.
        seen = set(ts.timestamp for ts in timestamps)
        nodes = self.get_offsets(start, end)
        return [node for node in nodes if node.timestamp not in seen]
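Since `get_offsets()` and the surrounding class are not shown, here is a self-contained sketch of the same gap check using only the standard library. The function name `missing_timestamps` and the parameters `step` and `observed` are hypothetical stand-ins, and fixed offsets are used to mimic the Berlin DST switch without needing a tz database. All inputs are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def missing_timestamps(start, end, step, observed):
    """Return the expected nodes in [start, end) that are absent from observed.

    Hypothetical helper: all datetimes must be timezone-aware.
    """
    # Epoch seconds identify an instant regardless of its local representation
    seen = {int(dt.timestamp()) for dt in observed}
    missing = []
    node = start.astimezone(timezone.utc)
    end_utc = end.astimezone(timezone.utc)
    while node < end_utc:  # "end" itself is excluded
        if int(node.timestamp()) not in seen:
            missing.append(node)
        node += step
    return missing


utc = timezone.utc
cest = timezone(timedelta(hours=2))  # fixed offset standing in for Berlin summer time
cet = timezone(timedelta(hours=1))   # fixed offset standing in for Berlin winter time

start = datetime(2017, 10, 28, 23, 0, tzinfo=utc)
end = datetime(2017, 10, 29, 3, 0, tzinfo=utc)
observed = [
    datetime(2017, 10, 29, 1, 0, tzinfo=cest),  # 23:00 UTC
    datetime(2017, 10, 29, 2, 0, tzinfo=cest),  # 00:00 UTC
    datetime(2017, 10, 29, 2, 0, tzinfo=cet),   # 01:00 UTC, repeated local hour
    # 02:00 UTC (03:00+01:00) is deliberately left out
]

gaps = missing_timestamps(start, end, timedelta(hours=1), observed)
print(gaps)  # one gap: 2017-10-29 02:00 UTC
```

Because the comparison happens on epoch seconds, the two 02:00 local readings count as separate nodes and only the instant that is really absent is reported.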