arrow-py / arrow

🏹 Better dates & times for Python
https://arrow.readthedocs.io
Apache License 2.0
8.71k stars 673 forks source link

Implement PEP 495 to handle ambiguous datetimes #802

Closed systemcatch closed 4 years ago

systemcatch commented 4 years ago

Pull Request Checklist

Thank you for taking the time to improve Arrow! Before submitting your pull request, please check all appropriate boxes:

If you have any questions about your code changes or any of the points above, please submit your questions along with the pull request and we will try our best to help!

Description of Changes

PEP 495

closes #315 closes #368 closes #476 closes #690

codecov-commenter commented 4 years ago

Codecov Report

Merging #802 into master will not change coverage. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##            master      #802   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            9         9           
  Lines         1776      1786   +10     
  Branches       306       307    +1     
=========================================
+ Hits          1776      1786   +10     
Impacted Files Coverage Δ
arrow/arrow.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 90b33b0...39c9e0d. Read the comment docs.

systemcatch commented 4 years ago

Couple of points to note.

systemcatch commented 4 years ago

@pganssle does this look ok to you?

pganssle commented 4 years ago
  • datetime will let you create non ambiguous instances where fold=1 despite being invalid, I've chosen to prevent this in arrow because I can't really see any benefit to allowing this.

PEP 495 explicitly mentions that this is allowed. The biggest benefit it has is that it allows you to specify the strategy for resolving ambiguous datetimes without knowing if the datetime is ambiguous or not. Generally speaking all time zone handling is "lazy", but this would force you to resolve the time zone offset twice any time fold=1 is specified.

Consider that if you were a pytz user, localize(dt, is_dst=True) doesn't require you to know that the datetime is ambiguous, it's just saying, "If you find something that could be DST or STD, you should choose the DST side. fold is similar, it just encodes a different piece of information. See, for example, this section of the pytz-deprecation-shim migration guide.

pganssle commented 4 years ago

One other thing to note, python-dateutil actually doesn't handle fold correctly during gaps. That will get fixed as part of a planned backport of PEP 615 (the zoneinfo/backports.zoneinfo module), but it may be worth trying to paper over that in arrow?

systemcatch commented 4 years ago

Hey Paul thanks for your comments, I'll work on implementing the changes you've suggested.

PEP 495 explicitly mentions that this is allowed. The biggest benefit it has is that it allows you to specify the strategy for resolving ambiguous datetimes without knowing if the datetime is ambiguous or not. Generally speaking all time zone handling is "lazy", but this would force you to resolve the time zone offset twice any time fold=1 is specified.

Consider that if you were a pytz user, localize(dt, is_dst=True) doesn't require you to know that the datetime is ambiguous, it's just saying, "If you find something that could be DST or STD, you should choose the DST side. fold is similar, it just encodes a different piece of information. See, for example, this section of the pytz-deprecation-shim migration guide.

Yeah I saw the PEP allows that to happen but couldn't figure out the benefit, but as you've pointed out this simplifies the overall strategy and saves resolving twice.

One other thing to note, python-dateutil actually doesn't handle fold correctly during gaps. That will get fixed as part of a planned backport of PEP 615 (the zoneinfo/backports.zoneinfo module), but it may be worth trying to paper over that in arrow?

Depends when the backport happens, I see PEP 615 is scheduled for 3.9. We would probably just let dateutil handle this if possible but if you can point me to an example of fold being mishandled in gaps I can see if it's practical to fix now.

pganssle commented 4 years ago

Depends when the backport happens, I see PEP 615 is scheduled for 3.9. We would probably just let dateutil handle this if possible but if you can point me to an example of fold being mishandled in gaps I can see if it's practical to fix now.

Just to make clear, backports.zoneinfo already exists and in Python 3.6+, I (as the maintainer of dateutil and the creator of PEP 615) highly recommend you use it whenever possible. The "backport" I'm referring to is that I'm planning to take the underlying logic from the pure-Python implementation of zoneinfo and move it into dateutil.tz, to the extent possible. That will fix a number of issues, but my major efforts to get PEP 615 ready and working in Python 3.9 made my backlog of work grow enormously ☹.

systemcatch commented 4 years ago

Just to make clear, backports.zoneinfo already exists and in Python 3.6+, I (as the maintainer of dateutil and the creator of PEP 615) highly recommend you use it whenever possible. The "backport" I'm referring to is that I'm planning to take the underlying logic from the pure-Python implementation of zoneinfo and move it into dateutil.tz, to the extent possible. That will fix a number of issues, but my major efforts to get PEP 615 ready and working in Python 3.9 made my backlog of work grow enormously frowning_face.

Ok thanks for clarifying, I think the best thing to do is create a separate issue for PEP 615 and look to implement that once I've fixed the problems with imaginary datetimes.

As regards dateutil, since we use it so heavily I'll take a look at the repo and see if there's anything I can help with, even if it's just doing some housekeeping.

systemcatch commented 4 years ago

@jadchaar here is an example of the stuff we were talking about on WhatsApp.

(arrow27) chris@ThinkPad:~/arrow$ python
Python 2.7.18 (tags/v2.7.18:8d21aa21f2, Jul 16 2020, 23:20:47) 
[GCC 9.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import arrow
arrow/arrow.py:28: DeprecationWarning: Arrow will drop support for Python 2.7 and 3.5 in the upcoming v1.0.0 release. Please upgrade to Python 3.6+ to continue receiving updates for Arrow.
  DeprecationWarning,
>>> before = arrow.Arrow(2017, 10, 29, 2, 0, tzinfo="Europe/Stockholm")
>>> after = arrow.Arrow(2017, 10, 29, 2, 0, tzinfo="Europe/Stockholm", fold=1)
>>> before.fold
0
>>> type(before.datetime)
<type 'datetime.datetime'>
>>> after.fold
1
>>> type(after.datetime)
<class 'dateutil.tz._common._DatetimeWithFold'>
systemcatch commented 4 years ago

Hey @jadchaar I pushed a commit that makes fold a keyword-only argument, let me know what you think.

systemcatch commented 4 years ago

:rocket: