tweyter opened this issue 9 years ago (status: open).
As an aside, here's a great write-up about errors when dealing with time: http://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time Not everything there applies to Python's datetime, but it's informative nonetheless.
Forgot to say at the time: Thanks for bringing this up. As per discussion elsewhere, it'll be a little while before I get to this, but I'm definitely going to look into things along these lines.
You're welcome. I'll try to contribute some code when I get the chance, too. I was just looking at the leap-second issue this morning, and it looks fairly easy to implement. A GitHub mirror of the time zone database and code can be found here: https://github.com/eggert/tz It's the collection of data files and build scripts behind all of the time zone rules (what pytz uses), and it's a bit beyond me, but the list of leap seconds is here: https://github.com/eggert/tz/blob/master/leap-seconds.list
I think all one would have to do is inherit datetime.DatetimeStrategy and replace "def produce_template(self, context, pv):". Something like this:
base = dt.datetime(1900, 1, 1, 0, 0)  # NTP epoch, which leap-seconds.list counts from
leap_second_time = base + dt.timedelta(seconds=3550089600)  # the leap second inserted before July 1, 2012
Then just randomly add or subtract 1-2 seconds and add random microseconds to supply random data that falls around that leap second.
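The offset in the snippet above is an NTP timestamp: leap-seconds.list counts seconds since 1900-01-01 00:00:00 UTC. A minimal sketch of turning the file's data lines into UTC datetimes, assuming its whitespace-separated `<NTP seconds> <TAI-UTC> # <date>` layout; `parse_leap_seconds` is a hypothetical helper, and the sample lines are copied in by hand rather than read from the real file.

```python
from __future__ import annotations

import datetime as dt

# leap-seconds.list timestamps count seconds since the NTP epoch, 1900-01-01 UTC.
NTP_EPOCH = dt.datetime(1900, 1, 1, tzinfo=dt.timezone.utc)

# A few data lines in the file's "<NTP seconds> <TAI-UTC> # <date>" layout
# (copied by hand for illustration; read the real file in practice).
SAMPLE = """\
#  Comment lines start with a hash
3550089600    35    # 1 Jul 2012
3644697600    36    # 1 Jul 2015
3692217600    37    # 1 Jan 2017
"""


def parse_leap_seconds(text: str) -> list[tuple[dt.datetime, int]]:
    """Return (UTC datetime, TAI-UTC offset in seconds) for each data line."""
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        ntp_seconds, tai_minus_utc = line.split()[:2]
        when = NTP_EPOCH + dt.timedelta(seconds=int(ntp_seconds))
        entries.append((when, int(tai_minus_utc)))
    return entries


for when, offset in parse_leap_seconds(SAMPLE):
    print(when.isoformat(), offset)
```

Each datetime is the instant just after the leap second, so 3550089600 maps to 2012-07-01T00:00:00Z, immediately following 2012-06-30T23:59:60Z.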
It should be possible to read this list from pytz, which I will try to add in #621 or get around to eventually. A more interesting approach would be to draw the timezone first, and then prefer datetimes around the DST transition too...
@pganssle (dateutil maintainer) and I met at PyLondinium and designed a general way to generate nasty datetimes, which works with any source of timezones (tzinfo objects), and I've since confirmed with @DRMacIver that it should work.
Given two naive datetimes as bounds (defaulting to the representable limits if not passed explicitly) and a strategy for Optional[tzinfo], we need to draw a datetime:
The current algorithm:
Proposed, "nasty datetimes", algorithm:
D.astimezone(T) (henceforth D0), with the usual special handling for pytz.
Note that this should work with any source of timezones and any bounds the user might choose, and should have small overhead in any case where there are no nasty datetimes to find (and 'reasonable' overhead when there are).
If you want to work on this issue, you're entirely welcome, but please contact me first so we can avoid overlapping work or comms, and so you can ask questions as you go.
From some minor thinking about it, these are the classes of "nasty" time zones that I think we should try to find for each time zone:
- Changes in tzname without a corresponding change in DST
- […] tzname […]
- […] utcoffset […]
- […] (e.g. Europe/Dublin, Namibia during certain periods)
When writing tests for this kind of thing, I tend to hand-pick zones that exhibit these edge cases, preferably in both the Northern and Southern hemispheres, though I've never found a case where this has bitten me.
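The classes listed above survive only in fragments (tzname, utcoffset, DST changes, and zones such as Namibia), but they suggest checks that compare what a tzinfo reports just before and just after a transition. A minimal sketch; the `ZoneState` type, the label strings, and reading the last item as "negative DST" are assumptions for illustration. The sample values imitate how tzdata describes Irish time (winter GMT as negative DST relative to IST) and are written by hand, not read from any library.

```python
from __future__ import annotations

import datetime as dt
from typing import NamedTuple, Optional


class ZoneState(NamedTuple):
    """What a tzinfo reports for one instant (hypothetical helper type)."""

    utcoffset: Optional[dt.timedelta]
    dst: Optional[dt.timedelta]
    tzname: Optional[str]


def state_of(aware: dt.datetime) -> ZoneState:
    """Snapshot utcoffset/dst/tzname for an aware datetime."""
    return ZoneState(aware.utcoffset(), aware.dst(), aware.tzname())


def nasty_properties(before: ZoneState, after: ZoneState) -> set[str]:
    """Label a transition between two adjacent instants with the edge cases it shows."""
    labels = set()
    dst_changed = before.dst != after.dst
    if before.tzname != after.tzname and not dst_changed:
        labels.add("tzname change without DST change")
    if before.utcoffset != after.utcoffset and not dst_changed:
        labels.add("utcoffset change without DST change")
    # Assumption: a negative dst() value marks "negative DST" (Dublin, Namibia).
    if any(s.dst is not None and s.dst < dt.timedelta(0) for s in (before, after)):
        labels.add("negative DST")
    return labels


# Hand-written values in the style of tzdata's Irish rules (not read from a tzinfo).
irish_summer = ZoneState(dt.timedelta(hours=1), dt.timedelta(0), "IST")
irish_winter = ZoneState(dt.timedelta(0), dt.timedelta(hours=-1), "GMT")
print(nasty_properties(irish_summer, irish_winter))
```

In practice `before` and `after` would come from `state_of()` applied to instants on either side of a transition; whether a given library reports a negative `dst()` for such zones depends on that library.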
Based on a proof-of-concept Paul wrote at PyCon: 27df35803bf3e8adeea39fee16afe588ee377784. Now we "just" need to implement the search functionality I described last year!
More broadly, I'm thinking that there are actually two distinct things we might want to generate here:
1. Datetimes that are "weird" in isolation, i.e. that exhibit an unusual property listed in the Weirdness enum. We can search toward ambiguous or imaginary times based on dst/utcoffset changes between two endpoints, but for most properties there's nothing to do beyond recognising one when we see it, which we can use to bail out of the above search early, so it's not totally useless.
2. Transitions between datetimes. I'm not sure how to take advantage of this when generating single datetimes, but it would be nice if there were a way to do so... I think this is properly a follow-up issue once we get the first stage working.
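The search toward ambiguous or imaginary times based on utcoffset changes between two endpoints could be sketched as a coarse scan followed by bisection. A minimal sketch, assuming a callable that maps an aware UTC instant to the zone's UTC offset; `find_offset_changes`, the six-hour step, and the one-second resolution are illustrative choices, not Hypothesis internals.

```python
import datetime as dt
from typing import Callable, List

SECOND = dt.timedelta(seconds=1)


def find_offset_changes(
    offset_at: Callable[[dt.datetime], dt.timedelta],
    start: dt.datetime,
    end: dt.datetime,
    step: dt.timedelta = dt.timedelta(hours=6),
) -> List[dt.datetime]:
    """Return the first instant (to 1s resolution) after each offset change in [start, end].

    Scans forward in coarse steps and bisects any step whose endpoints disagree.
    Changes that cancel out within a single step are missed.
    """
    changes = []
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        if offset_at(lo) != offset_at(hi):
            a, b = lo, hi
            # Invariant: offset_at(a) == offset_at(lo) != offset_at(b).
            while b - a > SECOND:
                half = max(1, ((b - a) // SECOND) // 2)
                mid = a + half * SECOND
                if offset_at(mid) == offset_at(a):
                    a = mid
                else:
                    b = mid
            changes.append(b)
        lo = hi
    return changes


# Toy offset function with a single spring-forward instant, for illustration only.
SPRING = dt.datetime(2021, 3, 28, 1, tzinfo=dt.timezone.utc)


def toy_offset(t: dt.datetime) -> dt.timedelta:
    return dt.timedelta(hours=2) if t >= SPRING else dt.timedelta(hours=1)


print(find_offset_changes(toy_offset,
                          dt.datetime(2021, 1, 1, tzinfo=dt.timezone.utc),
                          dt.datetime(2022, 1, 1, tzinfo=dt.timezone.utc)))
```

With a real zone `tz`, `offset_at` could be `lambda t: t.astimezone(tz).utcoffset()`, and each returned instant `c` yields the local times on either side of the transition, `(c - SECOND).astimezone(tz)` and `c.astimezone(tz)`. The coarse step keeps the overhead small when a range contains no transitions.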
More notes: anyone thinking about leap seconds should go look at @aarchiba's work in nanograv/PINT#549, which includes some lovely new strategies as well as thoughtful consideration of "leap smear".
Unfortunately we can't represent leap seconds with Python's datetime.datetime type, and this won't be fixed. It would still be useful to make adjacent times more likely, though, as the number of TAI seconds elapsed across the interval won't match the number of Unix seconds; but since Python reads the system clock, which NTP typically smears or steps across a leap second, the difference is unlikely to be detectable from Python.
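To make the mismatch concrete, here is a minimal sketch showing that datetime rejects second=60 and that its arithmetic undercounts elapsed SI seconds across a leap second. `tai_seconds_between` is a hypothetical helper; it assumes a hand-supplied list of leap-second instants (only the one at the end of 2016 here) and handles positive leap seconds only, which is all there have been so far.

```python
import datetime as dt
from typing import Iterable

UTC = dt.timezone.utc


def tai_seconds_between(
    start: dt.datetime, end: dt.datetime, leap_instants: Iterable[dt.datetime]
) -> float:
    """SI seconds elapsed from start to end (start <= end), counting leap seconds.

    `leap_instants` holds the UTC instant just *after* each positive leap second,
    e.g. 2017-01-01T00:00:00Z for 2016-12-31T23:59:60Z.
    """
    unix_seconds = (end - start).total_seconds()  # datetime arithmetic skips leap seconds
    inserted = sum(1 for t in leap_instants if start < t <= end)
    return unix_seconds + inserted


# datetime cannot represent the leap second itself.
try:
    dt.datetime(2016, 12, 31, 23, 59, 60, tzinfo=UTC)
except ValueError as exc:
    print("ValueError:", exc)

leaps = [dt.datetime(2017, 1, 1, tzinfo=UTC)]
a = dt.datetime(2016, 12, 31, 23, 59, 59, tzinfo=UTC)
b = dt.datetime(2017, 1, 1, tzinfo=UTC)
print((b - a).total_seconds(), tai_seconds_between(a, b, leaps))
```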
What about using zoneinfo?
It has better handling of the .fold attribute (fixing ambiguous datetimes), but all the other cases are down to the very complicated semantics of timezones themselves rather than which library we use to represent them. See above for the proposed algorithm and targets.
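Because zoneinfo (and dateutil) follow PEP 495, the two values of `fold` can be used to recognise ambiguous and imaginary wall times without catching pytz exceptions. A minimal sketch: `classify_wall_time` is a hypothetical helper, and `ToyZone` is a hand-rolled zone with made-up transition dates so the example runs without tz data; it implements only utcoffset/dst/tzname and is not a complete tzinfo. pytz zones ignore `fold`, so this approach does not apply to them.

```python
import datetime as dt

HOUR = dt.timedelta(hours=1)


def classify_wall_time(naive: dt.datetime, tz: dt.tzinfo) -> str:
    """Return "normal", "ambiguous" or "imaginary" for a naive local time in tz.

    Assumes PEP 495 semantics (zoneinfo, dateutil; not pytz): fold=0 and fold=1
    only disagree around a transition. In a gap the clock jumps forward, so the
    pre-transition offset (fold=0) is smaller; in a fold it moves back, so the
    fold=0 offset is larger.
    """
    off0 = naive.replace(tzinfo=tz, fold=0).utcoffset()
    off1 = naive.replace(tzinfo=tz, fold=1).utcoffset()
    if off0 == off1:
        return "normal"
    return "imaginary" if off0 < off1 else "ambiguous"


class ToyZone(dt.tzinfo):
    """Hypothetical zone for illustration: UTC+1, with UTC+2 in 'summer' 2021.

    Only utcoffset/dst/tzname are implemented (fold-aware); fromutc is the
    inherited default, so don't rely on it for conversions.
    """

    SPRING = dt.datetime(2021, 3, 28, 2)   # wall clock jumps 02:00 -> 03:00
    AUTUMN = dt.datetime(2021, 10, 31, 2)  # 03:00 -> 02:00, so 02:00-03:00 repeats

    def utcoffset(self, d):
        wall = d.replace(tzinfo=None, fold=0)
        if self.SPRING <= wall < self.SPRING + HOUR:  # gap: fold picks the side
            return 2 * HOUR if d.fold else HOUR
        if self.AUTUMN <= wall < self.AUTUMN + HOUR:  # fold: first pass is +2
            return HOUR if d.fold else 2 * HOUR
        return 2 * HOUR if self.SPRING + HOUR <= wall < self.AUTUMN else HOUR

    def dst(self, d):
        return self.utcoffset(d) - HOUR

    def tzname(self, d):
        return "TOY+2" if self.utcoffset(d) == 2 * HOUR else "TOY+1"


tz = ToyZone()
for wall in (dt.datetime(2021, 3, 28, 2, 30), dt.datetime(2021, 10, 31, 2, 30)):
    print(wall, classify_wall_time(wall, tz))
```

With a real zone such as `zoneinfo.ZoneInfo("America/Los_Angeles")`, 2020-03-08 02:30 should classify as imaginary and 2020-11-01 01:30 as ambiguous.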
After spending some more time looking into leap seconds, I now think that handling them at all is so rare in Python that biasing towards them is unlikely to be a net improvement in bug-finding power outside of literally astronomical timekeeping code.
If anyone working on that is interested in picking it up, here's some code to get the UTC datetimes of each leap second; you could then sample one and add a random offset of zero, less than ±1s, or up to ±12h. I'd sketched plans to integrate that into st.datetimes(), including transparent shrinking etc., but as explained above decided against it.
diff --git a/hypothesis-python/setup.py b/hypothesis-python/setup.py
index 14c24b9ef..8c96d9dd8 100644
--- a/hypothesis-python/setup.py
+++ b/hypothesis-python/setup.py
@@ -82,7 +82,13 @@ setuptools.setup(
     author_email="david@drmaciver.com",
     packages=setuptools.find_packages(SOURCE),
     package_dir={"": SOURCE},
-    package_data={"hypothesis": ["py.typed", "vendor/tlds-alpha-by-domain.txt"]},
+    package_data={"hypothesis": ["py.typed", "vendor/leap-seconds.list", "vendor/tlds-alpha-by-domain.txt"]},
     url="https://hypothesis.works",
     project_urls={
         "Source": "https://github.com/HypothesisWorks/hypothesis/tree/master/hypothesis-python",
diff --git a/hypothesis-python/src/hypothesis/strategies/_internal/datetime.py b/hypothesis-python/src/hypothesis/strategies/_internal/datetime.py
index 581b3ac3c..5710fabdc 100644
--- a/hypothesis-python/src/hypothesis/strategies/_internal/datetime.py
+++ b/hypothesis-python/src/hypothesis/strategies/_internal/datetime.py
@@ -34,6 +34,18 @@ def is_pytz_timezone(tz):
     return module == "pytz" or module.startswith("pytz.")
 
 
+@lru_cache(maxsize=1)
+def get_leap_seconds() -> tuple[dt.datetime, ...]:
+    """Return a tuple of UTC datetimes corresponding to each leap second."""
+    traversable = resources.files("hypothesis.vendor") / "leap-seconds.list"
+    epoch = dt.datetime(1900, 1, 1, tzinfo=dt.timezone.utc)  # NTP epoch
+    return tuple(
+        epoch + dt.timedelta(seconds=int(line.split()[0]))
+        for line in traversable.read_text(encoding="utf-8").splitlines()
+        if line.strip() and not line.startswith("#")
+    )
+
+
 def replace_tzinfo(value, timezone):
     if is_pytz_timezone(timezone):
         # Pytz timezones are a little complicated, and using the .replace method
diff --git a/tooling/src/hypothesistooling/__main__.py b/tooling/src/hypothesistooling/__main__.py
index 6eb938510..d07914c85 100644
--- a/tooling/src/hypothesistooling/__main__.py
+++ b/tooling/src/hypothesistooling/__main__.py
@@ -363,6 +363,10 @@ def update_vendored_files():
     if fname.read_bytes().splitlines()[1:] != new.splitlines()[1:]:
         fname.write_bytes(new)
 
+    url = "https://hpiers.obspm.fr/iers/bul/bulc/ntp/leap-seconds.list"
+    (vendor / url.split("/")[-1]).write_bytes(requests.get(url).content)
+
     # Always require the most recent version of tzdata - we don't need to worry about
     # pre-releases because tzdata is a 'latest data' package (unlike pyodide-build).
     # Our crosshair extra is research-grade, so we require latest versions there too.
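The sampling step described in the comment before the patch (pick a leap second, then add an offset of zero, under ±1s, or up to ±12h) could look roughly like this, assuming a sequence of UTC leap-second instants such as the tuple `get_leap_seconds()` in the patch returns. `sample_near_leap_second` is a hypothetical helper built on `random.Random`, not part of Hypothesis.

```python
import datetime as dt
import random
from typing import Sequence


def sample_near_leap_second(rng: random.Random, leaps: Sequence[dt.datetime]) -> dt.datetime:
    """Pick a leap-second instant, then perturb it by one of three offset scales.

    Tier 0: exactly the instant; tier 1: under one second either side;
    tier 2: up to twelve hours either side.
    """
    base = rng.choice(leaps)
    tier = rng.randrange(3)
    if tier == 0:
        return base
    # Offsets are drawn in whole microseconds, datetime's finest resolution.
    limit_us = 999_999 if tier == 1 else 12 * 3600 * 1_000_000
    return base + dt.timedelta(microseconds=rng.randint(-limit_us, limit_us))


utc = dt.timezone.utc
leaps = (dt.datetime(2012, 7, 1, tzinfo=utc), dt.datetime(2017, 1, 1, tzinfo=utc))
rng = random.Random(0)
print([sample_near_leap_second(rng, leaps).isoformat() for _ in range(3)])
```

Inside st.datetimes() the tiers would presumably be expressed as strategies so that draws shrink toward the leap-second instant; this random.Random version does not attempt that.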
pytz.tzinfo.localize will raise a NonExistentTimeError or AmbiguousTimeError exception when called with is_dst=None if it can't resolve the local time because of a change to or from daylight saving time. This is the source of numerous bugs in software dealing with datetimes in Python. A strategy that selects for these error-causing times would help improve the quality of Hypothesis-Datetime.