HowardHinnant / date

A date and time library based on the C++11/14/17 <chrono> header
Other
3.14k stars 678 forks source link

Daylight savings transitions not aligned with large dates depending on `USE_OS_TZDB` #563

Open DavisVaughan opened 4 years ago

DavisVaughan commented 4 years ago

First off, great library! Secondly, I'm an R developer attempting to expose some of this library to R users through an R package. In doing so, I generally test a wide range of dates, which is where I came across what I think it a bug.

I'm on MacOS 10.14.5 if that matters.

Take this somewhat far in the future date:

"2196-10-06 11:39:41 UTC"

Say I want to force that to "America/New_York" with the same clock time. Ignoring any DST boundary problems, I can do something like:

void test_wrong() {
  const seconds duration(7156035581);
  const date::sys_seconds sys_second_point(duration);

  auto x_zt = date::make_zoned("UTC", sys_second_point);

  const date::time_zone* tz = date::locate_zone("America/New_York");

  auto x_new_zt = date::make_zoned(tz, x_zt.get_local_time());

  std::cout << x_new_zt << std::endl;

  std::cout << x_new_zt.get_sys_time().time_since_epoch().count() << std::endl;
}

With -DUSE_OS_TZDB=0 this prints:

2196-10-06 11:39:41 EDT
7156049981

With -DUSE_OS_TZDB=1 this prints:

2196-10-06 11:39:41 EST
7156053581

I believe the correct answer is the first one, 2196-10-06 11:39:41 EDT. This aligns with another R package, lubridate, which internally uses CCTZ.

I'm not an expert at C++, but I did a little digging and it mainly looks like in load_sys_info() the r.save member isn't getting the right value.

sys_info
time_zone::load_sys_info(std::vector<detail::transition>::const_iterator i) const
{
    using namespace std::chrono;
    assert(!transitions_.empty());
    assert(i != transitions_.begin());
    sys_info r;
    r.begin = i[-1].timepoint;
    r.end = i != transitions_.end() ? i->timepoint :
                                      sys_seconds(sys_days(year::max()/max_day));
    r.offset = i[-1].info->offset;
    r.save = i[-1].info->is_dst ? minutes{1} : minutes{0};
    r.abbrev = i[-1].info->abbrev;
    return r;
}

Since this date is so large, it is outside the range of transitions, so r.end is set to sys_seconds(sys_days(year::max()/max_day));. But no special handling is done for the DST.

Is there a way to handle this consistently? Thanks!

HowardHinnant commented 4 years ago

It turns out that the OS-supplied binary does contain a POSIX-style time zone string at the end: EST5EDT,M3.2.0,M11.1.0 which the parser could identify and use to extend the transition table indefinitely. The lack of this support is a bug in tz.cpp. I'm currently not sure how much work it will take to incorporate this, nor the schedule for getting it done.

HowardHinnant commented 4 years ago

Another bug on macOS (-DUSE_OS_TZDB=1) is that the OS does not supply leap second information, at least as of 10.14.6. I believe they will supply it in the future (if they don't already on 10.15).

The lack of leap second information means that utc_clock (et al.) is unavailable.

DavisVaughan commented 4 years ago

I'm very tempted to ship my own copy of the tzdb with the R package and forcibly set -DUSE_OS_TZDB=0 to avoid all of this. That was my original plan until I noticed how fast things run with the os tzdb.

R packages try to be cross platform, so standardizing the package on a "blessed" (by me, not you) version of the tzdb might be a better route for me (I'd have to bundle my own tzdb for Windows anyways).

Any thoughts on that idea generally? When I push updates for the package I would update the tzdb (and probably update the date.h and tz.h files)

Also: I learned a ton from this SO answer of yours. Thanks for that!

HowardHinnant commented 4 years ago

I don't know enough about R packages to advise you much on this issue. Full optimizations on will help with the database lookup stuff. Also there are lower-level techniques within this lib for getting things faster still. This is usually a convenience/performance tradeoff.

I'm not sure if this impacts your decision, but this library has been voted into C++20, and so hopefully within about a year, the C++ vendors will be supplying this library. Though I am not aware of their schedules on this matter.

DavisVaughan commented 3 years ago

This seems useful as a guide

https://github.com/golang/go/commit/1f040e0a6184ef813b8aaf7ce8e409a663939f75?branch=1f040e0a6184ef813b8aaf7ce8e409a663939f75&diff=unified

jorisvandenbossche commented 1 year ago

It turns out that the OS-supplied binary does contain a POSIX-style time zone string at the end: EST5EDT,M3.2.0,M11.1.0 which the parser could identify and use to extend the transition table indefinitely. The lack of this support is a bug in tz.cpp. I'm currently not sure how much work it will take to incorporate this, nor the schedule for getting it done.

@HowardHinnant do you know if the usage of this rule to extend the DST transitions into the future is incorporated in the version in C++20+ stdlib? Or is that also missing there?

HowardHinnant commented 1 year ago

C++20 should get the right answer. Apple has yet to ship this part of C++20 in its macOS development tools.