facebookincubator / velox

A C++ vectorized database acceleration library aimed to optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0
3.46k stars 1.13k forks source link

Daylight savings transitions not applied correctly with large dates #9155

Open zacw7 opened 6 months ago

zacw7 commented 6 months ago

Bug description

Issue

ts = fromTimestampString("2118-06-15 00:00:00");
ts.toTimezone(util::getTimeZoneID("America/Los_Angeles"));  // PDT: UTC-7
EXPECT_EQ(ts, fromTimestampString("2118-06-14 17:00:00"));

Expected behavior: the above test would pass. Actual behavior: the test fails. (Run the test in the draft PR: https://github.com/facebookincubator/velox/pull/9154)

For large dates (date after 2038), daylight transitions are not properly applied and Velox can generate incorrect results for timezone conversion.

Cause

Velox uses OS-supplied timezone data and the date library has lack of support to parse certain rules from the OS-supplied tz data which can be used to calculate the daylight transition table indefinitely. The lack of support is a bug according to the library author.

Potential Solutions

  1. The author claims "C++20 should get the right answer", but it requires upgrading Velox to C++20 then switching to the C++20 equivalent of this library.
  2. Disable USE_OS_TZDB (as well as HAS_REMOTE_API), instead of using the OS supplied timezone database, download the latest version of timezone database (https://www.iana.org/time-zones) which the library is able to parse correctly. Presto Java uses joda-time to do the same thing and it also keeps a full copy the the tz dabase. This requires Velox to monitor if there's any change in the tz database and update the copy if needed.
  3. Fix the issue in the date library itself making sure all POSIX tz formats can be parsed.

cc. @pedroerp @mbasmanova @gggrace14 @Yuhta

System information

N/A

Relevant logs

No response

spershin commented 6 months ago

Nice research! Looks like we could go with the short-term solution 2 and then with a long-term 1 or 2. I like 1 better. :)

pedroerp commented 6 months ago

Thanks for the research on this! This is pretty tricky, and I don't think there is a very easy solution out.

My main concern with solution 2 is that it adds responsibilities that should not be Velox's to the library. As an execution engine, Velox (ideally) should not have to deal with adding a custom timezone database (and being responsible for downloading and updating it). We should just rely on the one provided by the OS, which is what any library user would expect.

Naturally, option 1 will come with time, but in the meantime, although not ideal, I think our best bet could be to just go ahead and fix the underlying library (option 3).

pedroerp commented 3 months ago

FYI: https://github.com/facebookincubator/velox/pull/10097 added logic to throw an exception when unsupported timezone conversions are detected. So now at least they do not silently return wrong results.