HowardHinnant / date

A date and time library based on the C++11/14/17 <chrono> header
Other
3.13k stars 676 forks source link

Offset between UTC and TAI incorrect when in Release build mode? #619

Open connorjak opened 3 years ago

connorjak commented 3 years ago
TEST_F(PreciseClockTests, TAIvsUTC)
{
    //NOTE: these can change as leap seconds are added
    //https://en.wikipedia.org/wiki/Epoch_(astronomy)

    static auto J2000_date = year{ 2000 } / January / 1;
    static auto J2000_tai_hmsms = 11h + 59min + 27s + 816ms;
    static auto J2000_utc_hmsms = 11h + 58min + 55s + 816ms;

    //https://en.wikipedia.org/wiki/International_Atomic_Time
    //Seidelmann, P. K., Ed. (1992). Explanatory Supplement to the Astronomical Almanac. Sausalito, CA: University Science Books. Glossary, s.v. Terrestrial Dynamical Time.
    static date::tai_time<std::chrono::microseconds> J2000_tai = tai_clock::from_local(local_days{ J2000_date } + J2000_tai_hmsms);

    //https://en.wikipedia.org/wiki/Coordinated_Universal_Time
    static date::utc_time<std::chrono::microseconds> J2000_utc = utc_clock::from_local(local_days{ J2000_date } + J2000_utc_hmsms);

    cout << "J2000 is " << J2000_utc << " UTC" << endl;
    cout << "J2000 is " << J2000_tai << " TAI" << endl;

    auto diff = J2000_utc - to_utc_time(J2000_tai);

    auto diff2 = J2000_tai - to_tai_time(J2000_utc);

    cout << "This diff: " << diff << " should be zero, as both quantities were in UTC." << endl;

    EXPECT_EQ(diff, 0us);

    cout << "This diff: " << diff2 << " should be zero, as both quantities were in TAI." << endl;

    EXPECT_EQ(diff2, 0us);
}

This above GTest test works as expected in x64-Debug build configuration, but it has the following output in x64-Release or x64-ReleaseWithDebug configuration:

J2000 is 2000-01-01 12:05:35.816000 UTC
    J2000 is 2000-01-01 11:59:27.816000 TAI
    This diff: 863000000µs should be zero, as both quantities were in UTC.
    ..\..\..\server\tests\time\PreciseClockTests.cpp(65): error: Expected equality of these values:
      diff
        Which is: 863000000µs
      0us
        Which is: 0µs
    This diff: -863000000µs should be zero, as both quantities were in TAI.
    ..\..\..\server\tests\time\PreciseClockTests.cpp(69): error: Expected equality of these values:
      diff2
        Which is: -863000000µs
      0us
        Which is: 0µs

Windows 10 64 bit, MSVC 1427, date and tz from vcpkg with version 3.0.0#1. Visual Studio Community 2019, version 16.7.6. CMake build. C++17.

HowardHinnant commented 3 years ago

That is a truly terrifying result, and I'm unable to duplicate it on the only machine I have, macOS.

It seems clear to me that it has gone wrong by the time J2000_utc is computed. The symptoms look like Undefined Behavior in this computation, but I've scanned the code and am so far missing any possible UB in this computation.

If it is not UB, then that leaves compiler bug, which is also hard to believe.

connorjak commented 3 years ago

Got it; I'll see if I can replicate this in a WSL build.

connorjak commented 3 years ago

The program segfaulted on startup in the same line as J2000_utc in WSL-GCC-Debug. Well, that's a lot better than failing silently...

Here's the GDB stack trace: image

PreciseClock.h:113: static sc_utc_time J2000_utc = utc_clock::from_local(local_days{ J2000_date } + J2000_utc_hmsms);

connorjak commented 3 years ago

The white text looks somewhat similar to what happens when there is a stack overflow.

HowardHinnant commented 3 years ago

Very useful stack dump, thanks!

This looks to me like at https://github.com/HowardHinnant/date/blob/master/include/date/tz.h#L1896, leaps, which has type vector<leap_second>, is corrupted. Most likely it has not yet been constructed, or has already been destructed.

It should be constructed on first access to get_tzdb(), and it should not be destructed until after main returns.

It might be interesting to print out leaps.size() here. It probably contains some garbage value (it should be 27).

If indeed get_tzdb().leap_seconds is trashed, there are a few possibilities as to how it got this way:

  1. A wild pointer write in another part of the program, which just happened to write here.
  2. This is a multithreaded program with multiple threads calling get_tzdb() simultaneously, and this is an old version of Windows/VS that doesn't support thread safe initialization of function local statics (Microsoft calls these "magic statics").
  3. This is executing after main has returned, and happens to be executing after the data structure referred to by get_tzdb() has already destructed.

All three of these are wild guesses on my part. And most likely all of them are wrong. But we've got to start somewhere. :-)

connorjak commented 3 years ago

All of that sounds pretty plausible. My program is multithreaded but its tests aren't. I'll look into this.

connorjak commented 3 years ago

From my initial probing, leaps.size() = 12 right before the segfault on WSL Debug.

HowardHinnant commented 3 years ago

Are you setting USE_OS_TZDB to 1? I was assuming not since you appear to be on Windows, but I thought I should ask.

connorjak commented 3 years ago

I have USE_OS_TZDB=0 DATE_USE_DLL=1 on my Windows / MSVC build. Not sure how to check that for my WSL / GCC build though. I'll try setting that to 1.

HowardHinnant commented 3 years ago

Keeping it as 0 will make things simpler. (it defaults to 0)

HowardHinnant commented 3 years ago

With USE_OS_TZDB=0, this is where the leap second info is read: https://github.com/HowardHinnant/date/blob/master/src/tz.cpp#L3521-L3525

Once that loop exits, db->leap_seconds.size() should be 27. If it is, then it would be good bisect things to find out when it gets set to 12 (or any other number).

connorjak commented 3 years ago

Here's another gdb backtrace with -DUSE_OS_TZDB=1:

image At least I'm pretty sure that's with that option entered. I specified it on the cmake .. command I use for makefile generation.

HowardHinnant commented 3 years ago

Those two stack traces look the same to me, modulo addresses.

Are you linking against pthreads? (-lpthread). Some linux platforms have an annoying habit of stubbing in calls to pthread functions with do-nothing functions if you don't explicitly ask for pthreads. This can result in race conditions. I don't know why they don't give you link errors instead.

connorjak commented 3 years ago

I do

if (MSVC)

else()
    target_link_libraries(run_Server PRIVATE -lpthread)
endif()

run_Server is my main executable though; I don't directly -lpthread on my testing executable. Here's a list of libraries linked into the testing executable:

target_link_libraries(testingDependencies PRIVATE asio asio::asio)
target_link_libraries(testingDependencies PRIVATE OpenSSL::SSL OpenSSL::Crypto)
target_link_libraries(testingDependencies PRIVATE Taskflow::Taskflow)
target_link_libraries(testingDependencies PUBLIC cpr)
target_include_directories(testingDependencies PUBLIC ${RAPIDJSON_INCLUDE_DIRS})
target_link_libraries(testingDependencies PRIVATE date::date date::date-tz)
target_link_libraries(testingDependencies PUBLIC GTest::gtest GTest::gtest_main GTest::gmock GTest::gmock_main)

All of these are sourced from vcpkg, and precompiled using their vcpkg-export feature (headers, static libs and DLLs-only from there) for both x64-windows and x64-linux.

add_executable(ServerTests
    (All of my test .cpps)
)

target_link_libraries(ServerTests PUBLIC
    testingDependencies
)
connorjak commented 3 years ago

I actually use std::thread instead of pthread now; I can remove that link.

HowardHinnant commented 3 years ago

On linux platforms std::thread is implemented with pthread.

connorjak commented 3 years ago

Attempting to breakpoint in Windows now to examine the behavior around the first utc_clock::from_local().

connorjak commented 3 years ago

I put my test code in my first lines of main.cpp (which still includes PreciseClock.h) to reduce the clutter in the stack from GTest. That by itself didn't change the behavior; still wrong in Windows, Release.

I went back to Windows Debug and also switched over to the non-exported vcpkg libraries (.cpps intact), and I got this output the first time my code went over get_tzdb().

7-Zip 19.00 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2019-02-21

Scanning the drive for archives:
1 file, 401479 bytes (393 KiB)

Extracting archive: C:\Users\conno\Downloads\tzdata2020d.tar.gz
--
Path = C:\Users\conno\Downloads\tzdata2020d.tar.gz
Type = gzip
Headers Size = 10

Everything is Ok

Size:       1208320
Compressed: 401479

7-Zip 19.00 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2019-02-21

Scanning the drive for archives:
1 file, 1208320 bytes (1180 KiB)

Extracting archive: C:\Users\conno\Downloads\tzdata\tzdata2020d.tar
--
Path = C:\Users\conno\Downloads\tzdata\tzdata2020d.tar
Type = tar
Physical Size = 1208320
Headers Size = 17920
Code Page = UTF-8

Everything is Ok

Files: 30
Size:       1184247
Compressed: 1208320
        1 file(s) moved.

My code then proceeds to work as expected. Trying with Windows non-vcpkg-export ReleaseWithDebug now.

connorjak commented 3 years ago

The above result downloaded a tzdata folder into my downloads directory; not sure if that ever happened before now when I was using vcpkg-export libraries.

Ok, Windows non-vcpkg-export ReleaseWithDebug works as well. I'm guessing there's a problem with how vcpkg exports the tz library which is causing this to break.

connorjak commented 3 years ago

vcpkg-export docs: https://github.com/microsoft/vcpkg/blob/master/docs/specifications/export-command.md

HowardHinnant commented 3 years ago

I'm wondering if vcpkg builds with one set of tz flags and your main() gets built with another set. This would cause tz.h to appear as two different header files: one for tz.cpp and one for main(). The result would be unpredictable.

connorjak commented 3 years ago

Yeah, it's probably something along those lines. The easy thing for me to do to resolve this is just compile date and tz source files in my code instead of using vcpkg for them.

connorjak commented 3 years ago

I am using vcpkg-export for this because I am working with a large team on this large project; I just commit a zip of the exported packages (which includes a CMake Toolchain File) to the repo so no one else needs to install and manipulate vcpkg. This also makes sure everybody is using the exact same versions of libraries. This is the first issue I've had with it in terms of improper library behavior.

HowardHinnant commented 3 years ago

I regret that the CMake files of the tz library aren't of my design and implementation. I know nothing of CMake. It is supported here by popular demand and implemented by volunteers doing their best. In hindsight I should not have agreed to have it hosted on my site since I lack the expertise to maintain it.

connorjak commented 3 years ago

I feel that; the build system can be the hardest portion of a C++ project to maintain. 80% or so of the questions I get from team members are about build errors.

If I find the time, I'll take a look at the CMake portion of this problem; it might be a simple fix. It also may be worth raising an Issue on the vcpkg repo if this is indeed only a problem with how vcpkg-export affects the library.

Thanks for your help! I'll update this thread if I find out more about this problem.

HowardHinnant commented 3 years ago

Thank you.

connorjak commented 3 years ago

For other users:

Procedure for converting date / tz library from vcpkg to normal CMake library inclusion:

  1. Clone this repo or add it as a submodule to a known directory in your application files
  2. In date/CMakeLists.txt, edit the options.
    • BUILD_TZ_LIB: ON
    • Other settings as-required
  3. In your CMakeLists.txt, replace find_package(date CONFIG REQUIRED) with add_subdirectory("${YOUR_RELATIVE_DIRECTORY}/date")
  4. For some reason, curl was hanging while trying to download the tzdb in my application.
    • To fix that, I set up tz to use a previously-downloaded local copy of tzdata
    • date/CMakeLists.txt: Set MANUAL_TZ_DB to ON
    • Before #include "date/tz.h", #define INSTALL "YOUR_RELATIVE_PATH/date"
    • This will now require some manual upkeep to make sure your timezones are up to date with new government decisions.

EDIT: it appears I was still referencing the Downloads-folder tzdata I had when it worked. More discussion in #618.

ecorm commented 3 years ago

I don't know if you've resolved your issue or not, but make sure that USE_OS_TZDB is the same when compiling the library separately and when compiling the application. USE_OS_TZDB affects the memory layout of time_zone.