Martchus / cpp-utilities

Common C++ classes and routines used by my applications such as argument parser, IO and conversion utilities
GNU General Public License v2.0
52 stars 18 forks source link

`ChronoTests::testTimeSpan` fails #31

Closed MartinRJDagleish closed 9 months ago

MartinRJDagleish commented 9 months ago

Hey,

I am not sure, why nobody else seems to have this problem, but when I try updating my system with paru as my AUR helper, for a few weeks now, the tests for c++-utilities will not run without an error.

The compilation seems to work fine, but the test for TimeSpan always fail.

I manually went into the source code for the tests and commented the whole content of the test function out and only then the tests were successful. All the parts of the function always seemed to throw the same error.

1: Executing test cases ...
1: ...................
1: output for formatting with ANSI escape codes:
1: Error: some error
1: Warning: some warning
1: Info: some info
1: ==> ERROR: Arch-style error
1: ==> WARNING: Arch-style warning
1:     Arch-style message
1: ==> Arch-style success
1:   -> Arch-style sub-message
1: blue, blinking text on red background
1: ---------------------------------------------
1: ....E....................
1:
1:
1: !!!FAILURES!!!
1: Test Results:
1: Run:  43   Failures: 0   Errors: 1
1:
1:
1: 1) test: ChronoTests::testTimeSpan (E)
1: uncaught exception of type std::exception (or derived).
1: - The string "" is no valid floating point number.
1:
1:
1: Tests failed
1/1 Test #1: c++utilities_run_tests ...........***Failed    0.01 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.01 sec

The following tests FAILED:
          1 - c++utilities_run_tests (Failed)
Errors while running CTest
Martchus commented 9 months ago

That's strange. Since only you can reproduce the issue it would make sense to comment out the particular lines in that function to determine what test/check in particular fails. (Unfortunately in C++ with this test framework one doesn't get line numbers logged in case of an exception.)

Otherwise I can only make guesses. Maybe a time-zone related problem (although I though I was able to iron out those)? Maybe an unusual configuration (e.g. using clang++ as compiler instead of g++ or having some weird CFLAGS/CXXFLAGS)?

Martchus commented 9 months ago

And what version of the gcc package are you building against? The latest from Arch Linux should be fine (as that's also what I am using) but if you're on e.g. Manjaro you might use an older version that behaves differently.

MartinRJDagleish commented 9 months ago

Thank you for the fast response.

I am currently using Manjaro (Arch-based). I am compiling with g++ and not with clang++. I live in Germany, so time-zone is CET time zone.

I tried all parts of the testTimeSpan function separately, but I still got the error. Maybe I have to retry or run it differntly. Currently I am using CMake with ninja and then ninja check for running the tests.

gcc / g++ version:

❯ g++ --version
g++ (GCC) 13.2.1 20230801
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Martchus commented 9 months ago

Ok, same timezone and same GCC version. Then we can rule out those theories.

I tried all parts of the testTimeSpan function separately, but I still got the error.

Then this problem affects almost every use of the TimeSpan class. That makes me wonder why you don't see any problems related to time-handling in my applications (or do you)?

MartinRJDagleish commented 9 months ago

First response

I have not quite figured out, how to best reproduce this, but:

  1. When I clone the repo (via the paru AUR CLI tool), then the initial build with the config, throws the error when testing
  2. When I then rebuild it with either cmake -G Ninja -S . -B . -DCMAKE_CXX_COMPILER=clang++, then ninja for building and ninja check. Then the tests pass.
  3. Same worked the other way round, so I had export CXX="clang++" active and then rebuild with cmake -G Ninja -S . -B . -DCMAKE_CXX_COMPILER=g++, ninja, ninja check. The tests pass.

Basically running cmake manual with this config, seems to work, but I can not change the PKGBUILD, right?

Tbh I am not actively using your application and c++-utilities seems to be a dependency for another application.

EDIT

I deleted all the cached files and "freshly" started for more consistent "reproducibility". The intial build + test fails, then (only) when I switch from (the intial) g++ to clang++ with cmake -G Ninja -S . -B . -DCMAKE_CXX_COMPILER=clang++, it works without a problem.

After I ran with clang++ and had it built I change to cmake -G Ninja -S . -B . -DCMAKE_CXX_COMPILER=g++, ninja, ninja check, which then works. The weird part is that, when I run ninja the project gets rebuilt, so I do not understand, how this should be any different more the initial build

Martchus commented 9 months ago

Ok, so a reconfiguration seems to help. That's still quite strange because as you say - the reconfiguration itself shouldn't affect it like this. (And as a side note: I really build this project in many environments beside Arch Linux and in various configurations and have never encountered this problem.)

Maybe there's still something special in the build environment created by paru. If you then reconfigure the project manually, then when doing so you are probably not replicating the initial environment of paru anymore to 100 % so you cannot reproduce the issue anymore. Note that CMake can pick up various settings from the environment (by convention with other build systems, e.g. CFLAGS and CXXFLAGS). If you later run CMake again withing a changed environment (and this may also happen by just calling ninja and not cmake) then this can change the outcome.

I'm wondering what application you're using. I'm not aware of any applications using this library but the ones I develop myself.

MartinRJDagleish commented 9 months ago

UPDATE:

I managed to figure out that the CMAKE_BUILD_TYPE seems to cause the issue here.

 cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX='/usr' -DBUILD_SHARED_LIBS=ON -DCMAKE_CXX_COMPILER=g++ . 

works, but

 cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX='/usr' -DBUILD_SHARED_LIBS=ON -DCMAKE_CXX_COMPILER=g++ . 

does not

Martchus commented 9 months ago

Oh, then the issue is probably due to optimizations (and possibly due to undefined behavior in the code). It would be really nice to pin down the problem because but considering I cannot reproduce it myself that's hard to to for me.

Maybe you can set -DCMAKE_VERBOSE_MAKEFILE=ON and give one example of the compiler invocations you get. This would help to find out whether any special flags end up on the compiler invocations in your case.

Note that the only function that would possibly produce the output The string "" is no valid floating point number. is never be supposed to be called with your version of GCC, at least not as part of the ChronoTests::testTimeSpan() function. So something really goes of the rails here (unless you're building an older version of c++utilities).

MartinRJDagleish commented 9 months ago

Sry "false" alarm? I am really confused now...

The problem looks more like static vs. dynamically compiled library, because if I

❯ ninja
[23/23] Creating library symlink libc++utilities.so.5 libc++utilities.so

here you can see, that a .so file was built, then the ninja check does not work, but even if I use

cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX='/usr' -DBUILD_SHARED_LIBS=ON -DCMAKE_CXX_COMPILER=g++ .

which should build a .so with the -DBUILD_SHARED_LIBS=ON, then only after calling cmake the second time, it would build a .so and not an .a file.

If I look back into the log of previous build tests now, then only whenever a .a file was built, then the tests seemed to pass, like:

❯ ninja
[22/22] Linking CXX static library libc++utilities.a
❯ ninja check
(skipping the inital part)
1: Executing test cases ...
1: ...................
1: output for formatting with ANSI escape codes:
1: Error: some error
1: Warning: some warning
1: Info: some info
1: ==> ERROR: Arch-style error
1: ==> WARNING: Arch-style warning
1:     Arch-style message
1: ==> Arch-style success
1:   -> Arch-style sub-message
1: blue, blinking text on red background
1: ---------------------------------------------
1: ........................
1:
1:
1: OK (43 tests)
1:
1:
1: Tests successful
1/1 Test #1: c++utilities_run_tests ...........   Passed    0.01 sec

100% tests passed, 0 tests failed out of 1
Martchus commented 9 months ago

Sry "false" alarm? I am really confused now...

When this is really due to optimizations then quite the opposite. The code is supposed to work with all optimizations enabled.

However,

The problem looks more like static vs. dynamically compiled library

makes it sound like this problem is because the tests are loading the wrong shared library in your case. That would mean another version of libc++utilities.so.5 is loaded from somewhere on your system by the tests and with that version tests cannot pass. When switching to static linking the problem is of course gone because then the tests don't load the DSO at all. This would make sense considering the behavior has changed recently and only with those recent changes the test CPPUNIT_ASSERT_EQUAL(TimeSpan(), TimeSpan::fromString(string())); would pass.

MartinRJDagleish commented 9 months ago

Ok wow I am feeling pretty stupid now....

Not sure how to make ctest pick the correct version of libc++utilities, but I fixed it now with your help by:

  1. Moving the currently installed libc++utilities (in /usr/lib/) to a BACKUP folder
  2. In the LD_LIBRARY_PATH, no .so file should be found
  3. The correct (newly compiled) libc++utilities would be tested, when updating with paru

This has now finally worked and the library is updated.

Just a few last remarks:

  1. I think I used your syncthingtray package and maybe that is why I still have the library installed.
  2. Sorry for the inconvenience, which was just due to my LD_LIBRARY_PATH then (I guess) and thank you for your time.
  3. Any last tips on how to correctly organize the LD_LIBRARY_PATH such that this issue does not come up again with the next update?
Martchus commented 9 months ago

Sorry for the inconvenience, which was just due to my LD_LIBRARY_PATH then (I guess) and thank you for your time.

No problem :-)

Any last tips on how to correctly organize the LD_LIBRARY_PATH such that this issue does not come up again with the next update?

Never set this variable in a broad scope. Set is only where needed, e.g. when invoking a specific application. Normally it also shouldn't be needed. You could also adjust DT_RUNPATH of a specific application that needs it (e.g. using patchelf).

Note that normally also CMake sets DT_RUNPATH so libraries of the same build are preferred over libraries from the system (or found at all). I guess I rely on that quite heavily when developing so I'm sure it works as intended. Maybe setting LD_LIBRARY_PATH overrides this, though.