eclipse-uprotocol / up-cpp

uProtocol Language Specific Library for C++
Apache License 2.0
15 stars 25 forks source link

Add valgrind in CI #160

Closed BishrutSubedi closed 3 weeks ago

BishrutSubedi commented 1 month ago

This PR adds dynamic analysis check i.e. valgrind to run in all unit test in CI

It also adds a text file to exclude any test to run under valgrind and it's tools

To skip test from running, add it in format <test binary name>.<testsuite name>.<test name>

gregmedd commented 1 month ago

CI failure caused by issue https://github.com/eclipse-uprotocol/up-conan-recipes/issues/9

gregmedd commented 4 weeks ago

@BishrutSubedi - It looks like the change in 38cb0ba worked. The tests that failed are sensitive to memory allocation throughput, which is reduced when valgrind is running. I think it might be safe to disable those tests when running valgrind. What do you think?

BishrutSubedi commented 3 weeks ago

@BishrutSubedi - It looks like the change in 38cb0ba worked. The tests that failed are sensitive to memory allocation throughput, which is reduced when valgrind is running. I think it might be safe to disable those tests when running valgrind. What do you think?

Sounds good to me

gregmedd commented 3 weeks ago

I've been poking at this a little more and I'm not sure how to interpret results out of it. For example, I have a run where I added an intentional memory leak. The valgrind output from the log looks like this:

==2705== Memcheck, a memory error detector
==2705== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2705== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==2705== Command: ctest -j -E UuidBuilderTest.WithIndependentState|UuidBuilderTest.CounterIncrementWithinTimestampTick|TestUuidBuilder.CounterIncrements
==2705== Parent PID: 2699
==2705==
==2705==
==2705== HEAP SUMMARY:
==2705==     in use at exit: 72,704 bytes in 1 blocks
==2705==   total heap usage: 182,638 allocs, 182,637 frees, 90,297,980 bytes allocated
==2705==
==2705== 72,704 bytes in 1 blocks are still reachable in loss record 1 of 1
==2705==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==2705==    by 0x431875: ??? (in /usr/local/bin/ctest)
==2705==    by 0x11C93CC: ??? (in /usr/local/bin/ctest)
==2705==    by 0x4988E1B: __libc_start_main@@GLIBC_2.34 (libc-start.c:375)
==2705==    by 0x431BA2: ??? (in /usr/local/bin/ctest)
==2705==    by 0x1FFEFFF6C7: ???
==2705==    by 0x1B: ???
==2705==    by 0x3: ???
==2705==    by 0x1FFEFFFC4E: ???
==2705==    by 0x1FFEFFFC54: ???
==2705==    by 0x1FFEFFFC57: ???
==2705==    by 0x1FFEFFFC5A: ???
==2705==
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: reachable
   fun:malloc
   obj:/usr/local/bin/ctest
   obj:/usr/local/bin/ctest
   fun:__libc_start_main@@GLIBC_2.34
   obj:/usr/local/bin/ctest
   obj:*
   obj:*
   obj:*
   obj:*
   obj:*
   obj:*
   obj:*
}
==2705== LEAK SUMMARY:
==2705==    definitely lost: 0 bytes in 0 blocks
==2705==    indirectly lost: 0 bytes in 0 blocks
==2705==      possibly lost: 0 bytes in 0 blocks
==2705==    still reachable: 72,704 bytes in 1 blocks
==2705==         suppressed: 0 bytes in 0 blocks
==2705==
==2705== For lists of detected and suppressed errors, rerun with: -s
==2705== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

When I run one of the test binaries under valgrind directly, I get this instead:

==9291== Memcheck, a memory error detector
==9291== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==9291== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==9291== Command: ./CallbackConnectionTest
==9291== 
==9291== 
==9291== HEAP SUMMARY:
==9291==     in use at exit: 108 bytes in 27 blocks
==9291==   total heap usage: 498 allocs, 471 frees, 150,404 bytes allocated
==9291== 
==9291== 4 bytes in 1 blocks are definitely lost in loss record 1 of 27
==9291==    at 0x4846FA3: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==9291==    by 0x126FF8: uprotocol::utils::callbacks::CalleeHandle<void>::CalleeHandle(std::shared_ptr<uprotocol::utils::callbacks::Connection<void> >, std::shared_ptr<std::function<void ()> >, std::optional<std::function<void (uprotocol::utils::callbacks::CallerHandle<void>)> >&&, uprotocol::utils::callbacks::Connection<void>::PrivateConstructToken) (CallbackConnection.h:262)
==9291==    by 0x12DCCE: uprotocol::utils::callbacks::Connection<void>::establish(std::function<void ()>&&, std::optional<std::function<void (uprotocol::utils::callbacks::CallerHandle<void>)> >&&) (CallbackConnection.h:109)
==9291==    by 0x11AF10: (anonymous namespace)::CallbackTest_EstablishDoesNotThrow_Test::TestBody() (CallbackConnectionTest.cpp:46)
==9291==    by 0x178674: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2612)
==9291==    by 0x16FDFC: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2648)
==9291==    by 0x146E8B: testing::Test::Run() (gtest.cc:2687)
==9291==    by 0x147A5C: testing::TestInfo::Run() (gtest.cc:2836)
==9291==    by 0x1484D5: testing::TestSuite::Run() (gtest.cc:3015)
==9291==    by 0x159659: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5920)
==9291==    by 0x179993: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2612)
==9291==    by 0x171358: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2648)
==9291== 
...
==9291== 
==9291== LEAK SUMMARY:
==9291==    definitely lost: 108 bytes in 27 blocks
==9291==    indirectly lost: 0 bytes in 0 blocks
==9291==      possibly lost: 0 bytes in 0 blocks
==9291==    still reachable: 0 bytes in 0 blocks
==9291==         suppressed: 0 bytes in 0 blocks
==9291== 
==9291== For lists of detected and suppressed errors, rerun with: -s
==9291== ERROR SUMMARY: 27 errors from 27 contexts (suppressed: 0 from 0)

In both cases, I modified the process to use the Debug build type just to see if adding debug symbols helped.

BishrutSubedi commented 3 weeks ago

I've been poking at this a little more

With the current setup, the valgrind logs will have the names of test that failed. For more detailed breakdown of lines that failed, I run the failed test locally directly under valgrind.

If build with build type debug, we get the exact lines that caused failure. In release, we get only the top level function call of where it failed, but the subsequent lines are hidden.

We can think this as summary of what failed, and investigate locally to see the result.

One way to get this detail result in CI would be to collect tests, under test dir, and run it directly with valgrind and concatenate the results

gregmedd commented 3 weeks ago

This looks good for me. Since there are some valgrind findings already, I have added #196 as a follow-up to log those issues and #197 to block PRs on valgrind failures once all findings are addressed.