efficient / libcuckoo

A high-performance, concurrent hash table
Other
1.61k stars 275 forks source link

valgrind reports Invalid free() / delete / delete[] / realloc() during map destructor #145

Closed thompsonbry closed 2 years ago

thompsonbry commented 2 years ago

I am curious whether you have any insight into this problem.

manugoyal commented 2 years ago

Hmm interesting, @thompsonbry, are you able to provide a cc file and the g++ and valgrind commands you used to get the error?

On my machine, I tried compiling just a file containing your code snippet (replacing rip_t with uint32_t) with G++ version 9.4.0 (g++ -g -O0 --std=c++11 test.cc and then valgrind --tool=memcheck --leak-check=yes ./a.out), but it didn't surface anything.

thompsonbry commented 2 years ago

I can work on a small reproducer. I also want to figure out what version of libcuckoo is being included. Or is that obvious to you from the trace?

Valgrind options:

--36895-- Valgrind options:
--36895--    -v
--36895--    --leak-check=full
--36895--    --show-leak-kinds=definite
--36895--    --gen-suppressions=all
--36895--    --smc-check=all
--36895--    --track-origins=yes

gcc options:

-g -DDEBUG_BUILD=true  -fPIC -D _POSIX_SOURCE -D _GNU_SOURCE -lrt -fno-omit-frame-pointer  -fPIC -fno-omit-frame-pointer -w -Wfatal-errors -g -g -fPIC   -Wall -Wpedantic -Wextra -O0 -g3 -ggdb -std=gnu++1z 
manugoyal commented 2 years ago

thanks!

Yeah the exact version is a bit hard to pin down from the stack trace. The cuckoohash_map destructor is implicitly defined, and since https://github.com/efficient/libcuckoo/commit/4f54e63b42b20a0834bded0feb876a2fd1fdba17, the class has been declared on line 49, so maybe that's a reasonable lower-bound? There have been a few small updates to the library since then, but not sure whether they would impact your test.

thompsonbry commented 2 years ago

We are using the tip of master as of early Jan'22. Seems that the code has not changed since.

thompsonbry commented 2 years ago

I am inclined to think that this was the side-effect of something else going wrong in the application. Closing for now.

thompsonbry commented 2 years ago

I am seeing this again for a trivial reproducer.

#include <gtest/gtest.h>
#include <libcuckoo/cuckoohash_map.hh>

namespace run_with_valgrind {
    class CuckoomapDestroyTest : public ::testing::Test {
    };

    typedef libcuckoo::cuckoohash_map<
        std::uint32_t // class Key,
        , std::uint32_t // class T (Value)
        , std::hash<std::uint32_t> // class Hash
        , std::equal_to<std::uint32_t> // class KeyEqual
        , std::allocator<std::pair<const std::uint32_t, std::uint32_t>> // class Allocator.
        //, std::size_t SLOT_PER_BUCKET = DEFAULT_SLOT_PER_BUCKET
        > map_t;

    // When we run this test with valgrind it reports errors.
    TEST_F(CuckoomapDestroyTest, destroy_01) {
        map_t map_;
    }
}

/*

gcc 7.5
version=5.4.190-118.353.amzn2int.x86_64
AL2_x86_64

Compiled with the following options:

-fPIC -D _POSIX_SOURCE -D _GNU_SOURCE -lrt -w -Wfatal-errors -g -O3 -DNDEBUG

Valgrind run with the following options:

 -v --leak-check=full --show-leak-kinds=definite --gen-suppressions=all --smc-check=all --track-origins=yes

==3700== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info

==3700== Invalid free() / delete / delete[] / realloc()
==3700==    at 0x40320DB: free (vg_replace_malloc.c:530)
==3700==    by 0x46CBCE: deallocate (new_allocator.h:121)
==3700==    by 0x46CBCE: deallocate (alloc_traits.h:462)
==3700==    by 0x46CBCE: _M_deallocate (stl_vector.h:180)
==3700==    by 0x46CBCE: ~_Vector_base (stl_vector.h:162)
==3700==    by 0x46CBCE: ~vector (stl_vector.h:435)
==3700==    by 0x46CBCE: destroy<std::vector<libcuckoo::cuckoohash_map<unsigned int, unsigned int, std::hash<unsigned int>, std::equal_to<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >::spinlock, std::allocator<libcuckoo::cuckoohash_map<unsigned int, unsigned int, std::hash<u\
nsigned int>, std::equal_to<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >::spinlock> > > (new_allocator.h:140)
==3700==    by 0x46CBCE: destroy<std::vector<libcuckoo::cuckoohash_map<unsigned int, unsigned int, std::hash<unsigned int>, std::equal_to<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >::spinlock, std::allocator<libcuckoo::cuckoohash_map<unsigned int, unsigned int, std::hash<u\
nsigned int>, std::equal_to<unsigned int>, std::allocator<std::pair<unsigned int const, unsigned int> > >::spinlock> > > (alloc_traits.h:487)
==3700==    by 0x46CBCE: _M_clear (list.tcc:76)
==3700==    by 0x46CBCE: ~_List_base (stl_list.h:442)
==3700==    by 0x46CBCE: ~list (stl_list.h:733)
==3700==    by 0x46CBCE: ~cuckoohash_map (cuckoohash_map.hh:49)
==3700==    by 0x46CBCE: run_with_valgrind::CuckoomapDestroyTest_destroy_01_Test::TestBody() (CuckoomapDestroyTest.cpp:19)
==3700==    by 0x573629: HandleSehExceptionsInMethodIfSupported<testing::Test, void> (gtest.cc:2433)
==3700==    by 0x573629: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2469)
==3700==    by 0x5640FC: testing::Test::Run() (gtest.cc:2508)
==3700==    by 0x564267: testing::TestInfo::Run() (gtest.cc:2684)
==3700==    by 0x564488: testing::TestSuite::Run() (gtest.cc:2816)
==3700==    by 0x56A0F8: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5338)
==3700==    by 0x56A2D9: HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (gtest.cc:2433)
==3700==    by 0x56A2D9: HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (gtest.cc:2469)
==3700==    by 0x56A2D9: testing::UnitTest::Run() (gtest.cc:4925)
==3700==    by 0x4185A0: RUN_ALL_TESTS (gtest.h:2473)
==3700==    by 0x4185A0: main (TestMain.cpp:5)
==3700==  Address 0x8c00000 is in a rw- anonymous segment
==3700==
*/
thompsonbry commented 2 years ago

@manugoyal Are you able to reproduce this from the source code above?

rob-p commented 2 years ago

Hi @thompsonbry,

I tried moving the reproducible example into just a main function rather than as part of a Google Test suite, and valgrind returns cleanly. Are you certain the issue is not from GTest doing something "clever"?

thompsonbry commented 2 years ago

I am not clear how this problem is surfacing. We have (lots of) other tests which are clean under valgrind. I will try to reproduce outside of our normal build environment.

Bryan

On Tue, Jun 21, 2022 at 1:47 PM Rob Patro @.***> wrote:

Hi @thompsonbry,

I tried moving the reproducible example into just a main function rather than as part of a Google Test suite, and valgrind returns cleanly. Are you certain the issue is not from GTest doing something "clever"?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

serkanturgut commented 2 years ago

For posterity;

This turned out to be a Valgrind bug due to aligned_alloc is not replaced by Valgrind https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209886 It was fixed after Valgrind v3.18. aligned_alloc was used becuse spinlock class was set to be placed in a 64-bytes aligned memory location which is used by all_locks_t. We updated our Valgrind version and the problem gone away now. Please feel free to close this issue.

manugoyal commented 2 years ago

Appreciate the followup @serkanturgut! Closing the issue.