libstdc++ std::thread, when compiled with clang, "pure virtual method called" - Githubissues

Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

libstdc++ std::thread, when compiled with clang, "pure virtual method called" #12836

Open Quuxplusone opened 12 years ago

Quuxplusone commented 12 years ago


Bugzilla Link	PR12730
Status	REOPENED
Importance	P normal
Reported by	Cody Maloney (cmaloney@tagged.com)
Reported on	2012-05-02 16:12:22 -0700
Last modified on	2013-09-30 17:52:22 -0700
Version	trunk
Hardware	PC Linux
CC	ashaduri@yandex.ru, awg@embtoolkit.org, bigcheesegs@gmail.com, chasun@microsoft.com, chip@pobox.com, dgregor@apple.com, geek4civic@gmail.com, ismail@i10z.com, llvm-bugs@lists.llvm.org, llvm@inglorion.net, matias.fontanini@gmail.com, mqudsi@neosmart.net, rafael@espindo.la, rprichard@google.com, sefi@s-e-f-i.de, s_llvm@nedprod.com, tim@klingt.org, weimingz@codeaurora.org, zhouyan1014@gmail.com
Fixed by commit(s)
Attachments	`valgrind` (11438 bytes, text/plain) `backtrace.txt` (1925 bytes, text/plain)
Blocks
Blocked by
See also

$cat thread.cc
#include <thread>

void f(){}

int main()
{
   std::thread t(f);
   t.join();
}

$clang++ -v
clang version 3.2 (trunk)
Target: x86_64-unknown-linux-gnu
Thread model: posix
$clang++ -std=c++11 thread.cc -o thread -pthread

$./thread
pure virtual method called
terminate called without an active exception
Aborted

Quuxplusone commented 12 years ago

I just updated clang to r156503, and it seems to work for me now. Closing.

Quuxplusone commented 12 years ago

Turns out it was just a fluke, and is indeed still broken for me. Not sure how/why it worked...

Quuxplusone commented 12 years ago

Attached valgrind (11438 bytes, text/plain): Valgrind's output

Quuxplusone commented 12 years ago

I've cloned the svn repository and this problem persists. This is my clang
version:

$ clang++ -v
clang version 3.2 (trunk 159725)
Target: x86_64-unknown-linux-gnu
Thread model: posix

It looks like the ref count of the shared_ptr that holds the internal
representation of std::thread gets to 0, so the thread impl is getting
destroyed. I can't figure out why this example does actually work on g++ 4.7,
since both compilers use the same headers/libs. This could be a bug in
libstdc++'s std::thread implementation, if this code didn't work fine on g++.

Quuxplusone commented 11 years ago

same issue here:
clang version 3.2
Target: x86_64-unknown-linux-gnu
Thread model: posix

Quuxplusone commented 11 years ago

Same problem with

clang version 3.3 (trunk 168286)
Target: x86_64-unknown-linux-gnu
Thread model: posix

libstdc++ come with GCC 4.7.2

I suspect this is a problem with some projects lifetime. Some objects'
(possibly from libstdc++'s implementation of std::thread) member function gets
called while it is being destroyed. I had seen similar problem with G++ before.
But that is a long time ago using pthread without C++11. But the error message
is suspiciously similar.

However debugger does little to help. I cannot figure out which member function
has being called.

Quuxplusone commented 11 years ago

Same problem with

clang version 3.2 (branches/release_32 170508)
Target: x86_64-unknown-linux-gnu
Thread model: posix
OS: Fedora 17
gcc version:  4.7.2 20120921 (Red Hat 4.7.2-2)

the backtrace:

#0  0x0000003e51a35935 in __GI_raise (sig=sig@entry=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003e51a370e8 in __GI_abort () at abort.c:91
#2  0x0000003e5ce60dad in __gnu_cxx::__verbose_terminate_handler () at
../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x0000003e5ce5eea6 in __cxxabiv1::__terminate (handler=<optimized out>)
    at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:40
#4  0x0000003e5ce5eed3 in std::terminate () at ../../../../libstdc++-
v3/libsupc++/eh_terminate.cc:50
#5  0x0000003e5ce5f9ef in __cxxabiv1::__cxa_pure_virtual () at
../../../../libstdc++-v3/libsupc++/pure.cc:50
#6  0x0000003e5ceb29e0 in std::(anonymous
namespace)::execute_native_thread_routine (__p=<optimized out>)
    at ../../../../../libstdc++-v3/src/c++11/thread.cc:73
#7  0x0000003e51e07d14 in start_thread (arg=0x7ffff7fe6700) at
pthread_create.c:309
#8  0x0000003e51af168d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:115

62        namespace
63        {
64          extern "C" void*
(gdb) l
65          execute_native_thread_routine(void* __p)
66          {
67            thread::_Impl_base* __t = static_cast<thread::_Impl_base*>(__p);
68            thread::__shared_base_type __local;
69            __local.swap(__t->_M_this_ptr);
70
71            __try
72              {
73                __t->_M_run(); //abort here
74              }
(gdb) l
75            __catch(...)
76              {
77                std::terminate();
78              }
79
80            return 0;
81          }
82        }
83

it seems that __t->_M_run is a pure virtual function, so the process have been
terminated.

Quuxplusone commented 11 years ago

The bug shows up as a pure virtual call, but the real error is use-after-free
caused by apparent miscompilation of shared_ptr<>.

I've done deep repro on this, and what I find is that when creating a new
thread, libstdc++ std:thread creates an impl object containing the virtual
_M_run() function to run the thread's code.  An odd detail is that this impl
object contains a shared_ptr<> to itself; it's a shared_ptr<base> inside the
base type, and the created impl object is always a type derived from this base.
This approach, while weird, ensures that the impl object stays alive until the
new thread starts running and uses it.  Obviously since the impl object
contains the virtual _M_run() that actually runs the thread's code, this is
quite important.

But this goes wrong because Clang destroys the impl in the parent thread,
before the child thread can use it.

The apparently miscompiled code is in <thread>; I quote here from libstdc++
4.7.2:

129     template<typename _Callable, typename... _Args>
130       explicit
131       thread(_Callable&& __f, _Args&&... __args)
132       {
133         _M_start_thread(_M_make_routine(std::__bind_simple(
134                 std::forward<_Callable>(__f),
135                 std::forward<_Args>(__args)...)));
136       }

Clang destroys the impl with a backtrace that starts at
shared_ptr::_M_destroy() ends at line 135 above.  Remember, since the impl
contains a shared_ptr to *itself*, the reference count should never fall below
one until the thread exits, so this destruction is a mistake.

Running under valgrind confirms that there is a use-after-free here.

I don't know why Clang would do this.

Why does this show up as a pure virtual call?  Simple: In the process of
orderly destruction, the impl's vtable is reset to its base class vtable before
the object is finally freed.  That base vtable contains a pure virtual
function, which is subsequently called.  (Even though the memory was freed, the
vtable pointer seems to remain unchanged.)  I suppose this symptom would not
occur if e.g. the base impl class had a pure virtual constructor.

Here's the backtrace, showing premature destruction.  We can be sure that this
destruction is premature because it's supposed to happen in the child thread,
not in the parent.

(gdb) where
#0  std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<void
(*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> >
>, (__gnu_cxx::_Lock_policy)1>::_M_destroy() (this=0x804e008)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr_base.h:418
#1  0x08048f4f in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)1>::_M_release
(this=0x804e008)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr_base.h:164
#2  0x08048ecb in std::__shared_count<1>::~__shared_count (this=0xbffff464)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr_base.h:558
#3  0x08048e97 in std::__shared_count<1>::~__shared_count (this=0xbffff464)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr_base.h:556
#4  0x08048fac in std::__shared_ptr<std::thread::_Impl_base, 1>::~__shared_ptr
(this=0xbffff460)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr_base.h:813
#5  0x08048f87 in std::shared_ptr<std::thread::_Impl_base>::~shared_ptr
(this=0xbffff460)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr.h:93
#6  0x08048e07 in std::shared_ptr<std::thread::_Impl_base>::~shared_ptr
(this=0xbffff460)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/bits/shared_ptr.h:93
#7  0x08048cd8 in std::thread::thread<void (&)(), >(void (&)(void))
(this=0xbffff4c0, __f=@0x8048a60: {void (void)} 0x8048a60 <f()>)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/thread:133
#8  0x08048c14 in std::thread::thread<void (&)(), >(void (&)(void))
(this=0xbffff4c0, __f=@0x8048a60: {void (void)} 0x8048a60 <f()>)
    at /usr/bin/../lib/gcc/i486-linux-gnu/4.7/../../../../include/c++/4.7/thread:135
#9  0x08048a8c in main () at test.cc:9

Quuxplusone commented 11 years ago

Attached backtrace.txt (1925 bytes, text/plain): backtrace of premature destruction

Quuxplusone commented 11 years ago

I think the root problem is that GCC is defining the
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros, whereas Clang is not.

In libstdc++, std::shared_ptr<T> is derived from
std::__shared_ptr<T, __gnu_cxx::__default_lock_policy>.  __default_lock_policy
is a compile-time constant defined in the include/ext/concurrence.h header.  The
value varies depending upon which macros are defined:

  // Compile time constant that indicates prefered locking policy in
  // the current configuration.
  static const _Lock_policy __default_lock_policy =
#ifdef __GTHREADS
#if (defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2) \
     && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4))
  _S_atomic;
#else
  _S_mutex;
#endif
#else
  _S_single;
#endif

With G++ on x86 and x86_64, the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros
are defined by the compiler, in gcc/c-family/c-cppbuiltin.c, so
__default_lock_policy is _S_atomic.

With Clang, the macros are not defined, so __default_lock_policy is _S_mutex.

(The built-in macros can be shown by passing -dM -E to gcc/clang.  e.g.: gcc -
dM -E - < /dev/null)

With the thread.c test case, large parts of libstdc++ are inlined and therefore
compiled by Clang.  The std::thread constructor calls
std::thread::_M_make_routine, which creates an std::__shared_ptr<T, _S_mutex>
object.  The constructor passes this object to std::thread::_M_start_thread,
which is an out-of-line function in libstdc++.so.6.
std::thread::_M_start_thread expects an std::__shared_ptr<T, _S_atomic> object.
I think this mismatch is what causes the premature destruction of the
std::thread::_Impl object.

Here are the most relevant bits from the 4.7.2 include/std/thread header:

  class thread
  {
  public:
    ...
    struct _Impl_base;
    typedef shared_ptr<_Impl_base>  __shared_base_type;
    ...
    template<typename _Callable, typename... _Args>
      explicit
      thread(_Callable&& __f, _Args&&... __args)
      {
        _M_start_thread(_M_make_routine(std::__bind_simple(
                std::forward<_Callable>(__f),
                std::forward<_Args>(__args)...)));
      }
    ...
  private:
    void
    _M_start_thread(__shared_base_type);

    template<typename _Callable>
      shared_ptr<_Impl<_Callable>>
      _M_make_routine(_Callable&& __f)
      {
    // Create and allocate full data structure, not base.
    return std::make_shared<_Impl<_Callable>>
            (std::forward<_Callable>(__f));
      }
  };

If I define the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros when I build the
test case with Clang, then the test case passes, and valgrind is happy.

This ABI issue resulted in an (invalid) gcc bug report,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42734.  There seems to be some
acknowledgement there that the macros are expected to change the libstdc++ ABI.

I think the driver needs to define the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248]
macros for libstdc++ compatibility.  There is an issue open regarding these
macros, but it does not mention ABI compatibility,
http://llvm.org/bugs/show_bug.cgi?id=11174.

Quuxplusone commented 11 years ago

Ryan, you are a gentleman and a scholar. Thanks

Quuxplusone commented 11 years ago

Fixed in r178816.

Quuxplusone commented 11 years ago

@Michael Spencer: Is there a good reason why your fix adding the __GCC_HAVE_SYNC_COMPARE_ANDSWAP[1248] macros in SVN rev 178816 is x86 only?

If you look at http://sourceware.org/ml/libc-ports/2010-11/msg00005.html you'll see that on ARM the use of atomic builtins only happens if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 is set.

This rather implies the need for every architecture target, not just x86, to set __GCC_HAVE_SYNC_COMPARE_ANDSWAP[1248] macros according to whether that target provides CAS ops of varying sizes.

FYI I just got bit on this bug but with an ARM target, hence me finding this report.

Niall

Quuxplusone commented 11 years ago

I posted the patch for ARM to clang-commits maillist.

Quuxplusone commented 11 years ago

please help to review it:
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130826/087642.html

Quuxplusone commented 11 years ago

For x86, it looks "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8" only apply to i586 or
above. So we can't define the 4 macros for all targets.
I don't know if it is safe to define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_{1,2,4}
for all targets.

And I can't find a way to compute the macros for each target either

Quuxplusone commented 10 years ago

(In reply to comment #16)
> For x86, it looks "__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8" only apply to i586 or
> above. So we can't define the 4 macros for all targets.
> I don't know if it is safe to define
> __GCC_HAVE_SYNC_COMPARE_AND_SWAP_{1,2,4} for all targets.
>
> And I can't find a way to compute the macros for each target either

ARMv7a has 64 bit atomics, so you can turn on
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 for that and above.

Niall

Quuxplusone commented 10 years ago

Should it be tied to MaxAtomicPromoteWidth ?
However, the value of MaxAtomicPromoteWidth is Target specific, not CPU
specific.

Quuxplusone commented 10 years ago

ARM part: patched in r191707