Quuxplusone / LLVMBugzillaTest


128-bit __atomic* intrinsics not implemented for x86_64 #19148

Open Quuxplusone opened 10 years ago

Quuxplusone commented 10 years ago
Bugzilla Link PR19149
Status NEW
Importance P normal
Reported by Andrey Semashev (andysem@mail.ru)
Reported on 2014-03-15 06:10:47 -0700
Last modified on 2015-10-14 10:08:52 -0700
Version 3.4
Hardware PC MacOS X
CC gary@intrepid.com, jim.king@simplivity.com, llvm-bugs@lists.llvm.org, nenad@intrepid.com, tim@klingt.org
Fixed by commit(s)
Attachments atomic_load.c (643 bytes, application/octet-stream)
Blocks
Blocked by
See also
Currently, Boost.Atomic tests are failing for Darwin on x86_64 with linking
failures such as:

    "/Volumes/TwoTB/LLVM/bin/bin/clang++"   -o "/Volumes/TwoTB/boost/regression/develop/results/boost/bin.v2/libs/atomic/test/native_api.test/clang-darwin-asan11/debug/threading-multi/native_api" "/Volumes/TwoTB/boost/regression/develop/results/boost/bin.v2/libs/atomic/test/native_api.test/clang-darwin-asan11/debug/threading-multi/native_api.o" "/Volumes/TwoTB/boost/regression/develop/results/boost/bin.v2/libs/chrono/build/clang-darwin-asan11/debug/threading-multi/libboost_chrono.dylib" "/Volumes/TwoTB/boost/regression/develop/results/boost/bin.v2/libs/thread/build/clang-darwin-asan11/debug/threading-multi/libboost_thread.dylib" "/Volumes/TwoTB/boost/regression/develop/results/boost/bin.v2/libs/atomic/build/clang-darwin-asan11/debug/threading-multi/libboost_atomic.dylib" "/Volumes/TwoTB/boost/regression/develop/results/boost/bin.v2/libs/system/build/clang-darwin-asan11/debug/threading-multi/libboost_system.dylib"    -g -stdlib=libc++ -fsanitize=address

Undefined symbols for architecture x86_64:
  "___atomic_compare_exchange", referenced from:
      boost::atomics::detail::base_atomic<test_struct<unsigned __int128>, void, 16u, false>::compare_exchange_weak(test_struct<unsigned __int128>&, test_struct<unsigned __int128> const&, boost::memory_order, boost::memory_order) volatile in native_api.o
      boost::atomics::detail::base_atomic<test_struct<unsigned __int128>, void, 16u, false>::compare_exchange_strong(test_struct<unsigned __int128>&, test_struct<unsigned __int128> const&, boost::memory_order, boost::memory_order) volatile in native_api.o
      boost::atomics::detail::base_atomic<unsigned __int128, int, 16u, false>::compare_exchange_strong(unsigned __int128&, unsigned __int128, boost::memory_order, boost::memory_order) volatile in native_api.o
      boost::atomics::detail::base_atomic<unsigned __int128, int, 16u, false>::compare_exchange_weak(unsigned __int128&, unsigned __int128, boost::memory_order, boost::memory_order) volatile in native_api.o
      boost::atomics::detail::base_atomic<__int128, int, 16u, true>::compare_exchange_strong(__int128&, __int128, boost::memory_order, boost::memory_order) volatile in native_api.o
      boost::atomics::detail::base_atomic<__int128, int, 16u, true>::compare_exchange_weak(__int128&, __int128, boost::memory_order, boost::memory_order) volatile in native_api.o
  "___atomic_exchange", referenced from:
      boost::atomics::detail::base_atomic<test_struct<unsigned __int128>, void, 16u, false>::exchange(test_struct<unsigned __int128> const&, boost::memory_order) volatile in native_api.o
      boost::atomics::detail::base_atomic<unsigned __int128, int, 16u, false>::exchange(unsigned __int128, boost::memory_order) volatile in native_api.o
      boost::atomics::detail::base_atomic<__int128, int, 16u, true>::exchange(__int128, boost::memory_order) volatile in native_api.o
  "___atomic_fetch_add_16", referenced from:
      boost::atomics::detail::base_atomic<unsigned __int128, int, 16u, false>::fetch_add(unsigned __int128, boost::memory_order) volatile in native_api.o
      boost::atomics::detail::base_atomic<__int128, int, 16u, true>::fetch_add(__int128, boost::memory_order) volatile in native_api.o
  "___atomic_fetch_and_16", referenced from:
      boost::atomics::detail::base_atomic<unsigned __int128, int, 16u, false>::fetch_and(unsigned __int128, boost::memory_order) volatile in native_api.o
      boost::atomics::detail::base_atomic<__int128, int, 16u, true>::fetch_and(__int128, boost::memory_order) volatile in native_api.o
  "___atomic_fetch_or_16", referenced from:
      boost::atomics::detail::base_atomic<unsigned __int128, int, 16u, false>::fetch_or(unsigned __int128, boost::memory_order) volatile in native_api.o
      boost::atomics::detail::base_atomic<__int128, int, 16u, true>::fetch_or(__int128, boost::memory_order) volatile in native_api.o
  "___atomic_fetch_sub_16", referenced from:
      boost::atomics::detail::base_atomic<unsigned __int128, int, 16u, false>::fetch_sub(unsigned __int128, boost::memory_order) volatile in native_api.o
      boost::atomics::detail::base_atomic<__int128, int, 16u, true>::fetch_sub(__int128, boost::memory_order) volatile in native_api.o
  "___atomic_fetch_xor_16", referenced from:
      boost::atomics::detail::base_atomic<unsigned __int128, int, 16u, false>::fetch_xor(unsigned __int128, boost::memory_order) volatile in native_api.o
      boost::atomics::detail::base_atomic<__int128, int, 16u, true>::fetch_xor(__int128, boost::memory_order) volatile in native_api.o
  "___atomic_is_lock_free", referenced from:
      boost::atomics::detail::base_atomic<test_struct<unsigned __int128>, void, 16u, false>::is_lock_free() const volatile in native_api.o
      boost::atomics::detail::base_atomic<unsigned __int128, int, 16u, false>::is_lock_free() const volatile in native_api.o
  "___atomic_load", referenced from:
      boost::atomics::detail::base_atomic<test_struct<unsigned __int128>, void, 16u, false>::load(boost::memory_order) const volatile in native_api.o
      boost::atomics::detail::base_atomic<unsigned __int128, int, 16u, false>::load(boost::memory_order) const volatile in native_api.o
      boost::atomics::detail::base_atomic<__int128, int, 16u, true>::load(boost::memory_order) const volatile in native_api.o
  "___atomic_store", referenced from:
      boost::atomics::detail::base_atomic<test_struct<unsigned __int128>, void, 16u, false>::store(test_struct<unsigned __int128> const&, boost::memory_order) volatile in native_api.o
      boost::atomics::detail::base_atomic<unsigned __int128, int, 16u, false>::store(unsigned __int128, boost::memory_order) volatile in native_api.o
      boost::atomics::detail::base_atomic<__int128, int, 16u, true>::store(__int128, boost::memory_order) volatile in native_api.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

You can see the actual test results on this page:

http://www.boost.org/development/tests/develop/developer/atomic.html

See the marshall-mac tester; the failing tests are lockfree and native_api.

Boost.Atomic detects the presence of 128-bit atomics by checking for the
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 macro, which is defined to 1 for the target
architecture. The GCC docs here:

http://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

say: "Target architectures are encouraged to provide their own patterns for
each of these built-in functions. If no target is provided, the original non-
memory model set of ‘__sync’ atomic built-in functions are utilized, along with
any required synchronization fences surrounding it in order to achieve the
proper behavior."

If the __atomic* intrinsics are not implemented for a particular target
architecture, they should fall back to the __sync* intrinsics, so the presence
of the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 macro should guarantee that the
__atomic* intrinsics are also supported, one way or another.
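
A minimal reproducer follows the same pattern Boost.Atomic uses (a sketch, shown here with a 64-bit value so it links everywhere; substituting unsigned __int128 for uint64_t on x86_64 triggers the undefined __atomic_* symbols above on affected Clang versions):

```c
#include <stdint.h>

/* Same shape as Boost.Atomic's compare_exchange_strong. With uint64_t the
 * compiler inlines a lock cmpxchg; with unsigned __int128 on affected Clang
 * versions the 16-byte case is lowered to the generic
 * __atomic_compare_exchange libcall, which nothing provides at link time. */
static uint64_t shared = 1;

int try_cas(uint64_t expected, uint64_t desired) {
    return __atomic_compare_exchange_n(&shared, &expected, desired,
                                       0 /* strong */,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

uint64_t get_shared(void) { return shared; }
```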
Quuxplusone commented 10 years ago

Note: we had to add a workaround for this problem in Boost.Atomic. The last version without the workaround was revision e4bde20f2eec0a51be14533871d2123bd2ab9cf3 of Boost.Atomic.

Alternatively, you can comment out the "#define BOOST_ATOMIC_X86_NO_GCC_128_BIT_ATOMIC_INTRINSICS" line in boost/atomic/detail/gcc-atomic.hpp to reproduce the problem.

Quuxplusone commented 10 years ago
I see similar problems on Linux targets, both 64- and 32-bit.  It boils down to
two issues:

(1) 64-bit targets do not generate calls to the "..._16" routines (e.g.
"__atomic_load_16") for 16-byte atomic operations.  Instead, calls to the
generic functions are generated (e.g. __atomic_load(), where the number of
bytes is the first argument).

In Clang 3.4 (and 3.5) I see these lines of code that are responsible:

CGAtomic.cpp:
  // Use a library call.  See: http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary .
  if (UseLibcall) {
    bool UseOptimizedLibcall = false;
    switch (E->getOp()) {
    case AtomicExpr::AO__c11_atomic_fetch_add:
    case AtomicExpr::AO__atomic_fetch_add:
    case AtomicExpr::AO__c11_atomic_fetch_and:
    case AtomicExpr::AO__atomic_fetch_and:
    case AtomicExpr::AO__c11_atomic_fetch_or:
    case AtomicExpr::AO__atomic_fetch_or:
    case AtomicExpr::AO__c11_atomic_fetch_sub:
    case AtomicExpr::AO__atomic_fetch_sub:
    case AtomicExpr::AO__c11_atomic_fetch_xor:
    case AtomicExpr::AO__atomic_fetch_xor:
      // For these, only library calls for certain sizes exist.
      UseOptimizedLibcall = true;
      break;
    default:
      // Only use optimized library calls for sizes for which they exist.
      if (Size == 1 || Size == 2 || Size == 4 || Size == 8)
        UseOptimizedLibcall = true;
      break;
    }

Above, optimized calls are used for sizes up to 8 bytes only, even though the
GNU reference suggests that the "..._16" calls are also among the optimized
ones.

As noted in some other bugs, the -mcx16 switch does not help. GNU GCC, by
contrast, generates calls to the _16 functions.
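
The effect of the default case in the switch above can be modeled directly (a sketch of the decision, not clang code):

```c
/* Models the default case of the switch quoted from CGAtomic.cpp: sized
 * "..._N" libcalls are chosen only for sizes 1, 2, 4 and 8, so a 16-byte
 * atomic operation falls through to the generic __atomic_* entry points
 * even though the GNU library ABI also defines the _16 variants. */
int uses_optimized_libcall(int size) {
    return size == 1 || size == 2 || size == 4 || size == 8;
}
```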

(2) On 32-bit targets, calls to the "..._8" functions are made for all
__atomic operations, even though the processor supports CMPXCHG8B.  GNU
GCC does not call the library and produces correct code. I looked at the
code:

  bool UseLibcall = (Size != Align ||
        getContext().toBits(sizeChars) > MaxInlineWidthInBits);

It seems that the "Align" variable is set to 4.  I used the aligned attribute
to align variables to 8 (and 16) bytes, but nothing helped.  The Align
variable is set by these lines:
  CharUnits alignChars = getContext().getTypeAlignInChars(AtomicTy);
  unsigned Align = alignChars.getQuantity();

And there is no library to satisfy these references.
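
The condition can be modeled as follows (a sketch of the quoted logic, not the actual clang code); with long long being 8 bytes but only 4-byte aligned by default on i386, Size != Align forces the libcall path regardless of MaxInlineWidthInBits:

```c
/* Models the UseLibcall condition quoted above. On i386, long long is
 * 8 bytes but by default only 4-byte aligned, so size != align and a
 * library call is emitted even though CMPXCHG8B could handle it inline. */
int use_libcall(unsigned size_bytes, unsigned align_bytes,
                unsigned max_inline_width_bits) {
    return size_bytes != align_bytes ||
           size_bytes * 8 > max_inline_width_bits;
}
```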

Kind of broken.  All the code that works with GCC does not work with Clang,
for the reasons above.
Quuxplusone commented 10 years ago
Furthermore, on 32-bit targets, generic atomic calls (e.g. void __atomic_load
(type *ptr, type *ret, int memmodel)) do not work for 64-bit types (e.g. long
long), while the built-in functions (e.g. type __atomic_load_n (type *ptr, int
memmodel)) seem to work correctly.  Note that both calls generate code
that calls __atomic_load_8, but in the generic case there is a mismatch
in the arguments passed to __atomic_load_8.
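
The two forms being compared look like this (an illustrative sketch, not the attached atomic_load.c):

```c
static long long value = 42;

/* Generic form: the result is returned through the second pointer
 * argument. On a 32-bit target both functions lower to __atomic_load_8,
 * but the reporter observed that only the _n form produced correct code. */
long long load_generic(void) {
    long long ret;
    __atomic_load(&value, &ret, __ATOMIC_SEQ_CST);
    return ret;
}

/* Built-in _n form: the result is returned directly. */
long long load_n(void) {
    return __atomic_load_n(&value, __ATOMIC_SEQ_CST);
}
```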

I am attaching a test case to demonstrate the problem.

If I look at the IR for the bad case:

define i64 @load(i32 %x) #0 {
  %1 = alloca i64, align 4
  %2 = alloca i32, align 4
  %tmp = alloca i64, align 8
  store i32 %x, i32* %2, align 4
  %3 = call i64 bitcast (i64 (i64*, i32)* @__atomic_load_8 to i64 (i8*, i32)*)(i8* bitcast (i64* @i to i8*), i32 5)
  %4 = load i32* %2, align 4
  %5 = icmp eq i32 %4, 1
  br i1 %5, label %6, label %8

; <label>:6                                       ; preds = %0
  %7 = load i64* %tmp, align 8
  store i64 %7, i64* %1
  br label %9

; <label>:8                                       ; preds = %0
  store i64 0, i64* %1
  br label %9

; <label>:9                                       ; preds = %8, %6
  %10 = load i64* %1
  ret i64 %10
}

I don't see any code that uses the result of the __atomic_load_8 call: %3 is
never used, and the value returned to the caller is read from the
uninitialized %tmp instead.
Quuxplusone commented 10 years ago

Attached atomic_load.c (643 bytes, application/octet-stream): Demonstrate the __atomic problem with 64-bit types on a 32-bit machine

Quuxplusone commented 8 years ago

It appears that at the clang development tip (revision svn249370), using __sync_bool_compare_and_swap now emits cmpxchg16b. I think it's possible this defect has been resolved but the ticket hasn't been updated yet?
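
The check described in that comment, shown as a sketch with a 64-bit value so it builds without -mcx16 (with unsigned __int128 on x86_64, and -mcx16 where required, clang at the cited revision reportedly inlines cmpxchg16b instead of emitting a libcall):

```c
#include <stdint.h>

static uint64_t val = 5;

/* __sync_bool_compare_and_swap returns nonzero iff the swap happened.
 * The 8-byte case inlines a lock cmpxchg; the 16-byte case is the one
 * the comment above says now inlines cmpxchg16b. */
int cas(uint64_t oldv, uint64_t newv) {
    return __sync_bool_compare_and_swap(&val, oldv, newv);
}

uint64_t get_val(void) { return val; }
```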