Open Quuxplusone opened 10 years ago
Note: we had to add a workaround for this problem in Boost.Atomic. The last version without the workaround was revision e4bde20f2eec0a51be14533871d2123bd2ab9cf3 of Boost.Atomic.
Alternatively, you can comment out the "#define BOOST_ATOMIC_X86_NO_GCC_128_BIT_ATOMIC_INTRINSICS" line in boost/atomic/detail/gcc-atomic.hpp to reproduce the problem.
I see similar problems on Linux targets, both 64-bit and 32-bit. It boils down to
two issues:
(1) 64-bit targets do not generate calls to the "..._16" routines (e.g.
"__atomic_load_16") for 16-byte atomic operations. Instead, calls to the generic
functions are generated (e.g. __atomic_load(), where the number of bytes is the
first argument).
In Clang 3.4 (and 3.5) I see these lines of code that are responsible:
CGAtomic.cpp:
  // Use a library call. See: http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary .
  if (UseLibcall) {
    bool UseOptimizedLibcall = false;
    switch (E->getOp()) {
    case AtomicExpr::AO__c11_atomic_fetch_add:
    case AtomicExpr::AO__atomic_fetch_add:
    case AtomicExpr::AO__c11_atomic_fetch_and:
    case AtomicExpr::AO__atomic_fetch_and:
    case AtomicExpr::AO__c11_atomic_fetch_or:
    case AtomicExpr::AO__atomic_fetch_or:
    case AtomicExpr::AO__c11_atomic_fetch_sub:
    case AtomicExpr::AO__atomic_fetch_sub:
    case AtomicExpr::AO__c11_atomic_fetch_xor:
    case AtomicExpr::AO__atomic_fetch_xor:
      // For these, only library calls for certain sizes exist.
      UseOptimizedLibcall = true;
      break;
    default:
      // Only use optimized library calls for sizes for which they exist.
      if (Size == 1 || Size == 2 || Size == 4 || Size == 8)
        UseOptimizedLibcall = true;
      break;
    }
Above, optimized calls are used only for sizes up to 8 bytes. The GNU reference
does suggest that the "..._16" calls are also optimized.
As suggested by some other bugs, the -mcx16 switch does not work. GCC generates
calls to the _16 functions.
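To make the size check concrete, here is a minimal sketch in C that models only the default branch of the switch quoted above, as it would apply to a load. The helper names has_sized_libcall and load_libcall_name are made up for illustration and do not exist in Clang.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical model of the default branch quoted above: a sized
 * ("optimized") libcall is chosen only for 1, 2, 4 and 8 bytes. */
int has_sized_libcall(size_t size) {
    return size == 1 || size == 2 || size == 4 || size == 8;
}

/* Writes the name of the library routine this model would pick for an
 * atomic load of the given size: "__atomic_load_N" when a sized variant
 * is selected, the generic "__atomic_load" otherwise. */
void load_libcall_name(size_t size, char *buf, size_t buflen) {
    if (has_sized_libcall(size))
        snprintf(buf, buflen, "__atomic_load_%zu", size);
    else
        snprintf(buf, buflen, "__atomic_load");
}
```

Under this model a 16-byte load falls through to the generic __atomic_load, which matches what I see on 64-bit targets.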
(2) On 32-bit targets, calls to the "..._8" functions are made for all
__atomic functions, even though the processor supports CMPXCHG8B. GCC does not
call the library and produces correct code. I looked at the
code:
bool UseLibcall = (Size != Align ||
                   getContext().toBits(sizeChars) > MaxInlineWidthInBits);
And it seems that the "Align" variable is set to 4. I used the aligned attribute
to align variables to 8 (and 16) bytes, but nothing works. The Align variable is
set by these lines:
CharUnits alignChars = getContext().getTypeAlignInChars(AtomicTy);
unsigned Align = alignChars.getQuantity();
And there is no library to satisfy these references.
Kind of broken. All the code that works with GCC does not work with Clang for
the reasons above.
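The UseLibcall condition can be modelled in a few lines of C. This is only a sketch of the predicate quoted above: use_libcall is a made-up name, and the maximum inline width is passed in as a parameter because I am assuming, not quoting, the value the target reports.

```c
#include <stddef.h>

/* Hypothetical model of the quoted condition: a library call is used
 * when the type's alignment differs from its size, or when the size in
 * bits exceeds the target's maximum inline atomic width. */
int use_libcall(size_t size_bytes, size_t align_bytes,
                size_t max_inline_width_bits) {
    return size_bytes != align_bytes ||
           size_bytes * 8 > max_inline_width_bits;
}
```

With an 8-byte type whose alignment is reported as 4, use_libcall(8, 4, 64) is true, so the "..._8" routine is called no matter how wide the inline limit is. With alignment 8 and an assumed limit of 64 bits, use_libcall(8, 8, 64) is false.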
Furthermore, on 32-bit targets, the generic atomic calls (e.g. void __atomic_load
(type *ptr, type *ret, int memmodel)) do not work for 64-bit types (e.g. long
long), while the built-in "_n" functions (e.g. type __atomic_load_n (type *ptr,
int memmodel)) seem to work correctly. Note that both forms generate code
that calls __atomic_load_8, but in the case of the generic calls there is a
mismatch in the arguments for __atomic_load_8.
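For reference, the two forms look like this side by side. This is a minimal sketch, not the attached test case; the names shared_value, store_value, load_generic and load_n are made up for illustration. Both loads should return the same value on a correct implementation.

```c
/* A 64-bit object accessed only through the __atomic builtins. */
static long long shared_value;

void store_value(long long v) {
    __atomic_store_n(&shared_value, v, __ATOMIC_SEQ_CST);
}

/* Generic form: the result is written through the second pointer. */
long long load_generic(void) {
    long long result;
    __atomic_load(&shared_value, &result, __ATOMIC_SEQ_CST);
    return result;
}

/* "_n" form: the result is returned by value. */
long long load_n(void) {
    return __atomic_load_n(&shared_value, __ATOMIC_SEQ_CST);
}
```

On a 32-bit target with the affected Clang versions, it is load_generic that goes wrong: the value returned by __atomic_load_8 never ends up in result.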
I am attaching a test case to demonstrate the problem.
If I look at the IR for the bad case:
define i64 @load(i32 %x) #0 {
%1 = alloca i64, align 4
%2 = alloca i32, align 4
%tmp = alloca i64, align 8
store i32 %x, i32* %2, align 4
%3 = call i64 bitcast (i64 (i64*, i32)* @__atomic_load_8 to i64 (i8*, i32)*)(i8* bitcast (i64* @i to i8*), i32 5)
%4 = load i32* %2, align 4
%5 = icmp eq i32 %4, 1
br i1 %5, label %6, label %8
; <label>:6 ; preds = %0
%7 = load i64* %tmp, align 8
store i64 %7, i64* %1
br label %9
; <label>:8 ; preds = %0
store i64 0, i64* %1
br label %9
; <label>:9 ; preds = %8, %6
%10 = load i64* %1
ret i64 %10
}
I don't see the code that uses the result of the __atomic_load_8.
Attached atomic_load.c
(643 bytes, application/octet-stream): Demonstrate __atomic problem with 64-bit types on a 32-bit machine
It appears in the clang development tip (revision svn249370) that using __sync_bool_compare_and_swap is now emitting cmpxchg16. I think it's possible this defect has been resolved but the ticket hasn't been updated yet?