Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

LLVM lowers __atomic builtin to a function call when lowering __sync builtin to a single instruction with LOCK prefix #37819

Open Quuxplusone opened 6 years ago

Quuxplusone commented 6 years ago
Bugzilla Link PR38846
Status NEW
Importance P enhancement
Reported by Taewook Oh (twoh@fb.com)
Reported on 2018-09-05 09:15:37 -0700
Last modified on 2021-07-28 06:43:57 -0700
Version trunk
Hardware PC Linux
CC llvm-bugs@lists.llvm.org, richard-llvm@metafoo.co.uk, sami.liedes@iki.fi
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
=== Test program ===

#include <cstdint>

template <typename T>
struct __attribute__((__packed__)) Foo {
    T a;
    void incAtomic() {
        __atomic_add_fetch(&a, 1, __ATOMIC_RELEASE);
    }
    void incSync() {

        __sync_add_and_fetch(&a, 1);
    }
};

int main(int argc, char** argv) {
    Foo<uint16_t> f;
    f.a = argc;
    f.incAtomic();
    f.incSync();
    return f.a;
}

=== Disassembled instructions ===
(compiled with clang++ --std=c++17 -O3 -Wall -DNDEBUG=1)

main:
        push    rax
        mov     word ptr [rsp], di
        mov     rdi, rsp
        mov     esi, 1
        mov     edx, 3
        call    __atomic_fetch_add_2
        lock            add     word ptr [rsp], 1 # __sync_add_and_fetch
        movzx   eax, word ptr [rsp]
        pop     rcx
        ret

GCC's documentation for builtins (https://gcc.gnu.org/onlinedocs/gcc-
4.7.2/gcc/_005f_005fatomic-Builtins.html) says that __atomic builtsins are
intended to replace __sync builtins, so there should be no semantic difference
between corresponding ones.

I believe the difference comes from the alignment checking logic
(https://github.com/llvm-
mirror/clang/blob/master/lib/CodeGen/CGAtomic.cpp#L768) only included in
handling __atomic builtins. However, considering that LOCK prefix works for
unaligned accesses as well, I'm not sure if that alignment check is necessary.
LOCK for unaligned memory access is expensive, but not sure if it is more
expensive than library calls.
Quuxplusone commented 6 years ago
I think your example has undefined behavior in both the __sync and __atomic
forms. You're calling the versions of these builtins that take an unsigned
short*, and passing a pointer that is not suitably aligned. This is UB, just as
it would be if you called a function that performed a non-atomic load on such a
pointer.

However, I think that's largely irrelevant for the purpose of this bug, and we
usually do try to guarantee that cases like this work, when the alignment can
be determined from the form of the expression. (I think the cause of the
difference between the __sync_* builtins and the __atomic_* builtins is that
the former make no attempt to make such cases work, whereas the latter do try
to make this work.)

The same behavior can be seen for cases whose behavior is supposed to be
defined, such as:

struct A {
  char c[4];
} *p, *q;
void f() { __atomic_load(p, q, 5); } // calls __atomic_load_4

struct B {
  int n;
} *r, *s;
void g() { __atomic_load(r, s, 5); } // emitted as a load

... so it looks like we use a libcall for unaligned accesses for __atomic_*,
and do not try to support unaligned accesses for __sync_*. (Note that the
__sync_* functions only take built-in integral and pointer types, for which
alignment is implied, whereas the __atomic_* functions take more arbitrary
types.)

Now, it's wrong to call __atomic_load_4 in the above example: the "optimized"
library functions are only valid to call on properly-aligned pointers. We
should be calling the generic __atomic_load library function instead.

Also, it's worth noting that choosing to use inlined code rather than a libcall
is an ABI decision, not merely a performance choice: if some part of the
program uses inlined code, the libcall implementation must not perform
nonatomic accesses under a lock. It's always correct to conservatively use a
libcall, but it's not correct to inline code if the libatomic implementation
would use a lock for that object.

I would note that GCC generates a call to __atomic_load_16 for a 1-byte-aligned
16 byte load -- https://godbolt.org/z/ScqQB9 -- so we're not the only ones to
get this wrong. Also, GCC's libatomic implementation of __atomic_load will use
a lock to load an object of type 'struct A' above if it straddles a 16-byte-
alignment boundary, so GCC + GNU libatomic gets the f() case wrong too by
emitting a 'mov' rather than a libcall.

One final problem is that the compiler-rt implementation of __atomic_load does
not even depend on the alignment of the pointer: when given a size for which a
naturally-aligned load would be atomic, it just calls __c11_atomic_load, which
will in general *not* be atomic when given an unaligned pointer. (GCC's
libatomic gets this right, as far as I can see.)

I would conclude:

 * in principle and in practice, __sync_* atomics do not support unaligned pointers
 * in principle, __atomic_* atomics appear to be intended to support unaligned pointers; in practice, such support does not appear to work

(and the above seems to hold across both Clang and GCC).

The best advice I can give you is to avoid misaligned atomic operations.
Quuxplusone commented 6 years ago

Thank you Richard for your detailed reply! Is unaligned atomic operation UB only for these builtins, or for std::atomic as well? Here https://gcc.godbolt.org/z/MZLRi8 I made std::atomic variable unaligned and seems that the generated code is just same as the one from __sync (in therms of atomic operation).

Quuxplusone commented 6 years ago
(In reply to Taewook Oh from comment #2)
> Thank you Richard for your detailed reply! Is unaligned atomic operation UB
> only for these builtins, or for std::atomic as well? Here
> https://gcc.godbolt.org/z/MZLRi8 I made std::atomic variable unaligned and
> seems that the generated code is just same as the one from __sync (in therms
> of atomic operation).

Generally, when you pass a T* to a function, that function is free to assume
that the pointer value is suitably aligned for type T. As a result, you must be
extremely careful when using pointers that point to members of a struct
declared with __attribute__((packed)).

If you're using atomic operations on a packed struct as a whole, that should
work fine. Both _Atomic(T) and std::atomic<T> will increase the alignment of
the atomic type as necessary to make natural atomic operations work on the
object, though, so atomic<packed_struct> will typically have an alignment
greater than 1.

There's implementation divergence in your example: GCC ignores the
__attribute__((packed)) on the member of type atomic<uint16_t> in Foo2 (and
gives you a working program that performs a correct, naturally-aligned atomic
access). Clang respects the request, which means you're making a member
function call with a misaligned this pointer, which is UB.

You might also be interested in this greatly reduced example of what happens if
you try to use __atomic ops on pointers that are not naturally aligned:
https://gcc.godbolt.org/z/LqXKrt

Note that GCC generates a plain 'mov' for this 1-byte-aligned 4 byte load.
That's not atomic in general. Clang generates a call to __atomic_load_4, which
in both compiler-rt and libatomic will result in a plain 'mov'.
Quuxplusone commented 6 years ago

I see. Thank you for the explanation! Could you please point me where in the specification the alignment requirement for atomics is described? I tried to find it by myself but couldn't find one.

Quuxplusone commented 6 years ago
(In reply to Taewook Oh from comment #4)
> I see. Thank you for the explanation! Could you please point me where in the
> specification the alignment requirement for atomics is described?

For which form of atomics? (__sync_* or __atomic_* or _Atomic or std::atomic?)
Quuxplusone commented 6 years ago

I meant std::atomic. (I guess there's no "standard" for builtins?)

Quuxplusone commented 6 years ago
(In reply to Taewook Oh from comment #6)
> I meant std::atomic. (I guess there's no "standard" for builtins?)

There is no alignment requirement in any specification for std::atomic; rather,
the C++ standard requires it to work for an arbitrary trivially-copyable type,
and implementations make that efficient by increasing the alignment where
necessary.

In libc++ this is done by implementing it in terms of _Atomic (where
available); Clang's implementation of _Atomic increases the alignment as
necessary. (The alignment of _Atomic(T) *should* be specified by the psABI for
the target, but in practice it is not.) However, when compiled with GCC, it
just uses __atomic_* and doesn't increase the alignment, presumably resulting
in miscompiles in the cases where GCC's __atomic_* functions fail to actually
be atomic.

In libstdc++, the GCC bug gcc.gnu.org/PR87237 is worked around by increasing
the alignment on std::atomic<T> when sizeof(T) is a power of 2 <= 16 and
alignof(T) is less than sizeof(T):

https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/atomic#L189
Quuxplusone commented 6 years ago

Got it. Thank you for the detailed explanation!

Quuxplusone commented 6 years ago

PR38871 filed for the bug that libc++'s std::atomic is not always atomic when compiling with GCC. (It's a GCC bug but libc++ needs to work around it.)

Quuxplusone commented 6 years ago

As of r341734, clang no longer calls the __builtinfoo lib functions for misaligned pointers.

Quuxplusone commented 3 years ago

_Bug 25335 has been marked as a duplicate of this bug._