Open Quuxplusone opened 6 years ago
Bugzilla Link | PR38846 |
Status | NEW |
Importance | P enhancement |
Reported by | Taewook Oh (twoh@fb.com) |
Reported on | 2018-09-05 09:15:37 -0700 |
Last modified on | 2021-07-28 06:43:57 -0700 |
Version | trunk |
Hardware | PC Linux |
CC | llvm-bugs@lists.llvm.org, richard-llvm@metafoo.co.uk, sami.liedes@iki.fi |
Fixed by commit(s) | |
Attachments | |
Blocks | |
Blocked by | |
See also |
I think your example has undefined behavior in both the __sync and __atomic
forms. You're calling the versions of these builtins that take an unsigned
short*, and passing a pointer that is not suitably aligned. This is UB, just as
it would be if you called a function that performed a non-atomic load on such a
pointer.
However, I think that's largely irrelevant for the purpose of this bug, and we
usually do try to guarantee that cases like this work, when the alignment can
be determined from the form of the expression. (I think the cause of the
difference between the __sync_* builtins and the __atomic_* builtins is that
the former make no attempt to make such cases work, whereas the latter do try
to make this work.)
The same behavior can be seen for cases whose behavior is supposed to be
defined, such as:
struct A {
char c[4];
} *p, *q;
void f() { __atomic_load(p, q, 5); } // calls __atomic_load_4
struct B {
int n;
} *r, *s;
void g() { __atomic_load(r, s, 5); } // emitted as a load
... so it looks like we use a libcall for unaligned accesses for __atomic_*,
and do not try to support unaligned accesses for __sync_*. (Note that the
__sync_* functions only take built-in integral and pointer types, for which
alignment is implied, whereas the __atomic_* functions take more arbitrary
types.)
Now, it's wrong to call __atomic_load_4 in the above example: the "optimized"
library functions are only valid to call on properly-aligned pointers. We
should be calling the generic __atomic_load library function instead.
Also, it's worth noting that choosing to use inlined code rather than a libcall
is an ABI decision, not merely a performance choice: if some part of the
program uses inlined code, the libcall implementation must not perform
nonatomic accesses under a lock. It's always correct to conservatively use a
libcall, but it's not correct to inline code if the libatomic implementation
would use a lock for that object.
I would note that GCC generates a call to __atomic_load_16 for a 1-byte-aligned
16 byte load -- https://godbolt.org/z/ScqQB9 -- so we're not the only ones to
get this wrong. Also, GCC's libatomic implementation of __atomic_load will use
a lock to load an object of type 'struct A' above if it straddles a 16-byte-
alignment boundary, so GCC + GNU libatomic gets the f() case wrong too by
emitting a 'mov' rather than a libcall.
One final problem is that the compiler-rt implementation of __atomic_load does
not even depend on the alignment of the pointer: when given a size for which a
naturally-aligned load would be atomic, it just calls __c11_atomic_load, which
will in general *not* be atomic when given an unaligned pointer. (GCC's
libatomic gets this right, as far as I can see.)
I would conclude:
* in principle and in practice, __sync_* atomics do not support unaligned pointers
* in principle, __atomic_* atomics appear to be intended to support unaligned pointers; in practice, such support does not appear to work
(and the above seems to hold across both Clang and GCC).
The best advice I can give you is to avoid misaligned atomic operations.
Thank you Richard for your detailed reply! Is unaligned atomic operation UB only for these builtins, or for std::atomic as well? Here https://gcc.godbolt.org/z/MZLRi8 I made std::atomic variable unaligned and seems that the generated code is just same as the one from __sync (in therms of atomic operation).
(In reply to Taewook Oh from comment #2)
> Thank you Richard for your detailed reply! Is unaligned atomic operation UB
> only for these builtins, or for std::atomic as well? Here
> https://gcc.godbolt.org/z/MZLRi8 I made std::atomic variable unaligned and
> seems that the generated code is just same as the one from __sync (in therms
> of atomic operation).
Generally, when you pass a T* to a function, that function is free to assume
that the pointer value is suitably aligned for type T. As a result, you must be
extremely careful when using pointers that point to members of a struct
declared with __attribute__((packed)).
If you're using atomic operations on a packed struct as a whole, that should
work fine. Both _Atomic(T) and std::atomic<T> will increase the alignment of
the atomic type as necessary to make natural atomic operations work on the
object, though, so atomic<packed_struct> will typically have an alignment
greater than 1.
There's implementation divergence in your example: GCC ignores the
__attribute__((packed)) on the member of type atomic<uint16_t> in Foo2 (and
gives you a working program that performs a correct, naturally-aligned atomic
access). Clang respects the request, which means you're making a member
function call with a misaligned this pointer, which is UB.
You might also be interested in this greatly reduced example of what happens if
you try to use __atomic ops on pointers that are not naturally aligned:
https://gcc.godbolt.org/z/LqXKrt
Note that GCC generates a plain 'mov' for this 1-byte-aligned 4 byte load.
That's not atomic in general. Clang generates a call to __atomic_load_4, which
in both compiler-rt and libatomic will result in a plain 'mov'.
I see. Thank you for the explanation! Could you please point me where in the specification the alignment requirement for atomics is described? I tried to find it by myself but couldn't find one.
(In reply to Taewook Oh from comment #4)
> I see. Thank you for the explanation! Could you please point me where in the
> specification the alignment requirement for atomics is described?
For which form of atomics? (__sync_* or __atomic_* or _Atomic or std::atomic?)
I meant std::atomic. (I guess there's no "standard" for builtins?)
(In reply to Taewook Oh from comment #6)
> I meant std::atomic. (I guess there's no "standard" for builtins?)
There is no alignment requirement in any specification for std::atomic; rather,
the C++ standard requires it to work for an arbitrary trivially-copyable type,
and implementations make that efficient by increasing the alignment where
necessary.
In libc++ this is done by implementing it in terms of _Atomic (where
available); Clang's implementation of _Atomic increases the alignment as
necessary. (The alignment of _Atomic(T) *should* be specified by the psABI for
the target, but in practice it is not.) However, when compiled with GCC, it
just uses __atomic_* and doesn't increase the alignment, presumably resulting
in miscompiles in the cases where GCC's __atomic_* functions fail to actually
be atomic.
In libstdc++, the GCC bug gcc.gnu.org/PR87237 is worked around by increasing
the alignment on std::atomic<T> when sizeof(T) is a power of 2 <= 16 and
alignof(T) is less than sizeof(T):
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/atomic#L189
Got it. Thank you for the detailed explanation!
PR38871 filed for the bug that libc++'s std::atomic is not always atomic when compiling with GCC. (It's a GCC bug but libc++ needs to work around it.)
As of r341734, clang no longer calls the __builtinfoo
_Bug 25335 has been marked as a duplicate of this bug._