Created attachment 24907
Repro (also shared in Godbolt)
A simple repro is at https://godbolt.org/z/z8ajGhvcx and attached as well.
Compile with "-std=c++2a -g0 -O3 -stdlib=libc++" using "armv8-a clang (trunk)"
and found to repro with clang 11 as well.
The essence is this, using fmtlib (version 7.1.3):
void check(int a)
{
fmt::print("{}\n", a);
}
when writing:
int main()
{
check(123);
check(321);
}
compiled code executes correctly and prints 123, 321. In optimized builds, it
is noted that the compiler inlines fmt::print and deduplicates some common code
for the two calls to check.
When writing:
task<int> f(std::vector<int> v)
{
#if REPRO
for (auto& x : v)
#endif
{
check(123);
co_await std::experimental::suspend_always();
check(321);
}
co_return 42;
}
the compiled code has an issue, and the executing code prints 123, garbage.
Looking at the assembly, it seems that the compiler spills "sp" into the
coroutine frame as part of the first call to check:
mov w10, #1
mov x11, sp ; sp is here
stp x10, x11, [x19, #32] ; gets stored here (A)
cmp x9, x8
stp x10, x11, [x19, #48] ; gets stored here (B)
b.eq .LBB3_7
ldr x8, [x19, #64]
.LBB3_5:
ldp x2, x3, [x19, #32] ; gets loaded here (A) into x3
str x8, [x19, #80]
mov w8, #123
stp x8, xzr, [sp]
adrp x0, .L.str
add x0, x0, :lo12:.L.str
mov w1, #3
bl fmt::v7::vprint(fmt::v7::basic_string_view<char>, fmt::v7::format_args)
For the second call site of check, after the suspension point, the compiler
emits:
ldp x2, x3, [x19, #48] ; gets loaded here (B) into x3
mov w8, #321
stp x8, xzr, [sp]
adrp x0, .L.str
add x0, x0, :lo12:.L.str
mov w1, #3
bl fmt::v7::vprint(fmt::v7::basic_string_view<char>, fmt::v7::format_args)
where it is loading the former value of "sp" which gets consumed by vprint,
resulting in garbage.
repro_aarch64_coro_opt.cpp
(4025 bytes, text/plain)