Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[coroutines] Incorrectly saving sp across suspension points in optimized builds on aarch64 #49543

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR50574
Status NEW
Importance P enhancement
Reported by Bart De Smet (bartde@microsoft.com)
Reported on 2021-06-03 10:26:32 -0700
Last modified on 2021-06-03 10:26:32 -0700
Version trunk
Hardware PC All
CC blitzrakete@gmail.com, erik.pilkington@gmail.com, llvm-bugs@lists.llvm.org, richard-llvm@metafoo.co.uk
Fixed by commit(s)
Attachments repro_aarch64_coro_opt.cpp (4025 bytes, text/plain)
Blocks
Blocked by
See also
Created attachment 24907
Repro (also shared in Godbolt)

A simple repro is at https://godbolt.org/z/z8ajGhvcx and attached as well.
Compile with "-std=c++2a -g0 -O3 -stdlib=libc++" using "armv8-a clang (trunk)"
and found to repro with clang 11 as well.

The essence is this, using fmtlib (version 7.1.3):

void check(int a)
{
  fmt::print("{}\n", a);
}

when writing:

int main()
{
    check(123);
    check(321);
}

compiled code executes correctly and prints 123, 321. In optimized builds, it
is noted that the compiler inlines fmt::print and deduplicates some common code
for the two calls to check.

When writing:

task<int> f(std::vector<int> v)
{
#if REPRO
    for (auto& x : v)
#endif
    {
        check(123);

        co_await std::experimental::suspend_always();

        check(321);
    }
    co_return 42;
}

the compiled code has an issue, and the executing code prints 123, garbage.
Looking at the assembly, it seems that the compiler spills "sp" into the
coroutine frame as part of the first call to check:

  mov w10, #1
  mov x11, sp               ; sp is here
  stp x10, x11, [x19, #32]  ; gets stored here (A)
  cmp x9, x8
  stp x10, x11, [x19, #48]  ; gets stored here (B)
  b.eq .LBB3_7
  ldr x8, [x19, #64]
.LBB3_5:
  ldp x2, x3, [x19, #32]    ; gets loaded here (A) into x3
  str x8, [x19, #80]
  mov w8, #123
  stp x8, xzr, [sp]
  adrp x0, .L.str
  add x0, x0, :lo12:.L.str
  mov w1, #3
  bl fmt::v7::vprint(fmt::v7::basic_string_view<char>, fmt::v7::format_args)

For the second call site of check, after the suspension point, the compiler
emits:

  ldp x2, x3, [x19, #48]    ; gets loaded here (B) into x3
  mov w8, #321
  stp x8, xzr, [sp]
  adrp x0, .L.str
  add x0, x0, :lo12:.L.str
  mov w1, #3
  bl fmt::v7::vprint(fmt::v7::basic_string_view<char>, fmt::v7::format_args)

where it is loading the former value of "sp" which gets consumed by vprint,
resulting in garbage.
Quuxplusone commented 3 years ago

Attached repro_aarch64_coro_opt.cpp (4025 bytes, text/plain): Repro (also shared in Godbolt)