Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

llvm-mca crashes with LDMXCSR and STMXCSR instructions in combination for some architectures. #46993

Closed Quuxplusone closed 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR48024
Status RESOLVED FIXED
Importance P normal
Reported by Tom Hender (ToHe_EMA@gmx.de)
Reported on 2020-10-30 12:14:00 -0700
Last modified on 2020-11-20 22:15:05 -0800
Version trunk
Hardware All All
CC andrea.dibiagio@gmail.com, htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, matthew.davis@sony.com, tstellar@redhat.com
Fixed by commit(s) rG0e20666db3ac280affe82d31b6c144923704e9c4, rG973b95e0a84
Attachments
Blocks PR47800
Blocked by
See also PR48033
The llvm-mca tool crashes with the following assembly for some architectures:

        stmxcsr dword ptr [rsp - 4]
        mov     eax, -24577
        and     eax, dword ptr [rsp - 4]
        mov     dword ptr [rsp - 8], eax
        ldmxcsr dword ptr [rsp - 8]
        ret

(Here is a Godbolt Compiler Explorer Link for it:
https://gcc.godbolt.org/z/adh57E)

The problem is present for at least Bulldozer, Piledriver, Haswell, Broadwell,
Silvermont and Goldmont architecture.

llvm-mca does not crash if only either the LDMXCSR or the STMXCSR instruction
is present but not both simultanously including the bitwise manipulation. The
problem does not occur either if one of Zen, Zen 2, Penryn, Westmere, Nehalem,
Sandy Bridge, Ivy Bridge, Skylake or Cannonlake is specified as the scheduling
model.

The error message is as follows:

PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash
backtrace.
Stack dump:
0.  Program arguments: /llvm-mca --mcpu=bdver2 <source>
 #0 0x0000559cd340b0cc llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/llvm-mca+0x6810cc)
 #1 0x0000559cd3408fa4 llvm::sys::RunSignalHandlers() (/llvm-mca+0x67efa4)
 #2 0x0000559cd3409113 SignalHandler(int) (/llvm-mca+0x67f113)
 #3 0x00007fcf53be23c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x153c0)
 #4 0x0000559cd3317600 llvm::mca::LSUnit::dispatch(llvm::mca::InstRef const&) (/llvm-mca+0x58d600)
 #5 0x0000559cd3320365 llvm::mca::Scheduler::dispatch(llvm::mca::InstRef&) (/llvm-mca+0x596365)
 #6 0x0000559cd332cc6e llvm::mca::ExecuteStage::execute(llvm::mca::InstRef&) (/llvm-mca+0x5a2c6e)
 #7 0x0000559cd332a71a llvm::mca::DispatchStage::dispatch(llvm::mca::InstRef) (/llvm-mca+0x5a071a)
 #8 0x0000559cd332a769 llvm::mca::DispatchStage::execute(llvm::mca::InstRef&) (.localalias.70) (/llvm-mca+0x5a0769)
 #9 0x0000559cd332b976 llvm::mca::EntryStage::execute(llvm::mca::InstRef&) (.localalias.72) (/llvm-mca+0x5a1976)
#10 0x0000559cd332988c llvm::mca::Pipeline::runCycle() (/llvm-mca+0x59f88c)
#11 0x0000559cd3329a26 llvm::mca::Pipeline::run() (/llvm-mca+0x59fa26)
#12 0x0000559cd3047871 runPipeline(llvm::mca::Pipeline&) (/llvm-mca+0x2bd871)
#13 0x0000559cd303aa72 main (/llvm-mca+0x2b0a72)
#14 0x00007fcf5369b0b3 __libc_start_main (/lib/x86_64-linux-
gnu/libc.so.6+0x270b3)
#15 0x0000559cd304413a _start (/llvm-mca+0x2ba13a)
Quuxplusone commented 3 years ago
Hi Tom,

Apologies for this issue.

I have a fix for it which I plan to commit it tomorrow.

This issue is likely to be a regression introduced by my rewrite of the LSUnit
in commit 5578ec32f9c4fef46adce52a2e3d22bf409b3d2c.

=====

As a side not (unrelated to this bug):

The code generated by LLVM for your code snippet is sub-optimal. It is as if
the compiler tried very hard to keep alive the stack slot containing the old
value of MXCSR (i.e. slot at location `rsp - 4`) until the end of the function.

This is suboptimal because it means that an extra stack slot (rsp - 8) has to
be used for the new value of MXCSR. If instead the original slot was reused,
the compiler could have simply emitted a MR variant of AND (read-modify-write),
and this would have avoided the use of an extra MOV.
GCC gets this right: the entire sequence is three instructions plus the RET.

I wonder if this poor codegen has to do with the fact that STMXCSR is defined
as having "unmodeled side-effects". It might be that somehow that prevents the
compiler from commuting the original ADD and use a RMW variant instead.
Alternatively StackSlotColoring is not doing a good job at merging the two
stack slots. This is just me speculating on what the issue might be in the code
generator.

On the plus side, the compiler is smart at taking advantage of the red-zone in
this case. Part of me wasn't expecting to see negative offsets used with RSP.
In this particular case, it makes perfectly sense and it avoids having to emit
an extra SUB (of RSP) at the beginning, plus an extra ADD at the end.
Quuxplusone commented 3 years ago

@andreadb Please can you raise [Comment #1] as a separate x86 bug?

Quuxplusone commented 3 years ago

Fixed on master by commit 0e20666db3ac280affe82d31b6c144923704e9c4

@Tom, could you please verify that the fix works for you?

Thanks.

@Simon:I am going to raise a separate bug for that poor-codegen.

Quuxplusone commented 3 years ago

Raised bug 48033 for the poor codegen issue.

Quuxplusone commented 3 years ago
(In reply to Andrea Di Biagio from comment #3)
> Fixed on master by commit 0e20666db3ac280affe82d31b6c144923704e9c4
>
> @Tom, could you please verify that the fix works for you?

This should be merged into 11.x - assuming Tom confirms the fix.
Quuxplusone commented 3 years ago

Thank you.

The crash is now fixed for me as well.

The inefficient code generation from Clang luckily doesn't affect my real code because it both must remember the old MXCSR and because it's written with inline assembly anyway. I don't trust compilers around rounding mode changes anymore.

Quuxplusone commented 3 years ago

Hi Andrea,

What is your opinion on backporting this?

https://reviews.llvm.org/rG0e20666db3ac280affe82d31b6c144923704e9c4

Quuxplusone commented 3 years ago
(In reply to Tom Stellard from comment #7)
> Hi Andrea,
>
> What is your opinion on backporting this?
>
> https://reviews.llvm.org/rG0e20666db3ac280affe82d31b6c144923704e9c4

Hi Tom,

My opinion is that it should be merged into the release branch as it prevents a
crash when the input assembly contains store instructions that also have
unmodeled side effects.
Quuxplusone commented 3 years ago

Merged: 973b95e0a84